The whole pipeline, in plain English.
No black box. Here's exactly what happens between you uploading a file and seeing your report.
Step 1 — You upload your raw DNA file
The file you get from 23andMe ("Browse Raw Data → Download"), AncestryDNA ("DNA Settings → Download Raw DNA Data"), or MyHeritage ("DNA → Manage DNA Kits → Download"). It's a text file (or a .zip containing one) with ~600,000 rows, one per single-nucleotide polymorphism (SNP) on the chip.
PharmTwin auto-detects the format (each vendor uses a slightly different column layout) and accepts the .zip directly so you don't have to extract anything.
Step 2 — We extract the ~1,200 SNPs that matter for drug response
Your file has ~600k SNPs. We only need the subset that the pharmacogene panel actually uses — about 1,200 specific positions in 23 genes (CYP2C19, CYP2D6, CYP2C9, VKORC1, SLCO1B1, RYR1, and others). We match by rsID, which means no coordinate-system conversion is needed: both consumer arrays and the PharmCAT panel use stable rsIDs across reference genome builds.
For a typical 23andMe v5 file we cover about 220 of those 1,200 panel positions — the rest are rare variants not on the consumer chip. PharmCAT handles missing positions cleanly by marking them as no-call rather than guessing.
Step 3 — PharmCAT calls your diplotypes
We use PharmCAT — the open-source variant-caller and clinical-recommendation engine published by Stanford and PharmGKB. It's the same software that university medical centers run on their CLIA-certified data, just applied to your consumer file. It outputs star-allele diplotypes (e.g., CYP2C19 *2/*2) and phenotypes (e.g., Poor Metabolizer) for each of the 23 pharmacogenes.
We run PharmCAT twice for one specific reason. PharmCAT's research mode can call CYP2D6 SNV-based diplotypes — but it refuses to generate full drug recommendations when research mode is on. So we run pass 1 in research mode to extract the CYP2D6 call, write it to PharmCAT's official "outside call" file, then run pass 2 in normal mode with that file fed back in. The result: CYP2D6-dependent drug recommendations that actually reflect your CYP2D6 phenotype. (CYP2D6 copy-number variants and gene-hybrid alleles still aren't detectable from SNP-array data — that requires sequencing.)
Step 4 — We match diplotypes against four guideline sources
Every drug recommendation in PharmTwin comes from one of four published sources:
- CPIC — the Clinical Pharmacogenetics Implementation Consortium. Peer-reviewed, graded evidence (Strong / Moderate / Optional), the gold standard for drug-gene clinical guidance. Most of your recommendations come from here.
- DPWG — the Dutch Pharmacogenetics Working Group. Peer-reviewed, sometimes covers drugs CPIC hasn't graded yet (acenocoumarol, mavacamten, quetiapine).
- FDA Label — drugs whose FDA-approved prescribing label specifically cites a pharmacogenomic marker (Cibinqo/abrocitinib, esomeprazole, clobazam, etc.).
- FDA PGx Association — the FDA's published table of pharmacogenomic associations. Informational; weakest of the four tiers.
When the same drug appears in multiple sources, the highest-evidence source wins for the headline recommendation — but every drug card shows the others so you can see the consensus. "CPIC Strong + DPWG + FDA Label all say avoid clopidogrel for poor metabolizers" is a stronger signal than any one of them alone.
Step 5 — We write your explanations
For each actionable drug, we generate a 2-3 sentence plain-English explanation using OpenAI's gpt-4o-mini. The model only sees your gene-level results (e.g. "CYP2C19 Poor Metabolizer" + the verbatim CPIC text for the relevant drug) — never your raw DNA sequence or individual SNP identifiers. The system prompt is strict: only use the data provided, never invent CPIC text, always frame action as "discuss with your prescriber," and never give medical advice.
All ~50 narrations are generated in parallel and cached, so re-generating your report is free.
Step 6 — You get the report, chat, and doctor PDF
Everything renders into a single page: gene profile on top, drug recommendations by therapeutic category, conversational chat panel on the right side. The chat is grounded in your structured records — you can ask "Is Plavix ok for me?", "Anything I should tell my anesthesiologist?", "Are any antidepressants safer for me than others?" — and every answer cites the drug records it was based on.
One click downloads the doctor-handoff PDF: ~19 pages, tables (not consumer cards), verbatim CPIC text, PubMed citations, and the CLIA disclaimer prominently at the top.
What this pipeline does not do
- It is not CLIA-certified. Consumer SNP arrays are run for ancestry, not medicine. For any avoid-level finding, the doctor PDF explicitly recommends confirming with CLIA-certified pharmacogenomic testing before changing therapy.
- It does not detect CYP2D6 copy-number variants (whole-gene deletions like *5, duplications, or CYP2D7 hybrid alleles). Those require long-read or copy-number-aware sequencing data.
- It does not perform HLA typing. HLA-B*15:02 (carbamazepine), HLA-B*57:01 (abacavir), and other HLA-driven recommendations require dedicated typing assays.
- It does not interpret cancer risk, polygenic scores, carrier status, or anything outside the pharmacogenomic scope.