How it works

The whole pipeline, in plain English.

No black box. Here's exactly what happens between you uploading a file and seeing your report.

Step 1 — You upload your raw DNA file

The file you get from 23andMe ("Browse Raw Data → Download"), AncestryDNA ("DNA Settings → Download Raw DNA Data"), or MyHeritage ("DNA → Manage DNA Kits → Download"). It's a text file (or a .zip containing one) with ~600,000 rows, one per single-nucleotide polymorphism (SNP) on the chip.

PharmTwin auto-detects the format (each vendor uses a slightly different column layout) and accepts the .zip directly so you don't have to extract anything.

Step 2 — We extract the ~1,200 SNPs that matter for drug response

Your file has ~600k SNPs. We only need the subset that the pharmacogene panel actually uses — about 1,200 specific positions in 23 genes (CYP2C19, CYP2D6, CYP2C9, VKORC1, SLCO1B1, RYR1, and others). We match by rsID, which means no coordinate-system conversion is needed: both consumer arrays and the PharmCAT panel use stable rsIDs across reference genome builds.

For a typical 23andMe v5 file we cover about 220 of those 1,200 panel positions — the rest are rare variants not on the consumer chip. PharmCAT handles missing positions cleanly by marking them as no-call rather than guessing.

Step 3 — PharmCAT calls your diplotypes

We use PharmCAT — the open-source variant-caller and clinical-recommendation engine published by Stanford and PharmGKB. It's the same software that university medical centers run on their CLIA-certified data, just applied to your consumer file. It outputs star-allele diplotypes (e.g., CYP2C19 *2/*2) and phenotypes (e.g., Poor Metabolizer) for each of the 23 pharmacogenes.

We run PharmCAT twice for one specific reason. PharmCAT's research mode can call CYP2D6 SNV-based diplotypes — but it refuses to generate full drug recommendations when research mode is on. So we run pass 1 in research mode to extract the CYP2D6 call, write it to PharmCAT's official "outside call" file, then run pass 2 in normal mode with that file fed back in. The result: CYP2D6-dependent drug recommendations that actually reflect your CYP2D6 phenotype. (CYP2D6 copy-number variants and gene-hybrid alleles still aren't detectable from SNP-array data — that requires sequencing.)

Step 4 — We match diplotypes against four guideline sources

Every drug recommendation in PharmTwin comes from one of four published sources:

When the same drug appears in multiple sources, the highest-evidence source wins for the headline recommendation — but every drug card shows the others so you can see the consensus. "CPIC Strong + DPWG + FDA Label all say avoid clopidogrel for poor metabolizers" is a stronger signal than any one of them alone.

Step 5 — We write your explanations

For each actionable drug, we generate a 2-3 sentence plain-English explanation using OpenAI's gpt-4o-mini. The model only sees your gene-level results (e.g. "CYP2C19 Poor Metabolizer" + the verbatim CPIC text for the relevant drug) — never your raw DNA sequence or individual SNP identifiers. The system prompt is strict: only use the data provided, never invent CPIC text, always frame action as "discuss with your prescriber," and never give medical advice.

All ~50 narrations are generated in parallel and cached, so re-generating your report is free.

Step 6 — You get the report, chat, and doctor PDF

Everything renders into a single page: gene profile on top, drug recommendations by therapeutic category, conversational chat panel on the right side. The chat is grounded in your structured records — you can ask "Is Plavix ok for me?", "Anything I should tell my anesthesiologist?", "Are any antidepressants safer for me than others?" — and every answer cites the drug records it was based on.

One click downloads the doctor-handoff PDF: ~19 pages, tables (not consumer cards), verbatim CPIC text, PubMed citations, and the CLIA disclaimer prominently at the top.


What this pipeline does not do

Get my report — $49 More questions →