Humankind - Personal operating instructions built from your DNA

This is the detailed methodology reference for the Humankind Archetype Classification System. It's written for readers who want to understand exactly how the model works — the scoring approach, the engineering decisions, the limitations, and where we've drawn lines and why.

For the conceptual overview, read the blog series first: Behind the archetypes, The seven biological axes, and The discovery network.

What this system is — and isn't

What it is: A probabilistic model and engineered abstraction designed to make complex genetic information interpretable and actionable.

What it isn't: A direct biological partitioning that discovers "natural types" of humans.

Human biology varies continuously along every dimension we measure. There is no bright line in nature separating an Endurance-Built from a Power-Built. These are operational categories imposed on continuous distributions for interpretability — because continuous scores are harder to remember and apply, because cohorting requires categories, and because operational labels drive action better than percentiles. But the underlying biology is continuous. Your score matters more than your label, and someone just below our threshold is not biologically different from someone just above it.

Everything that follows describes modelling decisions, not biological facts. We think that distinction matters, and we try never to obscure it.

Scoring approach

Variant effect model

For each variant, we define a Pole A effect allele. Your genotype determines your variant effect:

Genotype	Variant effect
Homozygous Pole A	+1.0
Heterozygous	+0.25
Homozygous Pole B	-1.0

The heterozygote weight (+0.25) is a deliberate conservatism choice to reduce boundary over-assignment. It is not a claim about biological dominance or precise effect size.
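
The table above can be read as a simple lookup. Here is a minimal sketch in Python, assuming genotypes arrive as a count of Pole A alleles (0, 1, or 2). The function name and the allele-count encoding are illustrative assumptions, not part of the documented pipeline.

```python
# Effect values come from the table above; the allele-count encoding is an
# assumption made for this sketch.
VARIANT_EFFECT = {
    2: 1.0,    # homozygous Pole A
    1: 0.25,   # heterozygous (deliberately conservative weight)
    0: -1.0,   # homozygous Pole B
}


def variant_effect(pole_a_allele_count: int) -> float:
    """Return the variant effect for a count of Pole A alleles (0, 1, or 2)."""
    if pole_a_allele_count not in VARIANT_EFFECT:
        raise ValueError(
            f"expected 0, 1, or 2 Pole A alleles, got {pole_a_allele_count!r}"
        )
    return VARIANT_EFFECT[pole_a_allele_count]
```

Because the heterozygote value is positive, the mapping is not symmetric around zero. That is a property of the table itself, not of the sketch.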

Weighted variant contribution

Not all variants contribute equally. We weight by evidence strength — variants with strong replication and clear mechanisms contribute more than those with supporting but less robust evidence. These weights are engineering choices that reflect our confidence in each variant's relevance. As evidence accumulates, weights are adjusted.

Axis score calculation

Your score on each axis is a weighted average of contributing variants, normalised by available data:

AxisScore = Σ(VariantEffect × Weight) / Σ(AvailableWeights)

Missing data doesn't artificially inflate or deflate your score — we work with what's available and report confidence accordingly.
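
The normalisation can be illustrated with a short Python sketch. It assumes missing variants are marked as None and that weights are plain positive numbers; the weights in the example are made up for illustration.

```python
from typing import Optional, Sequence


def axis_score(effects: Sequence[Optional[float]],
               weights: Sequence[float]) -> Optional[float]:
    """Weighted average of variant effects, normalised by the weights of the
    variants that were actually observed (None marks missing data)."""
    if len(effects) != len(weights):
        raise ValueError("effects and weights must have the same length")
    observed = [(e, w) for e, w in zip(effects, weights) if e is not None]
    available_weight = sum(w for _, w in observed)
    if available_weight == 0:
        return None  # no usable variants, so no score can be computed
    return sum(e * w for e, w in observed) / available_weight


# Three variants; the second is missing from the raw file, so its weight
# drops out of both the numerator and the denominator.
score = axis_score([1.0, None, -1.0], [3.0, 2.0, 1.0])
# (1.0 * 3.0 + -1.0 * 1.0) / (3.0 + 1.0) = 0.5
```

Dividing by the available weight rather than the total weight is what keeps a missing variant from pulling the score toward zero.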

Imputation

Consumer genotyping arrays capture approximately 700,000 directly measured variants. We infer genotypes at millions of additional positions by statistical imputation, which leverages known patterns of genetic co-inheritance (linkage disequilibrium) in reference populations. Imputation runs on the Michigan Imputation Server against the TOPMed reference panel (approximately 97,000 deeply sequenced genomes). TOPMed provides substantially better imputation accuracy than older reference panels, particularly for variants with minor allele frequencies above 1%.

Each imputed variant carries an r² score reflecting inference confidence. We apply strict quality thresholds — higher for individual variant interpretation, lower for aggregate scoring where individual noise averages out across hundreds or thousands of variants. Imputation quality varies by ancestry and genomic region, and we report when coverage is limited.
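
A purpose-dependent quality filter might look like the Python sketch below. The two threshold values are hypothetical: the text states only that the bar is higher for individual variant interpretation than for aggregate scoring. The class and function names are also illustrative.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to show the two-level idea.
R2_MIN_INDIVIDUAL = 0.8
R2_MIN_AGGREGATE = 0.3


@dataclass(frozen=True)
class ImputedVariant:
    rsid: str
    r2: float  # imputation quality score for this variant


def usable(variant: ImputedVariant, purpose: str) -> bool:
    """Return True if the variant passes the quality bar for the given purpose."""
    thresholds = {"individual": R2_MIN_INDIVIDUAL, "aggregate": R2_MIN_AGGREGATE}
    if purpose not in thresholds:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return variant.r2 >= thresholds[purpose]


# A moderately well-imputed variant (placeholder identifier).
v = ImputedVariant(rsid="rs_example", r2=0.55)
```

With these placeholder values, the example variant would feed aggregate scores but would not be interpreted on its own.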

Genome-wide scoring

For traits where biology is distributed across hundreds of genomic regions, we compute genome-wide scores using SBayesR, a Bayesian multiple regression method that models the distribution of effect sizes across the genome using a mixture of normal distributions. This approach — used by the Polygenic Index Repository and the largest published personality GWAS — weights each variant's contribution based on the surrounding genetic architecture and handles highly polygenic traits (where thousands of variants each contribute tiny effects) more accurately than earlier methods. This produces a single score per trait that captures the aggregate signal across the genome, rather than relying on a curated shortlist of candidate variants.

Scores are normalised against a reference population to produce standardised values and percentile rankings. Reference populations are currently derived from European-ancestry samples, creating known limitations for other populations. We flag reduced confidence rather than reporting potentially miscalibrated results.
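
As a rough illustration of the normalisation step, the Python sketch below converts a raw score into a z-score and a percentile. It assumes the reference scores are approximately normally distributed; the reference values are made up, and the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def standardise(raw_score: float,
                reference_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (z-score, percentile) of raw_score against a reference sample.

    The percentile relies on a normal approximation of the reference scores."""
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    z = (raw_score - mu) / sigma
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


reference = [-0.2, -0.1, 0.0, 0.1, 0.2]  # made-up reference scores
z, pct = standardise(0.0, reference)     # a score at the reference mean
```

A score equal to the reference mean lands at z = 0 and the 50th percentile. If the reference sample does not match the user's ancestry, both numbers can be miscalibrated, which is why reduced confidence is flagged.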

Genome-wide scores are computed from summary statistics published by large international consortia. Current sources include: Big Five personality traits (Gupta et al. 2024, N=237K–682K; upgrade to Schwaba et al. 2025, N=611K–1.14M, pending), chronotype (Jones et al. 2019, N=697K), insomnia (Jansen et al. 2019, N=1.3M), ADHD (Demontis et al. 2023, N=1M+), depression (PGC MDD Working Group 2024, N=686K), BMI (GIANT Consortium, N=700K+), and subjective wellbeing (SSGAC / PGI Repository v2). As larger and more diverse studies are published, we update our source data — the pipeline is designed so that swapping in new summary statistics requires no architectural changes.

Hybrid scoring architecture

Axes with strong, well-characterised biological pathways are scored primarily from high-impact variants — a small number of genetic changes with large, well-understood effects. These axes are enriched by genome-wide findings for narrative context, but the core scoring remains anchored to candidate variants.

Axes capturing more distributed traits supplement candidate variant scores with genome-wide scores, blending both signals at configured weights that reflect the evidence base for each axis. Blending weights are calibrated to the published predictive accuracy (R²) of the genome-wide score for each trait — typically 2–9% for personality dimensions — ensuring that directly genotyped high-impact variants remain the primary driver of axis classification. Genome-wide signal provides a principled correction, not a co-equal vote.

Genome-wide signal can confirm, nuance, or occasionally contradict candidate-gene scores. When they disagree, we surface this as a confidence qualifier rather than forcing agreement. When both methods point in the same direction, classification confidence increases.
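
The blending and agreement logic can be sketched in Python as follows. This is a simplified illustration, not the production formula: it assumes both inputs have already been put on a comparable scale, and the genome-wide weight of 0.05 is a hypothetical value.

```python
def blend(candidate_score: float, genome_wide_score: float,
          gw_weight: float) -> float:
    """Blend a candidate-variant score with a genome-wide score.

    gw_weight is the share given to the genome-wide signal; keeping it
    below 0.5 leaves the candidate variants as the primary driver."""
    if not 0.0 <= gw_weight < 0.5:
        raise ValueError("gw_weight must be in [0, 0.5)")
    return (1.0 - gw_weight) * candidate_score + gw_weight * genome_wide_score


def signals_agree(candidate_score: float, genome_wide_score: float) -> bool:
    """True when both signals point toward the same pole (same sign)."""
    return candidate_score * genome_wide_score > 0


blended = blend(candidate_score=0.6, genome_wide_score=-0.4, gw_weight=0.05)
# 0.95 * 0.6 + 0.05 * -0.4 = 0.55
qualifier = None if signals_agree(0.6, -0.4) else "signals disagree"
```

In this example the genome-wide signal nudges the score slightly toward the other pole, and the disagreement is kept as a qualifier instead of being averaged away.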

Composite genetic profiles

Composites combine multiple genome-wide signals into recognisable dimensions of experience. Each composite is a weighted average of standardised scores from related traits, where the weights reflect published genetic correlations between component signals.
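
A weighted average of standardised scores can be sketched in Python as below. The trait names and weights are illustrative placeholders, and the sketch assumes every component score has already been oriented so that higher values point in the same direction.

```python
from typing import Mapping


def composite_score(z_scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average of standardised trait scores.

    Traits without a weight are ignored, and the remaining weights are
    renormalised so that they sum to 1."""
    used = [trait for trait in z_scores if trait in weights]
    total = sum(weights[trait] for trait in used)
    if total == 0:
        raise ValueError("no weighted traits available")
    return sum(z_scores[trait] * weights[trait] for trait in used) / total


# Hypothetical two-trait composite with made-up weights.
example = composite_score(
    {"trait_a": 1.0, "trait_b": -0.5},
    {"trait_a": 0.6, "trait_b": 0.4},
)
# 1.0 * 0.6 + -0.5 * 0.4 = 0.4
```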

We compute composites across three tiers:

Core profiles — computed for every user, covering dimensions like emotional bandwidth, sleep architecture, self-regulation style, temperament, and metabolic margin
Extended profiles — for users who want to go deeper, covering social energy, motivation architecture, resilience, substance response, inflammatory architecture, and biological ageing signals
Child-specific profiles — for parent and practitioner audiences

Composites are further informed by published genetic correlations — the degree to which the same genetic variants influence multiple traits simultaneously. For example, the genetic variants associated with neuroticism also show significant overlap with depression risk, insomnia, and anxiety, while conscientiousness genetics overlap with health behaviours, educational attainment, and longevity. We use these published correlations (estimated via LD Score regression across 67+ outcomes in 9 domains) to contextualise each user's profile: your personality genetic architecture has implications beyond personality itself, connecting to sleep, stress, metabolic health, and wellbeing. Where Mendelian randomisation studies support a causal direction — not just correlation — we note this in the manual.

The component signals are individually derived from published genome-wide studies. The genetic correlations between them are published. The specific composites and their framing are our interpretive layer — informed by science, not direct scientific findings. Composites are hypothesis-generating constructs that make complex multi-trait biology recognisable and actionable. We don't claim they are validated findings.

Signal agreement and tensions

When genome-wide scores and candidate variants point in different directions — or when composite profiles create apparent contradictions — we surface these as tensions rather than hiding them.

For example, a user's candidate-gene stress variants might suggest Fast-Recovering, while their genome-wide signal suggests higher emotional reactivity. Rather than collapsing that disagreement into a single label, we surface it as a nuance worth exploring. Similarly, cross-composite tensions emerge when — say — high emotional bandwidth coexists with high sensory sensitivity, a combination that has specific protocol implications.

Tensions are hypothesis-generating prompts, not validated predictions. They're framed as questions for self-experimentation.

Population calibration

Threshold selection

We set thresholds to identify meaningful minorities — approximately the top and bottom 15% of each axis in our reference distribution. These are designed percentile cut-offs, not biologically natural boundaries.

A five-tier system classifies each axis:

Strongly Pole A
Leans Pole A
Mid-range
Leans Pole B
Strongly Pole B

If your score falls between the Pole A and Pole B thresholds, you're assigned Mid-range for that axis. We don't force assignment to the nearest pole. As our user base grows, we recalibrate thresholds empirically based on observed score distributions.
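
The tiering can be sketched as a percentile lookup in Python. The outer cut-offs follow the "approximately the top and bottom 15%" stated above. The inner cut-offs (35 and 65) are hypothetical, as is the assumption that higher percentiles lie toward Pole A.

```python
def classify_tier(percentile: float) -> str:
    """Map a reference-population percentile to one of the five tiers.

    Cut-offs at 15/85 follow the text; 35/65 are illustrative assumptions."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 85.0:
        return "Strongly Pole A"
    if percentile >= 65.0:
        return "Leans Pole A"
    if percentile > 35.0:
        return "Mid-range"
    if percentile > 15.0:
        return "Leans Pole B"
    return "Strongly Pole B"
```

Someone at the 84th percentile and someone at the 86th get different labels from this function while being nearly identical biologically, which is why the score and percentile are reported alongside the category.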

Ancestry considerations

Current thresholds are calibrated on an initial consumer cohort that reflects the historical composition of genetic studies — which have oversampled European populations. This creates known limitations:

Allele frequency variation: Some variants have dramatically different frequencies across populations. A variant common in one population may be rare in another, which means a threshold calibrated on one population may classify individuals from other populations differently than intended.

Genome-wide accuracy: Scores derived from primarily European-ancestry studies have reduced accuracy for other populations. We report ancestry-adjusted confidence levels and flag when results should be interpreted with additional caution.

Important boundary: Allele frequency differences must not be used to infer ancestry. That's not what this system does.

Our approach: we acknowledge this limitation explicitly, report your score and percentile alongside your category, and treat current calibration as provisional. It will improve as cohort size and diversity increase. The cohort model partially addresses this by naturally segmenting people by genetic similarity — but it's not a complete solution, and we don't pretend otherwise.

The variant panel

We directly score 80+ candidate variants across metabolic function, physical architecture, fuel partitioning, sensory processing, circadian rhythm, stress response, inflammatory integration, and drive. Genome-wide scoring draws on millions of imputed variants for traits where biology is distributed across the genome, and we compute genome-wide scores across 18+ published traits.

Selection criteria

Variants are included based on:

Replication: Consistent findings across multiple independent studies
Effect size: Meaningful biological impact, not just statistical significance
Mechanistic plausibility: Understood pathway from genotype to phenotype
Platform coverage: Available on common consumer genotyping arrays

Domain examples

Metabolic Function (Layer 1a): Variants in cytochrome P450 enzymes (the CYP family) that determine how quickly you metabolise caffeine, alcohol, and medications. The panel includes Phase II conjugation enzymes (UGT1A1, NAT2) and glutathione S-transferases (GSTP1). This is the most clinically validated domain — pharmacogenomics is standard of care for some medications.

Physical Architecture (Layer 1a): Variants affecting muscle fibre composition (ACTN3, which influences fast-twitch muscle protein), oxygen utilisation, and cardiovascular efficiency. The panel includes COL5A1 (connective tissue and injury susceptibility), VEGFA (angiogenesis and VO2max training response), and MSTN regulatory variants (muscle mass regulation).

Sensory Gain (Layer 1b): Variants affecting taste sensitivity (TAS2R38), pain processing pathways, and aspects of sensory processing. Enriched with GWAS data from the Global Biobank Engine for pain sensitivity and hearing difficulty phenotypes.

Fuel Partitioning (Layer 1b): Variants affecting energy storage and expenditure balance, anchored by FTO and MC4R (melanocortin 4 receptor, satiety signalling). Supporting variants in PPARG (adipogenesis), ADRB2 (lipolysis rate), and LEPR (leptin receptor). Well-replicated in obesity GWAS meta-analyses and UK Biobank body composition studies. High polygenicity means individual variant effects are modest.

Circadian Rhythm (Layer 2): Variants in core clock genes (CLOCK, PER2, PER3, CRY1) that influence sleep-wake timing and chronotype. Interpretable anchors with replicated associations, though chronotype remains highly polygenic and strongly modulated by light exposure and social schedule.

Stress Response (Layer 2): Variants affecting HPA axis activation and cortisol feedback efficiency, particularly FKBP5 which regulates glucocorticoid receptor sensitivity. Supporting variants include IL6 rs1800795 (stress-inflammation bridge) and IL10 rs1800896 (resolution efficiency). This axis describes physiological recovery dynamics — it does not infer psychological history or trauma.

Drive Profile (Layer 3): Variants anchored on GWAS-replicated risk tolerance loci (CADM2, LINC00961/LRFN2) with supporting evidence from dopaminergic pathway variants (DRD2/ANKK1, MAOA, BDNF). Former candidate gene anchors (DRD4, DRD2) were demoted based on inconsistent GWAS replication.

Inflammatory Integration (Expression-only): Calibration anchors — IL6 rs1800795, TNF rs1800629, CRP rs1205/rs3093077, IL10 rs1800896, IL1B rs16944. These are not used for axis scoring. They establish individual baseline expectations for interpreting inflammatory expression state and CRP values within cohort context.

What we exclude — and why

We deliberately exclude several domains that might seem like obvious candidates:

Clinical risk variants (e.g., APOE for Alzheimer's): Clinical implications require genetic counselling, not a consumer profile
Disease polygenic risk scores: We focus on operating characteristics, not disease prediction
Rare variants: Require whole genome sequencing; consumer arrays miss them
Single-study findings: We wait for replication before including anything
Lipid processing: Insufficient protocol discovery value; clinical territory
Processing speed / reaction time: A unidimensional performance measure, not a bipolar biological trade-off
Educational attainment: A social outcome, not a biological mechanism — the ethical framing challenges are unsolvable
ADHD / autism spectrum as diagnostic labels: We don't classify anyone as "having ADHD" or "being on the spectrum." These are clinical diagnoses, not genetic operating modes. However, we do use genome-wide attention regulation signals (from published ADHD GWAS involving over 1 million participants) as one input into composite profiles like self-regulation, motivation architecture, and executive function support. The distinction matters: we use the underlying polygenic signal as a biological dimension, not the diagnostic category as a label. Autism spectrum polygenic scores are excluded entirely — the genetic architecture is too heterogeneous for meaningful individual-level signal.

The last three are worth dwelling on. We're aware these are axes users sometimes ask about. We exclude them because we think including them would be scientifically misleading and potentially harmful — not because the science doesn't exist, but because the science doesn't support the use case.

Stress Recovery: two-dimensional modelling

The stress axis uses pattern matching rather than linear scoring because stress biology involves two distinct systems:

Dimension 1 — Activation: How readily your stress response engages. Involves HPA axis sensitivity and neurotransmitter clearance rates.

Dimension 2 — Recovery: How quickly your stress response clears. Primarily driven by FKBP5 variants that regulate glucocorticoid receptor sensitivity.

Slow-Recovering and Fast-Recovering describe recovery speed conditional on a normal activation response:

Pattern	Description
Slow-Recovering	Stress response engages effectively but clears slowly; effects linger
Fast-Recovering	Stress response engages effectively and clears quickly; rapid reset
Mixed	No clear directional pattern
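
One way to read the pattern table is as a two-input rule. The Python sketch below is an interpretation, not the documented algorithm: it assumes both dimensions are scored in [-1, 1], that positive recovery means faster clearance, and that a "normal" activation response means a score near zero. Both cut-offs are hypothetical.

```python
def stress_pattern(activation: float, recovery: float,
                   activation_band: float = 0.5,
                   recovery_cutoff: float = 0.5) -> str:
    """Pattern-match two stress dimensions into a recovery label.

    A recovery label is assigned only when activation is in the typical
    band; everything else falls through to Mixed."""
    activation_is_typical = abs(activation) <= activation_band
    if activation_is_typical and recovery >= recovery_cutoff:
        return "Fast-Recovering"
    if activation_is_typical and recovery <= -recovery_cutoff:
        return "Slow-Recovering"
    return "Mixed"
```

The point of the two-input form is that a strong recovery signal alone is not enough for a label: with atypical activation, the same recovery score returns Mixed.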

Important boundary: This axis describes physiological recovery dynamics only. No psychological, historical, or experiential inference is made or permitted from genetic data.

Rarity

Rarity describes how uncommon your combination of archetypes is within the reference population.

The independence assumption

If axes were fully independent, combination rarity would multiply: a single archetype (~15%) would be common, two together (~2%) uncommon, three or more rare. In reality, axes are not fully independent — pleiotropy (single genes affecting multiple traits), linkage (nearby variants inherited together), and population structure all mean that certain combinations are more common than the multiplicative model predicts.
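
The multiplicative model is simple enough to write down directly. This Python sketch computes the expected frequency of a combination under full independence, using the approximate 15% tail size quoted above; the function name is illustrative.

```python
def independent_rarity(n_archetypes: int, tail_fraction: float = 0.15) -> float:
    """Expected frequency of holding n specific archetypes at once,
    if every axis were statistically independent of the others."""
    if n_archetypes < 0:
        raise ValueError("n_archetypes must be non-negative")
    return tail_fraction ** n_archetypes


# One, two, and three archetypes under the independence assumption.
for n in (1, 2, 3):
    print(n, f"{independent_rarity(n):.2%}")
```

Two archetypes give 0.15 × 0.15 = 0.0225, the "~2%" quoted above. Because real axes are correlated, these figures are a theoretical baseline and not an observed frequency.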

We report qualitative rarity bands (Common, Uncommon, Rare, Very Rare) rather than false-precision percentiles. Rarity is calculated from Layer 1 and Layer 2 axes only — Drive and Qualifiers are excluded. All rarity statements are relative to the calibrated reference population and are not globally comparable.

Rarity is not a value judgement. "Very Rare" means the intersection of your tendencies puts you in a small cohort — the people whose protocols are most likely to transfer to you.

Five types of confidence

Not all classifications carry equal certainty. We report five types of confidence separately, because collapsing them into a single confidence score would hide information you need.

Genotype Quality: Did we successfully read the relevant variants? Reflects data completeness — how many of the relevant SNPs were present in your raw file, plus imputation quality scores for inferred variants.

Trait Reliability: How strongly do these genes influence the trait? Varies substantially by domain — Very High for Metabolic Pacing, Moderate for Sensory Gain, Low-Moderate for Drive Profile.

Signal Agreement: For axes using hybrid scoring, do the candidate-gene signal and genome-wide signal agree? Agreement increases confidence; disagreement is surfaced as a qualifier rather than hidden.

Expression Confidence: How reliable is the methylation data? Reflects methylation array coverage for the relevant CpG sites on each axis. Where methylation data is unavailable, expression states are Unknown.

Biomarker Context: Where blood biomarkers are available, each value is reported with your reading, cohort median, cohort interquartile range, and direction of change from previous measurement.
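
The four elements of Biomarker Context map naturally onto a small record. The Python sketch below is illustrative: the class, field names, and direction labels are assumptions, and the cohort values are made up.

```python
from dataclasses import dataclass
from statistics import median, quantiles
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BiomarkerContext:
    reading: float
    cohort_median: float
    cohort_iqr: Tuple[float, float]  # (25th percentile, 75th percentile)
    direction: str  # "up", "down", "flat", or "no previous reading"


def biomarker_context(reading: float, cohort_values: Sequence[float],
                      previous: Optional[float] = None) -> BiomarkerContext:
    """Place one reading in cohort context, with no clinical interpretation."""
    q1, _, q3 = quantiles(cohort_values, n=4)  # needs at least two cohort values
    if previous is None:
        direction = "no previous reading"
    elif reading > previous:
        direction = "up"
    elif reading < previous:
        direction = "down"
    else:
        direction = "flat"
    return BiomarkerContext(reading, median(cohort_values), (q1, q3), direction)


ctx = biomarker_context(reading=2.4,
                        cohort_values=[1.0, 2.0, 3.0, 4.0, 5.0],
                        previous=3.1)
```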

How this differs from consumer polygenic risk scores

We use the same underlying science as polygenic risk scores — genome-wide association data from large population studies. The difference is what we do with it.

Consumer PRS products aggregate thousands of variants to predict a single outcome, usually disease risk. They're statistically powerful but operationally opaque: you get a number, but limited intuition for why or what to do about it. And they're individual — your score has no connection to anyone else's.

Our approach differs in four ways:

Systems, not outcomes. We use genome-wide data to model biological operating dimensions rather than predicting end-states. The results explain how you work, not just what might happen to you.

Interpretable architecture. Seven axes with named poles, cross-axis composites, and tension profiles are easier to reason about than a percentile rank. Raw genome-wide scores never surface to users — they're intermediate calculations that feed into axes, composites, and cohort assignment.

Cohort utility. PRS is individual. Our model cohorts people by shared architecture, enabling collective learning that no individual score can provide. Your genetic profile isn't just a report — it's a key that connects you to your biological peers.

Validated signal, not just statistical association. The largest published personality GWAS (Schwaba et al. 2025) includes within-family analyses showing that direct genetic effects account for approximately 96% of population-level prediction — meaning the personality signals we measure are overwhelmingly genetic, not artefacts of shared family environment. This level of validation is uncommon in consumer genetics and supports the use of personality polygenic scores as genuine biological signal rather than environmental confounds dressed up as genetics.

PRS predicts. We explain and connect.

Limitations

We think it's important to be specific about what the model can't do, not just what it can.

Limitation	Our approach
Archetypes are imposed categories, not natural types	Explicit framing throughout; continuous scores reported alongside labels
Calibration reflects initial cohort composition	Scores and percentiles reported alongside categories; calibration is provisional and will improve
Rarity calculation assumes axis independence	Labelled as theoretical; pleiotropy acknowledged; Layer 3 excluded from rarity
Behavioural genetics has modest individual-level effects	Drive Profile at Layer 3; emphasis on cohort patterns rather than individual prediction
Consumer arrays miss rare variants	Limitation acknowledged; whole genome sequencing would change this
Trait reliability varies substantially by domain	Reported separately from genotype quality; layered by confidence
Biomarkers are state measurements, not diagnostic	Out-of-range values flagged for clinician consultation; cohort context, not clinical interpretation
Imputation quality varies by ancestry and region	Per-variant quality scores; strict thresholds; reduced confidence flagged
Genome-wide scores less accurate for non-European populations	Ancestry-adjusted confidence levels; limitation acknowledged; calibration will improve with diversity
Composites are interpretive, not validated findings	Framed as hypothesis-generating; component signals individually validated; genetic correlations published

Expression layer detail

Per-axis methylation markers

Each axis has associated CpG sites whose methylation levels indicate current expression. States are reported as Baseline, Elevated, Suppressed, or Unknown.

Fuel Partitioning expression is tracked via PPARG promoter and FTO regulatory region methylation. Expression states are labelled Storage-Shifted or Expenditure-Shifted rather than the generic Elevated/Suppressed, to reflect the directional nature of the signal.

Expression data requires methylation array input (e.g., Illumina EPIC) in addition to standard genotyping.

Inflammatory integration state

Three-part structure: genotype-calibrated baselines (IL6, TNF, CRP, IL10, IL1B variants establish individual expected ranges) → methylation measurement (IL6/TNF promoter methylation shows whether inflammation is epigenetically programmed) → CRP validation.

Output states: Quiescent (low inflammatory tone), Acute (transient elevation), Chronic (sustained elevation with epigenetic reprogramming), Unknown.

Biological age clocks

Four established epigenetic clocks reported as a panel with age acceleration metrics: Horvath (pan-tissue epigenetic age), GrimAge (mortality-associated methylation patterns), PhenoAge (phenotypic age calibrated against clinical biomarkers), DunedinPACE (rate of biological ageing per calendar year).

Estimates carry confidence intervals of ±3–5 years. Different clocks can disagree, reflecting different aspects of biological ageing. Single measurements are less informative than trends — what matters is direction over time.
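
To make "direction over time" concrete, the Python sketch below uses the simplest definition of age acceleration, clock age minus calendar age. That definition is an assumption here; regression-based definitions are also used in the literature. The function names and measurement history are made up.

```python
from typing import Sequence, Tuple


def age_acceleration(epigenetic_age: float, chronological_age: float) -> float:
    """Positive values mean the clock reads older than calendar age."""
    return epigenetic_age - chronological_age


def acceleration_trend(measurements: Sequence[Tuple[float, float]]) -> float:
    """Change in age acceleration per calendar year, first to last measurement.

    Each measurement is (chronological_age, epigenetic_age), oldest first."""
    if len(measurements) < 2:
        raise ValueError("need at least two measurements to estimate a trend")
    (age_first, epi_first), (age_last, epi_last) = measurements[0], measurements[-1]
    if age_last == age_first:
        raise ValueError("measurements must be taken at different ages")
    delta = (age_acceleration(epi_last, age_last)
             - age_acceleration(epi_first, age_first))
    return delta / (age_last - age_first)


history = [(40.0, 42.0), (41.0, 42.5), (42.0, 43.0)]
trend = acceleration_trend(history)  # (1.0 - 2.0) / 2.0 = -0.5 years per year
```

A negative trend means the gap between clock age and calendar age is narrowing. With intervals of ±3–5 years on each estimate, a change this small over two years would still sit within measurement noise.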

Technical summary

Axes measured	7 (2 Layer 1a + 2 Layer 1b + 2 Layer 2 + 1 Layer 3)
Candidate variant panel	80+ directly scored variants across established domains
Imputation	~700K genotyped → millions of imputed variants via statistical inference
Genome-wide traits	18+ traits scored from published genome-wide association studies
Composite profiles	Three tiers: core, extended, and child-specific
Scoring	Hybrid candidate-gene + genome-wide for applicable axes; candidate-only for high-impact pathway axes
Weighting	Evidence-based, favouring replicated findings with clear mechanisms
Profile assignment	Population-calibrated percentile thresholds (reference-population-specific)
Threshold model	Five-tier: strongly Pole A, leans Pole A, mid-range, leans Pole B, strongly Pole B
Confidence reporting	Genotype quality + trait reliability + signal agreement + expression confidence + biomarker context, reported separately
Cohorting	Layer 1 + Layer 2 axes form cohort key; Layer 3 excluded
Archetype code	One letter per assigned axis in canonical order; absent = mid-range
Expression layer	Per-axis methylation states (Baseline / Elevated / Suppressed / Unknown); requires methylation array
Biological age	Horvath, GrimAge, PhenoAge, DunedinPACE; reported as panel with age acceleration metrics
Validation layer	Per-axis blood biomarkers; quarterly cadence; CRP as whole-system integration marker
Quality assurance	Per-variant imputation quality, per-axis confidence, ancestry adjustment, signal agreement tracking
Calibration	Provisional; subject to revision as cohort grows and diversifies
