Why do we care about studying human genetics? One reason is our innate curiosity about ourselves and our evolution. Another is the benefit to human welfare: genetics contributes to many diseases and phenotypic traits. Human genetics is also a powerful tool for scientific breakthroughs and for exploring questions of identity and rights.
The study of human genetics has exploded over the past three decades with the Human Genome Project. Unfortunately, this research has often exhibited inherent biases, primarily stemming from the overrepresentation of individuals of European descent. These biases carry over into epigenomics and the other ‘omics, especially transcriptomics, because all of them are anchored to our understanding of the human genome. Addressing these biases requires a concerted effort to enhance inclusivity, build trust, and establish partnerships with historically marginalized groups.
Scientists are currently working to increase diversity with the new reference genome. With this goal at the forefront, we can improve representation in research and work toward dismantling inherent biases as a matter of justice, equity, and scientific advancement.
Understanding our diversity
Human genomes differ from one another by about 0.1 percent. This translates to approximately three million of the three billion base pairs in a haploid cell. This variation underlies the diverse traits we see and many that we don’t, and it also includes unexpressed, or silent, variation.
One way to examine genetics is by separating the human population based on major continents. Studies have shown that about 90 percent of genetic variation exists within populations, and only 10 percent exists between populations.1 This has significant social implications, as it demonstrates the lack of biological basis for discrimination amongst racialized groups.
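To make the within-versus-between split concrete, here is a minimal Python sketch that apportions diversity at a single two-allele site. It rests on several assumptions: the allele frequencies are invented for illustration, the populations are weighted equally, and diversity is measured as expected heterozygosity, which is a simpler stand-in for the measure used in the cited study. The function name `apportion_diversity` is hypothetical.

```python
def apportion_diversity(allele_freqs):
    """Split diversity at one biallelic site into within- and between-population shares.

    allele_freqs: frequency of one allele in each population (equal weights assumed).
    Diversity is measured as expected heterozygosity, 2p(1 - p).
    """
    n = len(allele_freqs)
    # Average diversity found inside each population.
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    # Diversity of the pooled population, using the mean allele frequency.
    p_mean = sum(allele_freqs) / n
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        raise ValueError("site is not variable in the pooled population")
    within = h_within / h_total
    return within, 1 - within


# Invented frequencies for three hypothetical populations.
within_share, between_share = apportion_diversity([0.3, 0.5, 0.7])
print(f"within populations:  {within_share:.1%}")
print(f"between populations: {between_share:.1%}")
```

Even though the allele frequency more than doubles from the first toy population to the third, most of the diversity in this example still sits within populations rather than between them.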
The need to improve
Most data in genome-wide association studies (GWAS) come from individuals of European descent. Consequently, the genomes of people of European ancestry are better characterized than those of people of non-European ancestry. A combination of logistical, systemic, and historical factors likely perpetuates this ingrained bias.
Populations of African descent harbor the most genomic and phenotypic diversity. Therefore, by including more representative sampling in GWAS, we can gain novel insights into genome biology, overcome systemic biases in policies and procedures, and improve clinical care. Until high-powered GWAS representing all major human ancestral populations are standard, we will continue to produce inaccurate and incomplete results.
Whole-genome sequencing (WGS) is commonly used to infer the cause of rare undiagnosed diseases by determining whether candidate variants are rare across populations. If databases are insufficient, a benign variant with a high prevalence in only one population may be mistaken for pathogenic and universal. For example, cystic fibrosis is statistically more prevalent in Europeans than in people of African descent, but it is often underdiagnosed in the latter group because the causative alleles differ. Generally, genetic markers optimized to reflect diversity in one population will not accurately reflect it in another, highlighting the importance of increasing diversity to assess the validity and broader relevance of findings. The lack of diversity in these studies risks the incorrect translation of genetic research into clinical practice or public health policy.
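The database problem can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the population labels, the allele frequencies, the 1 percent rarity cutoff, and the function name `looks_rare` are hypothetical and do not describe any real database or clinical pipeline.

```python
def looks_rare(freq_by_population, sampled_populations, threshold=0.01):
    """Return True if the variant appears rare in every population the database sampled.

    freq_by_population: true allele frequency of the variant in each population.
    sampled_populations: the populations actually represented in the database.
    """
    observed = [
        freq_by_population[pop]
        for pop in sampled_populations
        if pop in freq_by_population
    ]
    # A variant never observed in the database is treated as rare.
    return max(observed, default=0.0) < threshold


# Hypothetical benign variant: almost absent in pop_A, common in pop_B.
variant = {"pop_A": 0.0005, "pop_B": 0.08}

narrow_database = looks_rare(variant, ["pop_A"])
diverse_database = looks_rare(variant, ["pop_A", "pop_B"])
print("flagged as rare with pop_A only:", narrow_database)
print("flagged as rare with both populations:", diverse_database)
```

With only one population sampled, the variant passes the rarity filter and could be shortlisted as a disease candidate. Once the second population is included, its high frequency there rules it out.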
Forensics can introduce further sampling bias. When researchers use biological samples originally collected for forensic purposes in unrelated studies, sampling biases will systematically favor some outcomes over others. This can create misleading correlations between race, genetics, and behavior.
The reference genome
Over one million human genomes have been sequenced, and most are analyzed by alignment with the reference genome, GRCh38. The reference genome is the standard comparison to which individual fragments are mapped.
GRCh38 has several limitations. It mainly represents a single haplotype, leading to an incomplete representation of genetic diversity.2 Many reads in WGS are discarded because they cannot be mapped to GRCh38. This introduces reference bias, as reads containing the reference alleles have better odds of correctly mapping than reads containing alternate alleles.
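A toy Python sketch can show how reference bias arises. It assumes a deliberately simplified mapper that places a read without gaps and accepts it only if it has at most one mismatch. Real aligners use scoring schemes and gapped alignment instead of a hard cutoff, and the sequences and function names here are invented for illustration.

```python
def best_mismatches(read, reference):
    """Fewest mismatches over all ungapped placements of the read on the reference."""
    return min(
        sum(a != b for a, b in zip(read, reference[i:i + len(read)]))
        for i in range(len(reference) - len(read) + 1)
    )


def maps(read, reference, max_mismatches=1):
    """Toy mapping rule: accept the read if its best placement is within the cutoff."""
    return best_mismatches(read, reference) <= max_mismatches


def substitute(seq, pos, base):
    """Return a copy of seq with the base at pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


reference = "ACGTTGCAAGGCTTACCGATGCA"   # invented linear reference
true_window = reference[5:17]           # where both reads really come from

# Both reads carry the same single sequencing error at read position 2.
read_ref_allele = substitute(true_window, 2, "T")
# The second read also carries an alternate allele at read position 5.
read_alt_allele = substitute(read_ref_allele, 5, "T")

print("reference-allele read maps:", maps(read_ref_allele, reference))
print("alternate-allele read maps:", maps(read_alt_allele, reference))
```

The two reads are of equal quality, but the alternate allele counts as an extra mismatch against the reference. That pushes the second read over the cutoff, so it is discarded while the first is kept.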
To address these limitations and better represent human genomic diversity, efforts are underway to modernize the reference genome through the Human Pangenome Reference Consortium (HPRC). Rather than using one “representative” genome as a reference, pangenomes incorporate many genomes to better represent species-level diversity. Currently, the pangenome contains 47 genome sequences from genetically diverse people, with plans to increase this number to 350 by mid-2024.3 Establishing a comprehensive reference is essential in addressing the bias of previous genomic research.
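One common way to picture a pangenome is as a graph in which shared sequence is stored once and each genome is a path through it. The Python sketch below is a toy under that assumption. The node sequences, path names, and the helper `spell` are invented, and the sketch is not meant to describe the data structures or file formats the HPRC actually uses.

```python
# Toy sequence graph: each node holds a stretch of sequence.
nodes = {
    1: "ACGT",   # shared by both haplotypes
    2: "A",      # allele carried by the linear reference
    3: "G",      # alternate allele carried by another haplotype
    4: "TTCA",   # shared by both haplotypes
}

# Each genome is a walk through the graph.
paths = {
    "linear_reference": [1, 2, 4],
    "haplotype_B": [1, 3, 4],
}


def spell(path):
    """Concatenate node sequences along a path to recover that genome's sequence."""
    return "".join(nodes[node_id] for node_id in path)


# A read that spans the variable site and carries the alternate allele.
read = "GTGTT"

found_in = [name for name, path in paths.items() if read in spell(path)]
print("linear reference:", spell(paths["linear_reference"]))
print("haplotype B:     ", spell(paths["haplotype_B"]))
print("read matches exactly on:", found_in)
```

The read has no exact match on the linear reference alone, but it does match a path in the graph, which is the intuition behind using many genomes as the reference instead of one.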
The creation of the pangenome has been made possible with the development of long-read sequencing technologies. These technologies have enabled the assembly of large and repeat-rich regions and improved the representation of GC-rich genomic regions that are typically missing in shorter reads.
The pangenome has various benefits, including removing reference bias, creating a community resource for sequence mapping, and enabling the study of disease variants. Enriching the reference genome will enhance our general understanding of genomics and allow all populations to benefit from downstream applications. The HPRC also includes an ethics group that is working with Indigenous communities to incorporate their genome sequences and helping to guide informed consent, prioritize different samples, and explore regulatory issues for clinical application. Establishing these standards will set the stage for disseminating science to diverse communities.
The HPRC recognizes that widespread adoption of the pangenome is essential yet challenging. Efforts to publicize the consortium and engage end-users have already begun with the creation of a website and social media accounts. Additionally, the HPRC will develop training materials that explain the new pangenome reference and its relation to GRCh38. The current linear reference tools will also seamlessly integrate with the pangenome reference, and pangenome-based results will be translatable to the current reference to facilitate user accessibility.
Genomic diversity and human evolution
Decoding the human genome could reveal answers to fundamental questions about human origins and the genetic basis for unique traits. Recent innovations in sequencing technology have enabled the analysis of short mitochondrial and nuclear DNA segments from diverse archaic hominins and prehistoric humans. These data shed light on the historical human migrations and genetic admixture events that have shaped health and disease.4
New insights into genetic diversity across the African continent have revealed 11 ancestral populations with geographic and linguistic separation. By comparison, only 12 ancestries have been identified in the rest of the world.5 Understanding this diversity can provide insights into human history, lineage patterns, and how adaptations to environmental challenges have shaped the human genome.
Broader impacts of genomic diversity
The lack of genomic diversity in research strongly affects healthcare and research outcomes, and its impact seeps into many other facets of life. One consequence of excluding diverse groups from studies is that scientific breakthroughs or “evidence-based” policy decisions primarily benefit privileged populations. This is evident in the exclusionary criteria for women’s testosterone levels in the Olympics, which fail to account for the range of natural expression levels and can disproportionately affect women in minority groups.6,7
Recognizing and incorporating the diversity of the human genome is crucial for advancing scientific knowledge and creating a more equitable society. Recent efforts, such as the draft pangenome reference, have made significant strides in capturing the intricate tapestry of human genetic variation. Further addressing inherent biases and actively including diverse populations will pave the way for a more inclusive understanding of genetics and uphold ethical principles in genomics and society.
1. Lewontin, R. C. “The apportionment of human diversity.” Evol Biol 6, 381–398 (1972).
2. Aganezov, S. et al. “A complete reference genome improves analysis of human genetic variation.” Science 376, eabl3533 (2022).
3. Liao, W. W. et al. “A draft human pangenome reference.” Nature 617, 312–324 (2023).
4. Pollen, A. A., Kilik, U., Lowe, C. B. & Camp, J. G. “Human-specific genetics: new tools to explore the molecular and cellular basis of human evolution.” Nat Rev Genet (2023) doi:10.1038/s41576-022-00568-4.
5. Bentley, A. R., Callier, S. & Rotimi, C. N. “Diversity and inclusion in genomic research: why the uneven progress?” J Community Genet 8, 255–266 (2017).
6. Javorsky, E., Perkins, A. C., Hillebrand, G., Miyamoto, K. & Kimball, A. B. “Race, rather than skin pigmentation, predicts facial hair growth in women.” J Clin Aesthet Dermatol 7, 24–26 (2014).
7. Karrer-Voegeli, S. et al. “Androgen dependence of hirsutism, acne, and alopecia in women.” Medicine 88, 32–45 (2009).