More than 60 percent of individuals in the US with European ancestry, including those who have not undergone genetic testing themselves, can be identified through their DNA using data from open genetic genealogy databases, a new study reports.
The results underscore the power of rapidly growing consumer genomic databases and suggest a need for policies designed both to protect people's genetic privacy and to prevent the misuse of publicly available genetic information.
Direct-to-consumer genetic testing and related third-party services, particularly those that offer genetic genealogy (the identification of relatives through shared DNA), have seen a meteoric rise in popularity. These services are also increasingly being used by law enforcement agencies for forensic purposes. Perhaps the most notable recent example is the "Golden State Killer" case, in which a suspect was identified by using crime scene DNA to track down genetic relatives in an open consumer genomic database.
To better understand the forensic power of such methods in identifying unknown individuals, Yaniv Erlich and colleagues analyzed a dataset of more than 1.2 million anonymous individuals who had undergone commercial genetic testing with the consumer genetics provider MyHeritage (a company for which Erlich is the Chief Science Officer). For more than 60 percent of the individuals in the dataset, the dataset also contained a relative whose matching DNA segments corresponded roughly to a third cousin or closer.
Furthermore, using publicly available genealogical records, Erlich et al. demonstrate that once one or more relatives are found, the identity of an individual can be determined by combining family lineages with specific demographic information, such as approximate age or area of residence. To illustrate this potential, the authors used the method to reconstruct the identity of an anonymous woman whose DNA information was publicly available on the internet.
The authors note that their results raise significant privacy concerns, and they suggest that a reevaluation of current DNA data practices is necessary at both commercial and federal levels. Although the dataset represents only a small portion of the US population, Erlich et al. found that once a genetic database covers roughly two percent of a target population, nearly any person within that group could be matched to at least a third cousin. Given the rapid growth of consumer genomics, such coverage is likely achievable in the near future, according to the authors.
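The two percent figure can seem surprisingly low, so a back-of-the-envelope calculation may help show why small coverage goes a long way. The sketch below is not the model used by Erlich et al. It assumes that each relative lands in the database independently with probability equal to the database's coverage, and that any relative who is present is detected. The function name match_probability and the figure of 200 relatives at the third cousin level or closer are hypothetical, chosen only for illustration.

```python
def match_probability(coverage: float, n_relatives: int) -> float:
    """Chance that at least one of n_relatives appears in a database.

    coverage    -- fraction of the target population in the database (0 to 1)
    n_relatives -- number of relatives at third cousin level or closer
                   (an illustrative assumption, not a figure from the study)
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    # Probability that every relative is absent, subtracted from 1.
    return 1.0 - (1.0 - coverage) ** n_relatives


# Hypothetical number of relatives at third cousin level or closer.
ASSUMED_RELATIVES = 200

for coverage in (0.001, 0.005, 0.01, 0.02):
    probability = match_probability(coverage, ASSUMED_RELATIVES)
    print(f"coverage {coverage:.1%}: match probability {probability:.1%}")
```

Under these simplified assumptions, two percent coverage gives a match probability of roughly 98 percent. The steep rise comes from the exponent: every additional relative is another independent chance of a hit, so even low per-person coverage compounds quickly.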