Sometime soon, a patient will be sitting nervously in a waiting room, early for her appointment with a genetic counsellor. Her teenage children gave her a direct-to-consumer genotyping kit for Christmas. It was supposed to be fun learning about her heritage and genetics.
As she worked through the kit, she read the disclaimers warning that she might learn something she couldn't unlearn, and thought to herself, "That's the point, isn't it?" But she was not prepared for the reality. When her results arrived, she read with alarm that, based on her BRCA1 genetics, she was at high risk for breast cancer.
Her next weeks would be filled with anxiety, ultimately landing her in the waiting room of a genetic counsellor, only to learn that the direct-to-consumer test was simply wrong. Aside from incorrect results from such services, there are other problems facing genomic scientists and clinicians today, ranging from computing power to sourcing talent. Addressing these problems will be key to furthering genomics research.
What is genomics?
“Genomics” was coined in 1986 by Thomas H. Roderick of the Jackson Laboratory in Bar Harbor, Maine. It became the name of a new journal for all things related to the human genome. Since then, Sanger sequencing using gels and radioactivity has been replaced by next-generation sequencing (NGS) techniques that produce digital readouts of millions of reads in a single run. Third-generation sequencing (TGS) is in a growth phase, promising long reads from single native DNA molecules. Combine sequencing results with the high-powered computations of bioinformatics, and we have genomics as we know it today.
Genomics challenges
Data sets
What is considered “normal” in human genomics is an increasingly complicated question. Ryan Lamont, PhD, clinical associate professor, University of Calgary, explains, “funders don't like to fund sequencing for normal people.” Researchers typically work on problems, so the focus of their grants tends to be on identifying the causes or finding solutions to those problems. In genetics terms, this means funding is linked to patients with the condition, so doing genomics on normal people is not part of the process. Without a wealth of data from normal individuals, the variants that show up when randomly sequencing a person—as in the case of direct-to-consumer kits—have no context.
I asked Lamont about using data from direct-to-consumer sequencing companies to fill in what is normal. Unfortunately, those kits rely on single nucleotide polymorphism (SNP) genotyping rather than full sequencing, and the results can be highly inaccurate. Direct-to-consumer kits also add to over-burdened medical systems by creating what Lamont calls patients-in-waiting: patients who see genetic variants in their results and now wait for the disease to appear. These results are often meaningless because genetics is not the only factor in the development of most diseases. And because of the kits' inaccuracy, many patients get results that are simply wrong. These patients end up in the medical system, scared because their fun kit told them they might get breast cancer.
Storage and computing capacity
Defining what is normal will need decades of further sequencing, but in the meantime, we also need to find ways to store, access, and share that data. NGS machines can produce terabytes of data every year. That data needs to be kept somewhere. This is another sticking point for researchers or clinicians because they don’t always have the funds or the space for on-premises storage. Cloud storage may be a solution for such researchers, but getting approval for storing data out of the control of the lab adds another hurdle.
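For a rough sense of scale, here is a minimal back-of-envelope sketch in Python of how quickly sequencing output fills a drive. The coverage, read-length, and compression figures are illustrative assumptions, not numbers from any particular lab or instrument.

```python
# Back-of-envelope estimate of raw storage for one human genome,
# illustrating why NGS output adds up quickly. All figures below are
# illustrative assumptions, not benchmarks from a specific sequencer.

GENOME_SIZE_BP = 3.1e9      # approximate haploid human genome size
COVERAGE = 30               # a common target depth for whole-genome sequencing
BYTES_PER_BASE_FASTQ = 2    # assume one base call plus one quality score
COMPRESSION_RATIO = 0.35    # assume gzip shrinks FASTQ to roughly a third

raw_bases = GENOME_SIZE_BP * COVERAGE
fastq_gb = raw_bases * BYTES_PER_BASE_FASTQ / 1e9
compressed_gb = fastq_gb * COMPRESSION_RATIO

print(f"Raw bases sequenced: {raw_bases:.2e}")
print(f"Uncompressed FASTQ:  ~{fastq_gb:.0f} GB")
print(f"Compressed FASTQ:    ~{compressed_gb:.0f} GB per genome")
print(f"100 genomes/year:    ~{compressed_gb * 100 / 1000:.1f} TB")
```

Even under these generous compression assumptions, a modest hundred-genome project lands in the terabyte range before alignments, variant calls, and backups are counted.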
Large data sets also need computing power for analysis. Wubishet Bekele, PhD, research scientist, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, sequences thousands of samples every year in his work to create new high-yielding varieties of oats and barley. “My training and prediction data set has reached over 35,000 data points, or individuals, and it's becoming increasingly challenging to predict more than 10 traits using training data from the past almost 10 years.” Larger data sets demand correspondingly more powerful computers to analyze them.
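Genomic prediction of this kind is essentially a regression from thousands of genetic markers to a trait value. The sketch below uses simulated genotypes and a ridge-regression model as a stand-in for the methods typically used in breeding programs; the sample sizes, marker counts, and model choice are assumptions for illustration, not Bekele's actual pipeline.

```python
# A minimal sketch of genomic prediction: fit a ridge-regression model on
# SNP genotypes to predict a quantitative trait. Data are simulated here;
# real training sets (like the 35,000 individuals mentioned above) are far
# larger, and production models differ.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_individuals = 2_000   # deliberately small for a quick demonstration
n_markers = 5_000       # SNP markers coded 0/1/2 (copies of the minor allele)

genotypes = rng.integers(0, 3, size=(n_individuals, n_markers)).astype(float)
true_effects = rng.normal(0, 0.05, size=n_markers)   # small additive effects
phenotype = genotypes @ true_effects + rng.normal(0, 1.0, size=n_individuals)

X_train, X_test, y_train, y_test = train_test_split(
    genotypes, phenotype, test_size=0.2, random_state=0
)

# Shrinkage lets the model cope with many more markers than individuals.
model = Ridge(alpha=100.0)
model.fit(X_train, y_train)

accuracy = np.corrcoef(model.predict(X_test), y_test)[0, 1]
print(f"Prediction accuracy (correlation with observed trait): {accuracy:.2f}")
```

Scaling this idea from a few thousand simulated individuals to tens of thousands of real ones, across ten or more correlated traits, is where the computing bottleneck Bekele describes begins to bite.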
Economic sustainability
NGS sequencers are expensive and require skilled technical expertise to operate. Service contracts are also expensive, often as much as or more than the initial capital cost. Funding agencies often refuse to pay for service contracts, and when they do, they will only fund them during the initial grant period. What happens after that can create a problem for the continued use of such devices. Researchers may be able to use or create a core facility instead. Core facilities are independent and able to fill their machines for each run by taking samples from anyone needing the work done.
Bioinformatics expertise
Bioinformatics tools change quickly, and busy researchers struggle to keep up. The lack of standardized bioinformatics pipelines also complicates collaboration. For someone like Bekele, who is self-taught in bioinformatics and runs a small lab, this means the assemblies his group creates can be difficult to share with colleagues in other labs or countries. Until bioinformatics pipelines become standardized, labs are forced to reinvent the wheel.
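As a concrete, deliberately simplified illustration of the problem, here is a sketch of the kind of home-grown short-read pipeline a small lab might script for itself, assuming bwa, samtools, and bcftools are installed. The file names and parameters are hypothetical; every lab's version of a script like this differs in tools, versions, and outputs, which is exactly the sharing problem described above.

```python
# A minimal, hypothetical short-read pipeline: align reads, sort, and call
# variants by shelling out to bwa, samtools, and bcftools. A production
# pipeline would add quality control, logging, and provenance tracking.

import subprocess

REFERENCE = "reference.fa"        # hypothetical reference, pre-indexed with `bwa index`
READS_1 = "sample_R1.fastq.gz"    # hypothetical paired-end reads
READS_2 = "sample_R2.fastq.gz"

def run(cmd: str) -> None:
    """Run one shell step, stopping the pipeline if it fails."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

# 1. Align reads to the reference and sort the alignments.
run(f"bwa mem {REFERENCE} {READS_1} {READS_2} | samtools sort -o sample.sorted.bam -")
run("samtools index sample.sorted.bam")

# 2. Call variants from the sorted alignments.
run(f"bcftools mpileup -f {REFERENCE} sample.sorted.bam"
    " | bcftools call -mv -Oz -o sample.vcf.gz")
```

Two labs running nominally the same steps with different aligners, callers, or parameter choices can produce results that are hard to reconcile, which is why standardized, shared pipelines matter.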
The technology and bioinformatics pipelines for short-read sequences are becoming standardized, which makes life easier for human geneticists. Long-read sequencing, however, is still catching up. According to Lamont, “I think we are in long-read sequencing where we were in short-read sequencing 10 years ago. So, it's expensive to run. Instruments are few and far between, and the bioinformatics hasn't really caught up yet.”
The future of genomics
Despite the challenges, the potential is exciting, whether it is precision medicine or breeding programs tailored so that Bekele’s oats survive northern winters. Lamont and Bekele both shared their vision for where genomics is headed in the next few years:
Reduced costs per sample
Improvements to sequencing technology have greatly reduced costs and increased throughput for short-read sequences. Lamont says, “The sequencing cost is actually the cheapest part of the whole process.” Improvements in machine designs, computing, and sample preparations are some of the reasons that more samples can be sequenced today than even five years ago. These costs are only likely to fall further.
International collaborations
As more data is mined, it deepens our understanding of what counts as normal, in humans and in every other sequencing project. Despite the current turmoil in the world’s political climate, there is still great optimism. Bekele says that international collaboration has led to the creation of 30 oat genome assemblies. “The global community has delivered the oat pangenome. Researchers from Canada, Germany, Finland, [the] USA, Australia, [the] UK, Spain, Sweden, Poland, and Switzerland contribute to genome assemblies.”
Machine learning
“In the language space, [machine learning is] much further developed than it is in the genomic space,” says Lamont. The public consciousness proves as much: many people equate artificial intelligence with programs like ChatGPT, which are powered by large language models. Bekele agrees, saying that machine learning isn’t currently better than traditional methods of genomic prediction. However, he already uses some AI tools to check his bioinformatics scripts, and he is hopeful that AI will develop quickly to help ease some of the bioinformatics workload.
Long-read sequencing
Third-generation sequencing is still behind next-generation sequencing. The competing long-read technologies remain expensive, both in capital cost and per read. However, the potential for long reads is promising. Bekele says, “Most of the panoat assemblies were done using long-read assemblies, and it has already transformed cereal genome assemblies, including oat,” whose genome is roughly four times the size of the human genome.
Epigenetics, such as methylation status, can be determined with some long-read technologies. “We are realizing that we have way more structural variation in the average human gene than short read[s] ever allowed us to think about,” explains Lamont. We still need a catalog of what is normal in terms of the structure of native DNA, but these long reads will facilitate great discoveries challenging the idea that DNA is simply a linear code of four letters.
Genomics has come a long way since 1986, when it started as a new word to name a journal. High-throughput sequencers and powerful computational advances now allow scientists to sequence an entire genome for less than $1,000. Compared to the Human Genome Project’s first human assembly in 2003, which cost $3 billion, that’s a bargain.