When Watson and Crick elucidated the structure of DNA in 1953, their Nobel Prize-winning research raised immediate issues for the nascent field of genomics: how to analyze, characterize, and sequence genes. Improvements to crude early sequencing protocols proceeded slowly until Frederick Sanger broke the logjam with his “chain termination,” or dideoxy, sequencing method, which became known as Sanger sequencing. Sanger expressed hope in his 1980 Nobel Prize autobiography that his technique would “contribute ... to our understanding of living matter.” For those desiring a more in-depth explanation of the technical details, James Heather and Benjamin Chain’s 2016 review of the history of gene sequencing is required reading. Similarly, a review in the journal Viruses provides a comprehensive list of significant improvements up to the age of next-generation sequencing (NGS).
Sanger sequencing improved over the years, in large part due to automation, and was the basis for the first draft of the human genome, announced in 2000. In less than two decades, however, NGS has overtaken Sanger sequencing thanks to its high throughput, massively parallel operation, and much lower cost per base.
Yet Sanger sequencing is not going away, not by a long shot. Sanger and NGS continue to develop, evolve, and find their rightful places within genomics and with respect to each other. NGS is preferred in situations where many genes must be sequenced simultaneously, in discovery mode for novel gene variants, for low-abundance samples, and for microbial genomes (particularly for pathogen subtyping).
Sanger remains useful for sequencing single genes or amplicon targets of up to about 1,000 base pairs in length, for projects involving 96 or fewer samples, for microbial identification and gene fragment analysis, and for analyzing short tandem repeats. Moreover, Sanger is considered the “gold standard” sequencing method for validating the sequence of specific genes, including those already sequenced through NGS.
“Next-generation sequencing and Sanger sequencing go hand in hand and are not mutually exclusive,” says Tammy Joska, PhD, senior scientist at GENEWIZ (South Plainfield, NJ). “Experimental applications involving NGS frequently rely on Sanger sequencing, but Sanger sequencing applications do not rely on NGS.”
Sanger, Joska explains, is ideal for sequencing homogeneous samples that include one template, one gene, or one region, whereas NGS provides a large multiple of such reads and can sequence heterogeneous samples. NGS excels with diverse gene populations, such as those in a tumor or in a PCR product containing many variants of a gene at one locus. NGS supports the identification and quantitation of each variant within a given sample, whereas Sanger provides qualitative results only: the gene of interest or expected sequence either is or is not there. NGS also supports broad applications such as whole genome sequencing, whole exome sequencing, and whole transcriptome (RNA) sequencing.
Additionally, sequencing large contiguous genomic fragments directly is often not possible with read-length-limited NGS (Illumina short-read technology) but is relatively simple using Sanger methods. Advances in NGS long-read technology, such as the PacBio Sequel, are now enabling NGS approaches to cover larger contiguous fragments; however, Sanger can still be the more cost-effective choice for small-scale questions.
Biologists also call on Sanger sequencing to solve problems left unanswered by short-read NGS technologies. Some genes, for example, contain extended regions of repeat units, many of which are implicated in human disease. Short-read NGS, which assembles larger sequences from shorter fragments of up to about 250 base pairs, can miss the significance or even the existence of such repeat regions. Another problem is regions with high GC content, which short-read NGS may read poorly. “When the number of reads you get from a specific location is low, you can validate your result through Sanger sequencing,” Joska tells Lab Manager.
One of the great draws of NGS has been its Moore’s Law-type improvement, to the point of predicting the eventual availability of genomes (or at least exomes) at a cost of just a few hundred dollars. Sanger sequencing has improved tremendously as well, although at nowhere near that pace. “Sanger chemistries have changed over time to provide longer and more robust reads,” Joska says. “Automation has also shortened analysis time and the delivery of sequencing results to customers, and quality is definitely improving. Our development team is constantly working to develop methods for difficult templates.”
Joska believes that while NGS improvements will continue at least for a while at an exponential pace, advances in Sanger sequencing will be less headline-worthy.
Chemistries will continue to improve, as will—thanks to automation and bioinformatics—throughput and parallelism, according to Joska. “And we’re having greater success with difficult sequences, things that were not sequenceable just a few years ago. We’re getting better at genes with high GC content and are constantly increasing the size limit of what is achievable with Sanger. Where a typical Sanger sequencing plasmid template is up to 10 kilobases in size, we’ve had success sequencing from 40 kilobase constructs.”
Toward a “Super Sanger”
“Sanger sequencing provides longer read lengths than short-read NGS, so it is often the method of choice when you have just a few samples,” says Jonas Korlach, PhD, chief scientific officer at Pacific Biosciences (Menlo Park, CA). Throughput is quite high with NGS, so using it for a single gene or for just a few samples isn’t cost-effective. Sanger sequencing is also preferred for sequences spanning 600 to 700 bases. “Short NGS reads cover only 150 bases or so. Stitching together multiple short sequences becomes cumbersome.”
Gene product and services companies turn to Sanger sequencing to validate the sequences of commercial microbiological constructs including plasmids, vectors, cloned genes, or artificial DNA to ensure that the product contains the desired sequence.
Pacific Biosciences has made its reputation on a sequencing platform, the Sequel, based on proprietary single-molecule real-time (SMRT) sequencing. SMRT is built around the importance of long-read sequencing data, a feature lacking in short-read NGS and addressed to some degree by Sanger sequencing. SMRT allows direct sequencing with uniform coverage of very long reads: typically 15,000 base pairs, and sometimes as many as 100,000.
Reads significantly longer than Sanger reads allow the assembly of high-quality de novo genomes, cataloging of full-length isoforms, unambiguous sequence alignment, characterization of fully phased alleles, full reads of repetitive and complex elements, and resolution of structural variants. Pacific Biosciences claims the “longest average read lengths” at the “highest consensus accuracy,” and SMRT is suitable for direct epigenetic characterization and single-molecule resolution. To further blur the distinction between Sanger and NGS capabilities, SMRT works on multiple samples.
A 2018 paper by researchers at the University of Guelph (Guelph, Ontario, Canada) testing the Sequel platform on 658-base-pair amplicons of the mitochondrial cytochrome c oxidase I gene concluded that, “SMRT and Sanger sequences were very similar, but SMRT sequencing provided more complete coverage, especially for amplicons with homopolymer tracts. Because it can characterize amplicon pools from 10,000 DNA extracts in a single run, the Sequel can greatly reduce sequencing costs in comparison to first (Sanger) and second-generation platforms (Illumina, Ion).”
The Guelph group found that Sequel was more accurate than Sanger sequencing and less costly by a factor of 40, which “makes a strong financial argument for adopting this technology,” according to Korlach.
“If you have just a few samples, then our system doesn’t make sense,” he adds, “just as with NGS. But for larger sample sets, the throughput advantages translate directly to time and cost savings.” One Pacific Biosciences customer estimated the cost of sequencing 200,000 samples via the Sanger method at $1.2 million, compared with $30,000 with Sequel.
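Taking the figures above at face value (the customer's estimates of $1.2 million for Sanger versus $30,000 for Sequel, across 200,000 samples), a quick back-of-the-envelope calculation shows how the roughly 40-fold savings and the per-sample costs fall out:

```python
# Cost comparison for sequencing 200,000 samples, using the
# customer estimates quoted in the article (assumed figures,
# not independently verified).
sanger_total = 1_200_000   # USD, Sanger method
sequel_total = 30_000      # USD, PacBio Sequel
samples = 200_000

print(sanger_total / sequel_total)   # 40.0  (fold-savings)
print(sanger_total / samples)        # 6.0   (USD per sample, Sanger)
print(sequel_total / samples)        # 0.15  (USD per sample, Sequel)
```

The 40.0 ratio matches the "factor of 40" cost advantage reported by the Guelph group above.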
“It’s been a given that reduction in cost of short-read NGS always came at the expense of quality, of not having complete genome analysis, of missing structural variations,” Korlach says. Despite being less expensive than Sanger sequencing, NGS still has not reached a level of quality and reliability sufficient to support personalized medicine. “SMRT has the potential to replicate Moore’s Law-type improvements in accuracy and quality, to be a driver for precision medicine, because you can’t have precision medicine supported by imprecise methods.”
Questioning the Gold Standard
The primary advantage of NGS at its introduction was its ability to sequence millions of DNA sequences at once, compared with just 384 sequences for Sanger sequencing. Yet current best practices call for confirming critical NGS findings with the Sanger method. Investigators are beginning to question whether this makes sense.
In a 2016 study, Leslie Biesecker, MD, and co-workers at the National Human Genome Research Institute noted that no hard evidence supports the alleged superior accuracy of the Sanger method. According to their report, NGS may be even more accurate than Sanger sequencing in some situations, which, if true, holds important implications for clinical DNA analysis.
Biesecker compared the two methods systematically, using patient DNA samples from ClinSeq®, a DNA sequencing project targeting mostly healthy individuals. The researchers focused on 19 genes commonly analyzed through genetic testing and found that in an initial set of five samples, Sanger offered no independent advantage, according to Biesecker, “because no discrepancies between the two techniques were identified.” Biesecker then subjected a larger sample set to dual testing and found that of the 5,660 variants identified by NGS, Sanger sequencing could not confirm 19. Reanalysis of these samples reduced the unconfirmed sequences to just two, a confirmation rate of 99.965 percent.
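As a quick arithmetic check on the study's reported figures (5,660 NGS variants, 19 initially unconfirmed by Sanger, 2 still unconfirmed after reanalysis), the confirmation rate works out as stated:

```python
# Confirmation-rate arithmetic from the reported Biesecker
# study figures (taken from the article, not the primary paper).
total_variants = 5660
unconfirmed_initial = 19
unconfirmed_final = 2    # after reanalysis

rate = 100 * (total_variants - unconfirmed_final) / total_variants
print(f"{rate:.3f}%")    # 99.965%
```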
Biesecker concluded that Sanger sequencing confirmation was unnecessary, since it possibly introduced more errors than it corrected. Under standard testing protocols, 17 of the 19 initially unconfirmed results would have been erroneously discarded.
At the time, Biesecker said, “We didn't expect to find this at all. We expected that the Sanger sequencing would correct NGS, which was wrong. Instead, this means that when a clinical lab uses Sanger sequencing to validate results, it is more likely to discard results that were in fact true from the NGS than it is to find NGS errors.”