AncestryDNA® Learning Hub

 

DNA Sequencing

DNA sequencing is the process of reading the exact order of the nucleotide bases in a stretch of DNA. This could mean reading the DNA sequence of a single gene, all the genes in a person’s DNA, or their whole genome, including the non-coding DNA.

The results of this scientific analysis aren’t just interesting to know; they could be vital. For example, reading a patient’s DNA sequence can help physicians provide more targeted medical treatments based on genetics.

The Human Genome Project

Perhaps the best-known example of DNA sequencing is the Human Genome Project, which started in 1990. The project’s ambitious goal was nothing less than to characterize a human’s whole genome sequence for the first time. The Project successfully mapped out 92% of the DNA sequence. The “first draft” of the human genome was published in 2001, and a more complete, updated version followed in 2004.

But scientists didn’t gain the ability to accomplish such a feat overnight. Sequencing techniques and technologies have improved substantially since they were first developed in the 1970s. As a result, it has become easier, less expensive, and faster to sequence large amounts of DNA. This has expanded the opportunities and applications for gene sequencing.

How DNA Sequencing Evolved

After DNA’s structure was discovered in 1953, it took several more years before scientists were able to read the base pair sequence of even a very short DNA strand. In 1968, Ray Wu and Armin D. Kaiser became the first to develop a method for reading a DNA sequence. The first DNA sequence they reported was only about a dozen nucleotides of a virus’s genome.

A breakthrough came in 1977, when the biochemist Frederick Sanger published his method for DNA sequencing. That same year, his group reported the full sequence, approximately 5,000 nucleotides long, of a virus’s genome. Sanger’s method proved both reliable and adaptable, and updates to it eventually allowed for fully automated sequencing. The first-generation Sanger method would remain the dominant one for the next 30 years.

The second generation of sequencing methods, commonly referred to as “next-generation sequencing,” relied on innovations in chemistry, computer vision, and bioinformatics to read millions of DNA fragments simultaneously. Next-generation sequencing transformed the biotechnology industry with its speed, reliability, and cost-effectiveness. In fact, sequencing data became so abundant that it fueled an equally rapid revolution in bioinformatics to analyze all the data.
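To give a feel for the kind of bioinformatics this data required, here is a minimal Python sketch of stitching overlapping fragments ("reads") back into one sequence. It is a toy greedy approach under simplifying assumptions (error-free reads, a single strand, no repeats), not the method used by any particular sequencer or assembler. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining reads overlap, so nothing more can be joined
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    # Return the longest sequence that could be reconstructed.
    return max(reads, key=len)


if __name__ == "__main__":
    fragments = ["GTACGTTA", "ATGCGTAC", "CGTTAGC"]
    print(greedy_assemble(fragments))  # ATGCGTACGTTAGC
```

Real assemblers must also cope with sequencing errors, both DNA strands, and repetitive regions where many different overlaps look equally good, so they use far more sophisticated methods than this.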

A third generation of sequencers, which excelled at reading longer DNA fragments than previous generations could, soon followed. Using third-generation sequencers, scientists completed the “telomere-to-telomere” sequencing of the human genome in 2022, decoding the last 8% of the human genome. Third-generation sequencers were well suited to read the many long stretches of highly repetitive DNA that had to be left unresolved by Sanger and second-generation sequencers during the Human Genome Project.

Today, second- and third-generation sequencing are commonly used in tandem. Their integrated use has increased the speed and lowered the cost of DNA sequencing. Though sequencing the first human genome cost $3 billion and took 15 years, by 2015 a single human genome could be sequenced for under $1,000 in a few days. And by 2022, some companies were already promising a full human genome sequence for $200.

DNA Sequencing Technologies

Here’s a brief look at the features that distinguish the main sequencing techniques and technologies. In all cases, bioinformatic tools play a crucial role in analyzing and reconstructing a full DNA sequence.

  • Sanger Sequencing (First Generation): A template DNA strand is combined with chain-terminating, fluorescently labeled nucleotides to produce DNA fragments. These fragments, of different lengths, are separated by size through a process called capillary electrophoresis. The fragments are then recorded with human or computer vision.
  • Sequencing by Synthesis (Second Generation): Single-stranded DNA templates are first affixed to a glass slide and clonally amplified. Next, fluorescently labeled reversible terminator nucleotides are added sequentially to synthesize a complementary DNA strand. The identity and order of the added bases are recorded with computer vision.
  • Single Molecule Real-Time Sequencing (Third Generation): First, a single-stranded DNA template is isolated in a micro-well. Fluorescently labeled nucleotides are then used to synthesize a complementary DNA strand with a “slow-motion” DNA polymerase. This technique allows the identity and order of the added bases to be recorded with computer vision.
  • Real-Time Nanopore Sequencing (Third Generation): A DNA molecule is passed through a nanopore in an ultra-thin membrane. The changes in electrical current across the membrane surface are used to decode the identity and order of bases in the DNA strand. Without the need for labeled nucleotides, nanopore sequencers can be reduced to the size of a smartphone or thumb drive.
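
The read-out idea behind Sanger sequencing in the first bullet can be sketched in a few lines of Python. This is a toy model, not a simulation of the real chemistry. It assumes one chain-terminated fragment for every position, writes the template 3'→5' so that its complement reads left to right, and treats each fragment as a pair of (length, label of the final base). The names `chain_terminated_fragments` and `read_by_size` are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def chain_terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Model every fragment as (length, label of its terminating base)."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # One fragment stops at each position; its label is the base added last.
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis would, and read the labels."""
    return "".join(label for _, label in sorted(fragments))


if __name__ == "__main__":
    fragments = chain_terminated_fragments("TACGGCAT")
    random.Random(0).shuffle(fragments)  # fragments start out mixed together
    print(read_by_size(fragments))  # ATGCCGTA
```

Sorting by length stands in for the size separation step: once the fragments are in order, the label on each one spells out the next base of the newly synthesized strand.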

DNA Sequencing Uses Today

As DNA sequencing has become easier, it’s been used in more applications. Here are a few examples:

  • The spread of infectious diseases can be tracked by sequencing DNA found in a community’s wastewater, which allows for the monitoring of certain viruses and bacteria.
  • Fraud can be identified by sequencing fish or meat at the market. This confirms that the products match their labels (and are not a cheaper or adulterated product).
  • Sequencing a patient’s cancer cells can help determine which specific mutations they carry in their genetic code. This in turn enables physicians to administer medications that are likely to have the greatest effect.
  • Digital information can now be stored and retrieved in synthesized strands of DNA. This provides an alternative, and more stable, way to store information than hard drives or flash drives.
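
To illustrate the last example, here is a minimal Python sketch of how digital data could be written as DNA letters and read back. It assumes a simple mapping of two bits per base, chosen only for illustration. It is not the encoding scheme of any published DNA storage system, which would also need error correction and limits on repeated letters. The names `encode` and `decode` are hypothetical.

```python
# Assumed mapping for illustration only: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

In this sketch, writing the data corresponds to synthesizing the strand, and reading it back corresponds to sequencing it.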

Other Ways to Analyze DNA

An AncestryDNA® test uses SNP genotyping, not DNA sequencing, to look at approximately 700,000 DNA markers across your genome. While 700,000 is far fewer than the roughly 3 billion base pairs that sequencing your whole genome would read, scientists don’t yet know what most of the bases in your DNA do. Only about 2% of your genome codes for proteins, and as little as 8% may have some other kind of function. We choose these 700,000 DNA markers specifically because they are the most informative about your DNA matches, genetic ethnicity, and predicted traits.
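
To put those numbers in perspective, the short Python sketch below works out what share of the genome 700,000 markers represents. It also shows, in a toy way, the difference between reporting only selected positions and reading everything. The `genotype` function and its inputs are made up for illustration. Real genotyping does not start from a full sequence, so this only mimics the idea of looking at chosen positions.

```python
MARKERS_TESTED = 700_000       # approximate marker count from the text
GENOME_BASES = 3_000_000_000   # approximate genome size from the text


def fraction_examined(markers: int = MARKERS_TESTED,
                      genome: int = GENOME_BASES) -> float:
    """Share of genome positions covered by the tested markers."""
    return markers / genome


def genotype(sample: str, marker_positions: list[int]) -> dict[int, str]:
    """Report only the bases found at the chosen (0-based) positions."""
    return {pos: sample[pos] for pos in marker_positions}


if __name__ == "__main__":
    print(f"{fraction_examined():.4%}")      # 0.0233%
    print(genotype("ACGTACGT", [1, 4]))      # {1: 'C', 4: 'A'}
```

In other words, the markers cover roughly two hundredths of one percent of the genome's positions, which is why choosing informative ones matters.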

 

References

Angeles, Arlou Kristina, Simone Bauer, Leonie Ratz, et al. "Genome-Based Classification and Therapy of Prostate Cancer." Diagnostics, September 2018. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6164491/.

Church, George M., Yuan Gao, and Sriram Kosuri. "Next-Generation Digital Information Storage in DNA." Science, August 16, 2012. https://www.science.org/doi/10.1126/science.1226355.

Fatima, Tamseel and Andreas Ebertz. "A Journey Through The History Of DNA Sequencing." Eurofins Genomics. Accessed December 1, 2022. https://the-dna-universe.com/2020/11/02/a-journey-through-the-history-of-dna-sequencing/.

Ganguly, Prabarna and Rachael Zisk. "Researchers generate the first complete, gapless sequence of a human genome." National Human Genome Research Institute. March 31, 2022. https://www.genome.gov/news/news-release/researchers-generate-the-first-complete-gapless-sequence-of-a-human-genome.

Graur, Dan. "An Upper Limit on the Functional Fraction of the Human Genome." Genome Biology and Evolution, July 2017. https://doi.org/10.1093/gbe/evx121.

Green, Eric D. "Completing the Human Genome Sequence (Again)." Scientific American, March 31, 2022. https://www.scientificamerican.com/article/completing-the-human-genome-sequence-again/.

Heather, James M. and Benjamin Chain. "The sequence of sequencers: The history of sequencing DNA." Genomics, January 2016. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4727787/.

"Human Genome Project Timeline." National Human Genome Research Institute. Accessed December 2, 2022. https://www.genome.gov/human-genome-project/timeline.

Peebles, Angelica. "Illumina Aims to Push Genetics Beyond the Lab With $200 Genome." Bloomberg, September 29, 2022. https://www.bloomberg.com/news/articles/2022-09-29/illumina-delivers-200-genome-with-new-dna-sequencing-machine.

Rands, Chris M., Stephen Meader, Chris P. Ponting, and Gerton Lunter. "8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage." PLoS Genetics, July 24, 2014. https://doi.org/10.1371/journal.pgen.1004525.

Sanger, F., G. M. Air, B. G. Barrell, N. L. Brown, et al. "Nucleotide sequence of bacteriophage phi X174 DNA." Nature, 1977. https://pubmed.ncbi.nlm.nih.gov/870828/.

Sanger, F., S. Nicklen, and A. R. Coulson. "DNA sequencing with chain-terminating inhibitors." Proceedings of the National Academy of Sciences, December 1, 1977. https://www.pnas.org/doi/abs/10.1073/pnas.74.12.5463.

Wu, Ray and A. D. Kaiser. "Structure and base sequence in the cohesive ends of bacteriophage lambda DNA." Journal of Molecular Biology, August 1968. https://pubmed.ncbi.nlm.nih.gov/4299833/.
