High-Throughput Sequencing

INTRODUCTION

The first decade of the new millennium has seen rapid development of high-throughput sequencing methods—fast, cheap ways to sequence and analyze large genomes. The techniques are often referred to as massively parallel DNA sequencing, because thousands or millions of sequencing reactions are run at the same time to greatly speed up the process. The methods use miniaturization techniques first developed for the electronics industry, as well as the principles of DNA replication, often in combination with the polymerase chain reaction (PCR).

High-throughput sequencing methods are evolving rapidly. This animation describes two high-throughput methods. In one method, DNA is amplified on a solid surface and then sequenced using fluorescently labeled nucleotides. In the second method, the DNA is amplified by PCR on microbeads and analyzed by pyrosequencing using the enzyme luciferase to produce a light reaction.

ANIMATION SCRIPT

Method 1: PCR on a Solid Surface and DNA Sequencing with Fluorescently Labeled Nucleotides

High-throughput sequencing, also called massively parallel sequencing, refers to methods in which millions of DNA templates are sequenced simultaneously in a single reaction. The general strategy begins with the fragmentation of cellular DNA into small pieces. In most techniques, the DNA fragments are then amplified by the polymerase chain reaction (PCR) and sequenced by a method that usually involves DNA polymerase. The nucleotide sequence data is then assembled, using computer algorithms, into a continuous genome sequence.

Here we will describe a method that employs fluorescently colored nucleotides in the sequencing reaction. To begin, the fragments of cellular DNA are ligated to oligonucleotides referred to as adapters.

The cellular DNA fragments are denatured into single strands and then captured on a solid surface.

The attachment occurs through complementary base pairing with oligonucleotides that are fixed to the surface.

In the next step—PCR—DNA polymerase synthesizes a second strand of DNA, starting from the annealed oligonucleotide, which serves as a primer.

The two long strands are now separated, leaving one strand attached by a covalent bond to a fixed oligonucleotide.

The DNA fragment can form a bridge with another fixed oligonucleotide, and another round of PCR proceeds.

The strands are separated, followed by additional rounds of PCR. After many rounds, a small patch of DNA forms in the location of each original cellular DNA fragment.

These single-stranded DNA molecules represent two sequences that are complementary to each other. Now, one of the strands is clipped at its attached oligonucleotide, leaving molecules of just a single DNA sequence.

In this way, tiny islands of DNA form, with each containing many identical copies of a unique DNA sequence.

The PCR step produces enough DNA template material for the following sequencing reactions. DNA sequencing begins with annealing a primer to the template. The primer is complementary in sequence to the adapter DNA near the 3' end of the template.
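
The base-pairing rule behind primer annealing can be sketched in a few lines of Python. This is a minimal illustration only: the insert and adapter sequences and the function names are hypothetical, and real adapter sequences are specific to each sequencing platform.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_primer(template: str, primer_length: int) -> str:
    """Build a primer that anneals to the 3' end of a template written 5' to 3'."""
    three_prime_end = template[-primer_length:]
    return reverse_complement(three_prime_end)


# Hypothetical template: an unknown insert followed by a made-up adapter at its 3' end.
insert = "GATTACAGGC"
adapter = "ACGTTGCC"
template = insert + adapter

primer = design_primer(template, len(adapter))
print(primer)
```

Because the primer matches only the known adapter, the same primer can start the sequencing reaction on every template, whatever the unknown insert happens to be.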

The primers provide a 3' hydroxyl group onto which DNA polymerase can begin to add nucleotides. Each of the four types of nucleotides carries a different fluorescent label, so the four can be distinguished.

DNA polymerase adds a nucleotide to the end of the primer. The added guanine-bearing nucleotide is complementary to the cytosine-bearing nucleotide in the template DNA. Note that the nucleotides have a blocking group on their 3' ends, so no additional nucleotides can be added at this time. The rest of the nucleotides are washed away.

A laser induces the nucleotide to fluoresce, and the color is recorded. Each spot on the solid surface has a particular fluorescence, depending on which nucleotide was incorporated into the growing strand.

The fluorescent label and the blocking group are now removed from the nucleotide. Because the blocking group can be removed, this type of nucleotide is called a reversible chain-terminating nucleotide. The nucleotide now has the 3' OH group required for adding additional nucleotides.

More nucleotides are now added, and another becomes incorporated into the growing strand. The free nucleotides are washed away, and the new nucleotide is induced to fluoresce.

The cycle can be repeated up to 100 times. The sequences from all of the spots are recorded simultaneously.
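
The cycle described above (incorporate one blocked nucleotide, record its color, remove the label and blocking group, repeat) can be modeled as a simple loop. The sketch below is a toy model of a single spot, not instrument software; the dye colors and function names are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colors; actual dye assignments depend on the instrument.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_synthesis(template_read_order: str, max_cycles: int = 100) -> list[str]:
    """Return the color recorded in each cycle.

    template_read_order lists the template bases in the order the polymerase
    meets them. The 3' blocking group limits every cycle to one nucleotide.
    """
    colors = []
    for template_base in template_read_order[:max_cycles]:
        incorporated = COMPLEMENT[template_base]  # only the matching nucleotide is added
        colors.append(DYE_COLOR[incorporated])    # laser excites the dye; color is recorded
        # Dye and blocking group are then cleaved, restoring the 3' OH for the next cycle.
    return colors


def call_bases(colors: list[str]) -> str:
    """Translate recorded colors back into the synthesized strand, 5' to 3'."""
    return "".join(BASE_FOR_COLOR[color] for color in colors)


colors = sequence_by_synthesis("CATG")
print(colors)
print(call_bases(colors))
```

In this example the first template base is a cytosine, so a guanine-bearing nucleotide is incorporated in the first cycle, as in the animation. The `max_cycles` default mirrors the limit of about 100 cycles mentioned above.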

Software packages assemble overlapping sequence fragments into longer pieces, and in this way determine the overall sequence of a genome.
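
The idea of joining overlapping reads can be illustrated with a small greedy merge. This is a teaching sketch under strong assumptions (error-free reads, all from the same strand, no repeated regions); it is not the algorithm used by any particular assembly package, and the function names are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that tile a 14-base sequence.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

Reads that share no sufficiently long overlap are returned as separate pieces, which loosely corresponds to the gaps that remain when coverage is too low.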

Method 2: PCR on Microbeads and Pyrosequencing with Luciferase

For massively parallel sequencing using microbeads, the genomic DNA is first cut into fragments of 300–800 base pairs (bp).

Small DNA adapters are added to each end of a fragment, and the double-stranded DNA is denatured into single strands.

One and a half million tiny microbeads, each less than 20 microns in diameter, are coated with DNA primers complementary to one of the adapters. The single-stranded DNA molecules attach to the primers by complementary base pairing under conditions that favor just one DNA fragment attachment per bead.

For the next step, each bead must occupy its own tiny reaction chamber. A fine emulsion of oil and the reaction mixture is created by homogenization, such that each bead is isolated in its own reaction bubble. Inside the bubbles, an amplification procedure called PCR (for polymerase chain reaction) will amplify the DNA so that enough identical DNA strands are available to analyze.

In PCR, DNA polymerase synthesizes a second strand of DNA starting from a primer. The two long strands are separated by heat, and then new primers are allowed to bind. DNA synthesis continues from the new primers. With each cycle, heat separates the strands that are not permanently affixed to the bead by the attached primer. The PCR cycles continue until each bead has about two million identical copies of DNA.
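
A rough sense of how many cycles are needed to reach about two million copies comes from an idealized doubling model. The sketch below assumes every strand is copied in every cycle (efficiency of 1.0), which real reactions, and especially reactions confined to a bead, do not achieve; the function names and the efficiency parameter are illustrative assumptions.

```python
import math


def copies_after(cycles: int, efficiency: float = 1.0, start: int = 1) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return start * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies: float, efficiency: float = 1.0, start: int = 1) -> int:
    """Smallest whole number of cycles that reaches target_copies."""
    return math.ceil(math.log(target_copies / start) / math.log(1.0 + efficiency))


target = 2_000_000  # copies per bead, from the text
n = cycles_needed(target)
print(n, copies_after(n))
```

Under perfect doubling, 20 cycles give just over one million copies and 21 cycles give just over two million, so the figure in the text corresponds to roughly 21 ideal cycles. Lower efficiencies would require more cycles.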

Beads are loaded into tiny wells, each with room for a single bead, resulting in an array of a million beads. Each well contains a different amplified DNA fragment.

Along with its single DNA-carrying bead, each well contains beads coated with two types of enzymes, sulfurylase and luciferase, which create light signals during the sequencing reaction.

Some sequencing reactions employ fluorescently labeled dNTPs. However, the technique depicted here, called pyrosequencing, uses a different strategy. First, a primer is allowed to attach to the DNA, and then DNA polymerase begins to add nucleotides. A single type of dNTP, in this case dTTP, is flowed across the wells and becomes part of the reaction. The thymine can form a base pair with the adenine in the DNA strand, allowing the polymerase to attach the incoming nucleotide to the primer. In the process, two phosphate groups are cleaved from the dNTP and released as a pyrophosphate ion.

On the enzyme bead, the enzymes use the pyrophosphate and other substrates to perform a series of reactions. One enzyme converts the pyrophosphate into ATP and the other uses the energy in ATP to produce a flash of light. The light is detected by a camera and recorded.

The nucleotides are washed out of the wells, and a new type of nucleotide, this time bearing adenine, is added. Because adenine does not pair with the guanine at the next position in the template, no nucleotides are incorporated, and no light is emitted from the well.

The next nucleotides to be flowed bear a cytosine base. One is incorporated into the growing strand, and light is emitted through the action of the enzyme beads.

The next nucleotides bear a guanine base. Three nucleotides are added to the growing chain, and each pyrophosphate ion results in the emission of a photon. Three times as much light is emitted from the reaction with the guanine-bearing nucleotides, indicating that three cytosine bases appear consecutively in the template strand.
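
The sequence of flows just described can be reproduced with a toy simulation of a single well. It assumes an ideal reaction in which one incorporation releases exactly one unit of light; the function name, the flow order default, and the "light units" are illustrative assumptions rather than instrument behavior.

```python
from itertools import cycle

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_flows(template_read_order: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Return (flowed base, light units) for each flow until the template is copied.

    One light unit stands for the pyrophosphate released by one incorporation.
    """
    if not set("ACGT") <= set(flow_order):
        raise ValueError("flow_order must contain all four bases")
    position = 0
    flows = []
    for base in cycle(flow_order):
        if position >= len(template_read_order):
            break
        incorporated = 0
        # The polymerase keeps adding the flowed nucleotide while it pairs with the template.
        while (position < len(template_read_order)
               and COMPLEMENT[template_read_order[position]] == base):
            incorporated += 1
            position += 1
        flows.append((base, incorporated))
    return flows


# Template bases in the order the polymerase meets them: A, G, then three Cs.
print(simulate_flows("AGCCC"))
```

For this template the thymine flow gives one unit of light, the adenine flow gives none, the cytosine flow gives one, and the guanine flow gives three, matching the steps in the animation.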

The four nucleotides are flowed sequentially into the wells, and the set of four flows is repeated for 100 cycles. This grid represents a moment in time in which a single type of nucleotide has been flowed across the wells. In the dark wells, no nucleotide is incorporated. In the dimly lit wells, a single nucleotide is incorporated. In the brighter wells, multiple nucleotides are incorporated. Data from 100 cycles are processed.

The data from a single well can be depicted on a chart. A calibration sequence is built into each sequencing reaction, showing the level of light emitted for the addition of single nucleotides. For the sequencing reaction in this well, a thymine was incorporated first. Adenine could not be incorporated next, but the following flow with cytosine worked. The light emitted from the next reaction with guanine is three times as intense as for the single incorporations of nucleotides, indicating that three guanine-bearing nucleotides were incorporated in a row. One flow of nucleotides after another reveals the sequence of the DNA.
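
Reading a sequence off such a chart amounts to dividing each light intensity by the calibrated single-nucleotide intensity and rounding. The sketch below assumes made-up intensity values and a hypothetical function name; real base-calling software must also handle noise that grows with the length of a run of identical bases.

```python
def decode_flowgram(flows: list[tuple[str, float]], single_unit: float) -> str:
    """Convert (flowed base, light intensity) pairs into the synthesized strand.

    single_unit is the intensity of one incorporation, taken from the calibration
    sequence. Each intensity is rounded to the nearest whole number of bases.
    """
    called = []
    for base, intensity in flows:
        count = int(intensity / single_unit + 0.5)  # round half up to the nearest count
        called.append(base * count)
    return "".join(called)


# Hypothetical intensities for the four flows described above (arbitrary units).
flows = [("T", 102.0), ("A", 4.0), ("C", 97.0), ("G", 305.0)]
synthesized = decode_flowgram(flows, single_unit=100.0)
print(synthesized)
```

The decoded string is the newly synthesized strand; the template strand is its complement.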

With 1,000,000 reads occurring simultaneously and an average read length of 400 bases, a single 10-hour run can produce 400 million bases of sequence information.
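
The throughput figure follows directly from the numbers given. The short calculation below reproduces it and, as an added illustration, compares one run with a human genome; the genome size of roughly 3.2 billion bases is an assumption introduced here, not a figure from this text.

```python
reads_per_run = 1_000_000      # simultaneous reads, from the text
average_read_length = 400      # bases per read, from the text
run_hours = 10                 # run duration, from the text

bases_per_run = reads_per_run * average_read_length
bases_per_hour = bases_per_run / run_hours

# Assumption for illustration: a human genome of roughly 3.2 billion bases.
human_genome_bases = 3_200_000_000
fold_coverage_per_run = bases_per_run / human_genome_bases

print(f"{bases_per_run:,} bases per run")
print(f"{bases_per_hour:,.0f} bases per hour")
print(f"{fold_coverage_per_run:.3f}-fold coverage of the assumed genome per run")
```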

Software packages assemble the sequence fragments into longer pieces, and in this way determine the overall sequence of a genome.

CONCLUSION

The power of high-throughput sequencing methods derives from the following factors:

• They are fully automated and miniaturized.

• Millions of different fragments are sequenced at the same time.

• They are inexpensive ways to sequence large genomes. For example, at the time of this writing, a complete human genome can be sequenced in less than a day for $1,000. This is in contrast to the Human Genome Project, which took 13 years and cost $2.7 billion to sequence one genome!
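
The scale of the improvement in the last point can be worked out from the figures quoted. The calculation below uses only those figures; treating "less than a day" as exactly one day makes the time ratio a lower bound.

```python
hgp_cost_usd = 2_700_000_000   # Human Genome Project cost, from the text
hgp_years = 13                 # Human Genome Project duration, from the text
modern_cost_usd = 1_000        # per-genome cost quoted in the text
modern_days = 1                # "less than a day", so this is an upper bound

cost_ratio = hgp_cost_usd / modern_cost_usd
time_ratio = (hgp_years * 365) / modern_days   # ignores leap days

print(f"Cost reduced about {cost_ratio:,.0f}-fold")
print(f"Time reduced at least {time_ratio:,.0f}-fold")
```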

The technology used to sequence millions of short DNA fragments is only half the story, however. Once these sequences have been determined, the problem becomes how to put them together. The field of bioinformatics was developed to analyze DNA sequences using sophisticated mathematics and computer programs to handle the large amounts of data generated in genome sequencing.