10.4 Determining the Base Sequence of a DNA Segment

After we have cloned and identified our desired gene or have amplified it by PCR, the task of trying to understand its function begins. The ultimate language of the genome is composed of strings of the nucleotides A, T, C, and G. Obtaining the complete nucleotide sequence of a segment of DNA is often an important part of understanding the organization of a gene and its regulation, its relation to other genes, or the function of its encoded RNA or protein. Indeed, the DNA sequence can be used to determine the protein primary structure since, for the most part, translating the nucleic acid sequence of a cDNA molecule to discover the amino sequence of its encoded polypeptide chain is simpler than directly sequencing the polypeptide itself. In this section, we consider the techniques used to read the nucleotide sequence of DNA.

As with recombinant-DNA technologies and PCR, DNA sequencing exploits base-pair complementarity together with an understanding of the basic biochemistry of DNA replication. Several techniques have been developed, but one of them has been the predominant method used to date to sequence most DNA molecules. While it is still the most commonly used technique to sequence shorter DNAs, new sequencing technology has largely supplanted this technique when the goal is to determine the sequence of entire genomes, as described in detail in Chapter 14.

This sequencing technique is called dideoxy sequencing or, sometimes, Sanger sequencing after its inventor. The term dideoxy comes from a special modified nucleotide, called a dideoxynucleotide triphosphate (generically, a ddNTP). This modified nucleotide is key to the Sanger technique because of its ability to block continued DNA synthesis. A dideoxynucleotide lacks the 3′-hydroxyl group as well as the 2′-hydroxyl group, which is also absent in a deoxynucleotide (Figure 10-16; compare to Figure 7-5). For DNA synthesis to take place, the DNA polymerase must catalyze a reaction between the 3′-hydroxyl group of the last nucleotide and the 5′-phosphate group of the next nucleotide to be added. Because a dideoxynucleotide lacks the 3′-hydroxyl group, this reaction cannot take place, and therefore DNA synthesis is blocked at the point of addition.

Figure 10-16: The structure of 2′,3′-dideoxynucleotides
Figure 10-16: 2′,3′-Dideoxynucleotides, which are employed in the Sanger DNA-sequencing method, are missing the ribose hydroxyl group present in DNA.

The logic of dideoxy sequencing is straightforward. Suppose we want to read the sequence of a cloned DNA segment of up to 800 base pairs. This DNA segment could be a plasmid insert or even a PCR product. First, we denature the two strands of this segment. Next, we create a primer for DNA synthesis that will hybridize to exactly one location on the cloned DNA segment and then add a special “cocktail” of DNA polymerase, normal deoxynucleotide triphosphates (dATP, dCTP, dGTP, and dTTP), and a small amount of a dideoxynucleotide for one of the four bases (for example, ddATP). The polymerase will begin to synthesize the complementary DNA strand, starting from the primer, but will stop at any point at which the dideoxynucleotide triphosphate is incorporated into the growing DNA chain in place of the normal deoxynucleotide triphosphate. Suppose the DNA segment that we’re trying to sequence is

375

5′ ACGGGATAGCTAATTGTTTACCGCCGGAGCCA 3′

We would then start DNA synthesis from a complementary primer:

Using the special DNA-synthesis cocktail “spiked” with a small amount of ddATP, for example, we will create a nested set of DNA fragments that have the same starting point but different end points because the fragments stop at whatever point the insertion of ddATP instead of dATP halted DNA replication. The array of different ddATP-arrested DNA chains looks like the list of sequences below. (*A indicates the dideoxynucleotide.)

We can generate an array of such fragments for each of the four possible dideoxynucleotide triphosphates in four separate cocktails (one spiked with ddATP, one with ddCTP, one with ddGTP, and one with ddTTP). Each will produce a different array of fragments, with no two spiked cocktails producing fragments of the same size. Next, the DNA fragments generated in the four cocktails are separated and displayed in order using gel electrophoresis. By running the fragments in four adjacent lanes of a polyacrylamide gel that can resolve fragments of DNA that vary by only one nucleotide in length, we see that the fragments can be ordered by length with the lengths increasing by one base at a time.

The newly synthesized strands must be labeled in some way to make the bands visible on the gel. Strands are labeled as they are made either by using a primer that is radioactively labeled or having one of the regular dNTPs carry a radioactive label. Fluorescent labels can also be used, and in this case they are carried by each ddNTP (see below).

The products of such dideoxy sequencing reactions are shown in Figure 10-17. That result is a ladder of labeled DNA chains increasing in length by one, and so all we need do is read up the gel to read the DNA sequence of the synthesized strand in the 5′-to-3′ direction.

Figure 10-17: The dideoxy sequencing method
Figure 10-17: DNA is efficiently sequenced by including dideoxynucleotides among the nucleotides used to copy a DNA segment. (a) A labeled primer (designed from the flanking vector sequence) is used to initiate DNA synthesis. The addition of four different dideoxynucleotides (ddATP is shown here) randomly arrests synthesis. (b) The resulting fragments are separated electrophoretically and subjected to autoradiography. The inferred sequence is shown at the right. (c) Sanger sequencing gel.
[(c) Loida Escote-Carlson, Ph.D.]

If the tag is a fluorescent dye, a different fluorescent color emitter is used for each of the four ddNTP reactions, and a detector at the end of the gel can distinguish each color. The four reactions take place in the same test tube, and the four sets of nested DNA chains can undergo electrophoresis together. Thus, four times as many sequences can be produced in the same amount of time as can be produced by running the reactions separately. This logic is used in fluorescence detection by automated DNA-sequencing machines. Thanks to these machines, DNA sequencing can proceed at a massive level, and sequences of whole genomes have been obtained by scaling up the procedures described in this section. Figure 10-18 illustrates a readout of automated sequencing. Each colored peak represents a different-size fragment of DNA, ending with a fluorescent base that was detected by the fluorescent scanner of the automated DNA sequencer; the four different colors represent the four bases of DNA. Applications of automated sequencing technology on a genome-wide scale is a major focus of Chapter 14.

Figure 10-18: Reading the DNA sequence from an automatic sequencer
Figure 10-18: Printout from an automatic sequencer that uses fluorescent dyes. Each of the four colors represents a different base. The letter N represents a base that cannot be assigned because peaks are too low. Note that, if this were a gel as in Figure 10-17c, each of these peaks would correspond to one of the dark bands on the gel; in other words, these colored peaks represent a different readout of the same sort of data as are produced on a sequencing gel.

376

377

KEY CONCEPT

A cloned DNA segment can be sequenced by characterizing the end bases of a serial set of truncated synthetic DNA fragments, each terminated at different positions corresponding to the incorporation of a dideoxynucleotide.