5.1 The Exploration of Genes Relies on Key Tools
The rapid progress in biotechnology—indeed its very existence—is a result of a few key techniques.
1. Restriction-enzyme analysis. Restriction enzymes are precise molecular scalpels that allow an investigator to manipulate DNA segments.
2. Blotting techniques. Southern and northern blots are used to separate and identify DNA and RNA sequences, respectively. The western blot, which uses antibodies to characterize proteins, was described in Chapter 3.
3. DNA sequencing. The precise nucleotide sequence of a molecule of DNA can be determined. Sequencing has yielded a wealth of information concerning gene architecture, the control of gene expression, and protein structure.
4. Solid-phase synthesis of nucleic acids. Precise sequences of nucleic acids can be synthesized de novo and used to identify or amplify other nucleic acids.
5. The polymerase chain reaction (PCR). The polymerase chain reaction leads to a billionfold amplification of a segment of DNA. One molecule of DNA can be amplified to quantities that permit characterization and manipulation. This powerful technique can be used to detect pathogens and genetic diseases, determine the source of a hair left at the scene of a crime, and resurrect genes from the fossils of extinct organisms.
A final set of techniques relies on the computer, without which it would be impossible to catalog, access, and characterize the abundant information generated by the methods outlined above. Such uses of the computer will be presented in Chapter 6.
Restriction enzymes split DNA into specific fragments
Restriction enzymes, also called restriction endonucleases, recognize specific base sequences in double-helical DNA and cleave both strands of that duplex at specific places. To biochemists, these exquisitely precise scalpels are marvelous gifts of nature. They are indispensable for analyzing chromosome structure, sequencing very long DNA molecules, isolating genes, and creating new DNA molecules that can be cloned.
Restriction enzymes are found in a wide variety of prokaryotes. Their biological role is to cleave foreign DNA molecules, providing the host organism with a primitive immune system. Many restriction enzymes recognize specific sequences of four to eight base pairs and hydrolyze a phosphodiester bond in each strand in this region. A striking characteristic of these cleavage sites is that they almost always possess twofold rotational symmetry. In other words, the recognized sequence is palindromic, or an inverted repeat, and the cleavage sites are symmetrically positioned. For example, the sequence recognized by a restriction enzyme from Streptomyces achromogenes is
Palindrome
A word, sentence, or verse that reads the same from right to left as it does from left to right.
Roma tibi subito motibus ibit amor
Derived from the Greek palindromos, “running back again.”
In each strand, the enzyme cleaves the C–G phosphodiester bond on the 3′ side of the symmetry axis. As we shall see in Chapter 9, this symmetry corresponds to that of the structures of the restriction enzymes themselves.
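The twofold symmetry of a recognition site can be checked by comparing the site with its reverse complement: if the two are identical, both strands read the same in the 5′-to-3′ direction. The following is a minimal sketch of that check. The function names are illustrative, and the EcoRI site GAATTC is a well-known example supplied here as an assumption; it is not taken from the passage above.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


print(is_palindromic("GAATTC"))  # EcoRI site (assumed example)
print(is_palindromic("GATTAC"))  # an arbitrary non-symmetric hexamer
```

A site of odd length can never pass this test, because its central base would have to be its own complement.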
Several hundred restriction enzymes have been purified and characterized. Their names consist of a three-letter abbreviation for the host organism (e.g., Eco for Escherichia coli, Hin for Haemophilus influenzae, Hae for Haemophilus aegyptius) followed by a strain designation (if needed) and a roman numeral (to distinguish multiple enzymes from the same strain). The specificities of several of these enzymes are shown in Figure 5.1.
Figure 5.1: Specificities of some restriction endonucleases. The sequences that are recognized by these enzymes contain a twofold axis of symmetry. The two strands in these regions are related by a 180-degree rotation about the axis marked by the green symbol. The cleavage sites are denoted by red arrows. The abbreviated name of each restriction enzyme is given at the right of the sequence that it recognizes. Note that the cuts may be staggered or even.
Restriction enzymes are used to cleave DNA molecules into specific fragments that are more readily analyzed and manipulated than the entire parent molecule. For example, the 5.1-kb circular duplex DNA of the tumor-producing SV40 virus is cleaved at one site by EcoRI, at four sites by HpaI, and at 11 sites by HindIII. A piece of DNA, called a restriction fragment, produced by the action of one restriction enzyme can be specifically cleaved into smaller fragments by another restriction enzyme. The pattern of such fragments can serve as a fingerprint of a DNA molecule, as will be considered shortly. Indeed, complex chromosomes containing hundreds of millions of base pairs can be mapped by using a series of restriction enzymes.
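The relation between cleavage sites and fragments on a circular molecule can be sketched in a few lines: cutting a circle at k sites yields k linear fragments, so a single cut simply linearizes the molecule. The sketch below assumes an exact-match search for the recognition site; the 30-bp "plasmid" and the function names are hypothetical and chosen only for illustration.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` on a circular molecule."""
    wrapped = dna + dna[: len(site) - 1]  # let a site span the origin
    return [i for i in range(len(dna)) if wrapped.startswith(site, i)]


def circular_fragment_lengths(dna_length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the fragments left after cutting a circle at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return []  # an uncut circle yields no linear fragments
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    lengths.append(dna_length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return lengths


# Hypothetical 30-bp circle carrying two GAATTC sites.
plasmid = "GAATTC" + "A" * 10 + "GAATTC" + "C" * 8
# Assume the enzyme cuts one base into the site (after the G).
cuts = [(i + 1) % len(plasmid) for i in find_sites(plasmid, "GAATTC")]
print(circular_fragment_lengths(len(plasmid), cuts))
```

The fragment lengths always sum to the length of the circle, which is a useful consistency check when interpreting a digest.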
Restriction fragments can be separated by gel electrophoresis and visualized
In Chapter 3, we considered the use of gel electrophoresis to separate protein molecules (Section 3.1). Because the phosphodiester backbone of DNA is highly negatively charged, this technique is also suitable for the separation of nucleic acid fragments. Among the many applications of DNA electrophoresis are the detection of mutations that affect restriction fragment size (such as insertions and deletions) and the isolation, purification, and quantitation of a specific DNA fragment.
Figure 5.2: Gel-electrophoresis pattern of a restriction digest. This gel shows the fragments produced by cleaving DNA from two viral strains (odd- vs. even-numbered lanes) with each of four restriction enzymes. These fragments were made fluorescent by staining the gel with ethidium bromide.
[Data from Carr et al., Emerging Infectious Diseases, www.cdc.gov/eid, 17(8), August 2011.]
For most gels, the shorter the DNA fragment, the farther the migration. Polyacrylamide gels are used to separate, by size, fragments containing as many as 1000 base pairs, whereas more-porous agarose gels are used to resolve mixtures of larger fragments (as large as 20 kb). An important feature of these gels is their high resolving power. In certain kinds of gels, fragments differing in length by just one nucleotide out of several hundred can be distinguished. Bands or spots of radioactive DNA in gels can be visualized by autoradiography. Alternatively, a gel can be stained with a dye such as ethidium bromide, which fluoresces an intense orange under irradiation with ultraviolet light when bound to a double-helical DNA molecule (Figure 5.2). A band containing only 10 ng of DNA can be readily seen.
It is often necessary to determine if a particular base sequence is represented in a given DNA sample. For example, one may wish to confirm the presence of a specific mutation in genomic DNA isolated from patients known to be at risk for a particular disease. This specific sequence can be identified by hybridizing it with a labeled complementary DNA strand (Figure 5.3). A mixture of restriction fragments is separated by electrophoresis through an agarose gel, denatured to form single-stranded DNA, and transferred to a nitrocellulose sheet. The positions of the DNA fragments in the gel are preserved during the transfer. The nitrocellulose is then exposed to a 32P-labeled or fluorescently tagged DNA probe, a short stretch of single-stranded DNA which contains a known base sequence. The probe hybridizes with a restriction fragment having a complementary sequence, and autoradiography or fluorescence imaging then reveals the position of the restriction-fragment–probe duplex. A particular fragment amid a million others can be readily identified in this way. This powerful technique is named Southern blotting, for its inventor Edwin Southern.
Figure 5.3: Southern blotting. A DNA fragment containing a specific sequence can be identified by separating a mixture of fragments by electrophoresis, transferring them to nitrocellulose, and hybridizing with a 32P-labeled probe complementary to the sequence. The fragment containing the sequence is then visualized by autoradiography.
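The logic of probe hybridization in a Southern blot can be illustrated with a simple string search. Because the fragments are denatured before transfer, both strands of every fragment are available, and the probe pairs with whichever strand is complementary to it. The sketch below is a simplification under stated assumptions: it demands a perfect match (real hybridization tolerates some mismatch, depending on conditions), and the fragment and probe sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fragments_hybridizing(fragments: list[str], probe: str) -> list[int]:
    """Indices of fragments in which one strand is complementary to the probe.

    Each fragment is given as its top strand (5'->3'). The probe pairs with the
    bottom strand if the probe sequence occurs in the top strand, and with the
    top strand if the probe's reverse complement occurs there.
    """
    probe = probe.upper()
    target = reverse_complement(probe)
    return [
        i
        for i, frag in enumerate(fragments)
        if probe in frag.upper() or target in frag.upper()
    ]


# Hypothetical restriction fragments and probe.
fragments = ["ATGCCGTTAGC", "GGGTTTAAACCC", "TTAGCTAACGGCAT"]
print(fragments_hybridizing(fragments, "CCGTTA"))
```

In the laboratory, the list of indices corresponds to the bands that light up on the autoradiogram or fluorescence image.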
In a similar manner, RNA molecules of a specific sequence can also be readily identified. After separation by gel electrophoresis and transfer to nitrocellulose, specific sequences can be detected by DNA probes. This analogous technique for the analysis of RNA has been whimsically termed northern blotting. A further play on words accounts for the term western blotting, which refers to a technique for detecting a particular protein by staining with a specific antibody (Section 3.3).
DNA can be sequenced by controlled termination of replication
Figure 5.4: Fluorescence detection of oligonucleotide fragments produced by the dideoxy method. A sequencing reaction is performed with four chain-terminating dideoxy nucleotides, each labeled with a tag that fluoresces at a different wavelength. The color of each fragment indicates the identity of the last base in the chain. The fragments are separated by size using capillary electrophoresis and the fluorescence at each of the four wavelengths indicates the sequence of the complement of the original DNA template.
The analysis of DNA structure and its role in gene expression has been markedly facilitated by the development of powerful techniques for the sequencing of DNA molecules. One of the first and most widely used techniques for DNA sequencing is controlled termination of replication, also referred to as the Sanger dideoxy method for its pioneer, Frederick Sanger. The key to this approach is the generation of DNA fragments whose length is determined by the last base in the sequence (Figure 5.4). In the current application of this method, a DNA polymerase is used to make the complement of a particular sequence within a single-stranded DNA molecule. The synthesis is primed by a chemically synthesized fragment that is complementary to a part of the sequence known from other studies. In addition to the four deoxyribonucleoside triphosphates, the reaction mixture contains a small amount of the 2′,3′-dideoxy analog of each nucleotide, each carrying a different fluorescent label attached to the base (e.g., a green emitter for termination at A and a red one for termination at T).
The incorporation of this analog blocks further growth of the new chain because it lacks the 3′-hydroxyl terminus needed to form the next phosphodiester bond. The concentration of the dideoxy analog is low enough that chain termination will take place only occasionally. The polymerase will insert the correct nucleotide sometimes and the dideoxy analog other times, stopping the reaction. For instance, if the dideoxy analog of dATP is present, fragments of various lengths are produced, but all will be terminated by the dideoxy analog. Importantly, this dideoxy analog of dATP will be inserted only where a T was located in the DNA being sequenced. Thus, the fragments of different length will correspond to the positions of T.
The resulting fragments are separated by a technique known as capillary electrophoresis, in which the mixture is passed through a very narrow tube containing a gel matrix at high voltage to achieve efficient separation within a short time. As the DNA fragments emerge from the capillary, they are detected by their fluorescence; the sequence of their colors directly gives the base sequence. Sequences of as many as 1000 bases can be determined in this way. Indeed, automated Sanger sequencing machines can read more than 1 million bases per day.
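The way in which occasional chain termination reveals a sequence can be made concrete with a small simulation. In this sketch, each growing chain has a fixed chance of incorporating a labeled dideoxy analog at every position; the resulting fragments are then ordered by length, as capillary electrophoresis would order them, and the terminal labels are read off. The termination probability, the number of chains, the random seed, and the template are all illustrative assumptions rather than values from the text.

```python
import random

# Base inserted by the polymerase opposite each template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_terminated_chains(template, p_dideoxy=0.1, n_chains=5000, seed=1):
    """Simulate primer extension along `template` (written 3'->5').

    Returns a set of (length, terminal_base) pairs, one per distinct
    dideoxy-terminated fragment. The primer is not counted in the length.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_chains):
        for length, base in enumerate(template, start=1):
            if rng.random() < p_dideoxy:  # a dideoxy analog was incorporated
                fragments.add((length, PAIR[base]))
                break                     # no 3'-OH, so the chain stops here
    return fragments


def read_sequence(fragments):
    """Order fragments by size and read the label on each terminal base."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGATCCA"  # hypothetical template, written 3'->5'
new_strand = read_sequence(synthesize_terminated_chains(template))
print(new_strand)  # complement of the template, read 5'->3'
```

If the termination probability is set too high, long fragments become rare and the far end of the sequence is lost; if it is too low, most chains run to completion without a label. This trade-off is why the dideoxy analogs are kept at a low concentration.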
DNA probes and genes can be synthesized by automated solid-phase methods
DNA strands, like polypeptides (Section 3.4), can be synthesized by the sequential addition of activated monomers to a growing chain that is linked to a solid support. The activated monomers are protected deoxyribonucleoside 3′-phosphoramidites. In step 1, the 3′-phosphorus atom of this incoming unit becomes joined to the 5′-oxygen atom of the growing chain to form a phosphite triester (Figure 5.5). The 5′-OH group of the activated monomer is unreactive because it is blocked by a dimethoxytrityl (DMT) protecting group, and the 3′-phosphoryl oxygen atom is rendered unreactive by attachment of the β-cyanoethyl (βCE) group. Likewise, amino groups on the purine and pyrimidine bases are blocked.
Figure 5.5: Solid-phase synthesis of a DNA chain by the phosphite triester method. The activated monomer added to the growing chain is a deoxyribonucleoside 3′-phosphoramidite containing a dimethoxytrityl (DMT) protecting group on its 5′-oxygen atom, a β-cyanoethyl (βCE) protecting group on its 3′-phosphoryl oxygen atom, and a protecting group on the base.
Coupling is carried out under anhydrous conditions because water reacts with phosphoramidites. In step 2, the phosphite triester (in which P is trivalent) is oxidized by iodine to form a phosphotriester (in which P is pentavalent). In step 3, the DMT protecting group on the 5′-OH group of the growing chain is removed by the addition of dichloroacetic acid, which leaves other protecting groups intact. The DNA chain is now elongated by one unit and ready for another cycle of addition. Each cycle takes only about 10 minutes and usually elongates more than 99% of the chains.
This solid-phase approach is ideal for the synthesis of DNA, as it is for polypeptides, because the desired product stays on the insoluble support until the final release step. All the reactions take place in a single vessel, and excess soluble reagents can be added to drive reactions to completion. At the end of each step, soluble reagents and by-products are washed away from the resin that bears the growing chains. At the end of the synthesis, NH3 is added to remove all protecting groups and release the oligonucleotide from the solid support. Because elongation is never 100% complete, the new DNA chains are of diverse lengths—the desired chain is the longest one. The sample can be purified by high-performance liquid chromatography or by electrophoresis on polyacrylamide gels. DNA chains of as many as 100 nucleotides can be readily synthesized by this automated method.
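The consequence of incomplete elongation can be estimated with a short calculation. If each cycle elongates a fraction e of the chains, the fraction of chains that reach full length after c coupling cycles is e raised to the power c. The sketch below assumes that the first nucleotide is already attached to the support, so that an n-nucleotide chain needs n − 1 couplings; that bookkeeping and the function name are assumptions made for illustration.

```python
def full_length_fraction(n_nucleotides: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of chains that were elongated in every cycle.

    Assumes the first nucleotide is pre-attached to the solid support,
    so an n-nucleotide chain requires n - 1 coupling cycles.
    """
    couplings = n_nucleotides - 1
    return coupling_efficiency ** couplings


for n in (20, 50, 100):
    print(n, round(full_length_fraction(n), 3))
```

With 99% elongation per cycle, somewhat more than one-third of the chains in a 100-nucleotide synthesis are full length, which is why purification by chromatography or electrophoresis is needed and why much longer chains become impractical.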
The ability to rapidly synthesize DNA chains of any selected sequence opens many experimental avenues. For example, a synthesized oligonucleotide labeled at one end with 32P or a fluorescent tag can be used to search for a complementary sequence in a very long DNA molecule or even in a genome consisting of many chromosomes. The use of labeled oligonucleotides as DNA probes is powerful and general. For example, a DNA probe that can base-pair to a known complementary sequence in a chromosome can serve as the starting point of an exploration of adjacent uncharted DNA. Such a probe can be used as a primer to initiate the replication of neighboring DNA by DNA polymerase. An exciting application of the solid-phase approach is the synthesis of new tailor-made genes. New proteins with novel properties can now be produced in abundance by the expression of synthetic genes. Finally, the synthetic scheme described above can be slightly modified for the solid-phase synthesis of RNA oligonucleotides, which can be very powerful reagents for the degradation of specific mRNA molecules in living cells by a technique known as RNA interference (Section 5.4).
Selected DNA sequences can be greatly amplified by the polymerase chain reaction
Figure 5.6: The first cycle in the polymerase chain reaction (PCR). A cycle consists of three steps: DNA double strand separation, the hybridization of primers, and the extension of primers by DNA synthesis.
In 1984, Kary Mullis devised an ingenious method called the polymerase chain reaction (PCR) for amplifying specific DNA sequences. Consider a DNA duplex consisting of a target sequence surrounded by nontarget DNA. Millions of copies of the target sequence can be readily obtained by PCR if the sequences flanking the target are known. PCR is carried out by adding the following components to a solution containing the target sequence: (1) a pair of primers that hybridize with the flanking sequences of the target, (2) all four deoxyribonucleoside triphosphates (dNTPs), and (3) a heat-stable DNA polymerase. A PCR cycle consists of three steps (Figure 5.6).
1. Strand separation. The two strands of the parent DNA molecule are separated by heating the solution to 95°C for 15 s.
2. Hybridization of primers. The solution is then abruptly cooled to 54°C to allow each primer to hybridize to a DNA strand. One primer hybridizes to the 3′ end of the target on one strand, and the other primer hybridizes to the 3′ end on the complementary target strand. Parent DNA duplexes do not form, because the primers are present in large excess. Primers are typically from 20 to 30 nucleotides long.
3. DNA synthesis. The solution is then heated to 72°C, the optimal temperature for heat-stable polymerases. One such enzyme is Taq DNA polymerase, which is derived from Thermus aquaticus, a thermophilic bacterium that lives in hot springs. The polymerase elongates both primers in the direction of the target sequence because DNA synthesis is in the 5′-to-3′ direction. DNA synthesis takes place on both strands but extends beyond the target sequence.
These three steps (strand separation, hybridization of primers, and DNA synthesis) constitute one cycle of the PCR amplification and can be carried out repetitively just by changing the temperature of the reaction mixture. The thermostability of the polymerase makes it feasible to carry out PCR in a closed container; no reagents are added after the first cycle. At the completion of the second cycle, four duplexes containing the target sequence have been generated (Figure 5.7). Of the eight DNA strands comprising these duplexes, two short strands constitute only the target sequence, that is, the sequence including and bounded by the primers. Subsequent cycles will amplify the target sequence exponentially. Ideally, after n cycles, the desired sequence is amplified 2^n-fold. The amplification is a millionfold after 20 cycles and a billionfold after 30 cycles, which can be carried out in less than an hour.
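The arithmetic of amplification is easy to verify. The sketch below assumes ideal conditions (every strand is copied in every cycle) and a single starting duplex; the function names are illustrative. It also tallies the three kinds of strands present after a given number of cycles: the two parent strands, the long strands copied directly from the parents (two more per cycle), and the short strands bounded by both primers.

```python
import math


def ideal_fold_amplification(cycles: int) -> int:
    """Fold amplification if every template is copied in every cycle."""
    return 2 ** cycles


def strand_census(cycles: int) -> dict[str, int]:
    """Count strand types after `cycles`, starting from one parent duplex."""
    total = 2 ** (cycles + 1)
    long_strands = 2 * cycles  # copied directly from the two parent strands
    short_strands = total - 2 - long_strands
    return {"parent": 2, "long": long_strands, "short": short_strands}


def cycles_needed(fold: float) -> int:
    """Smallest number of ideal cycles giving at least `fold` amplification."""
    return math.ceil(math.log2(fold))


print(strand_census(2))
print(cycles_needed(1e6), cycles_needed(1e9))
```

The long strands accumulate only linearly, whereas the short strands soon make up nearly all of the product. In practice, amplification per cycle falls somewhat short of twofold, so these figures are upper bounds.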
Several features of this remarkable method for amplifying DNA are noteworthy. First, the sequence of the target need not be known. All that is required is knowledge of the flanking sequences so that complementary primers can be synthesized. Second, the target can be much larger than the primers. Targets larger than 10 kb have been amplified by PCR. Third, primers do not have to be perfectly matched to flanking sequences to amplify targets. With the use of primers derived from a gene of known sequence, it is possible to search for variations on the theme. In this way, families of genes are being discovered by PCR. Fourth, PCR is highly specific because of the stringency of hybridization at relatively high temperature. Stringency is the required closeness of the match between primer and target, which can be controlled by temperature and salt. At high temperatures, only the DNA between hybridized primers is amplified. A gene constituting less than a millionth of the total DNA of a higher organism is accessible by PCR. Fifth, PCR is exquisitely sensitive. A single DNA molecule can be amplified and detected.
Figure 5.7: Multiple cycles of the polymerase chain reaction. The two short strands produced at the end of the third cycle (along with longer strands not shown) represent the target sequence. Subsequent cycles will amplify the target sequence exponentially and the parent sequence arithmetically.
PCR is a powerful technique in medical diagnostics, forensics, and studies of molecular evolution
Figure 5.8: DNA and forensics. DNA isolated from sperm obtained during the examination of a rape victim was amplified by PCR, then compared with DNA from the victim and three potential suspects—the victim’s husband and two additional individuals—using gel electrophoresis and autoradiography. Sperm DNA matched the pattern of Suspect 1, but not that of Suspect 2 or the victim’s husband. Sizing marker and K562 lanes refer to control DNA samples.
[Martin Shields/Science Source.]
PCR can provide valuable diagnostic information in medicine. Bacteria and viruses can be readily detected with the use of specific primers. For example, PCR can reveal the presence of small amounts of DNA from the human immunodeficiency virus (HIV) in persons who have not yet mounted an immune response to this pathogen. In these patients, assays designed to detect antibodies against the virus would yield a false negative test result. Finding Mycobacterium tuberculosis bacilli in tissue specimens is slow and laborious. With PCR, as few as 10 tubercle bacilli per million human cells can be readily detected. PCR is a promising method for the early detection of certain cancers. This technique can identify mutations of certain growth-control genes, such as the ras genes (Chapter 14). The capacity to greatly amplify selected regions of DNA can also be highly informative in monitoring cancer chemotherapy. Tests using PCR can detect when cancerous cells have been eliminated and treatment can be stopped; they can also detect a relapse and the need to immediately resume treatment. PCR is ideal for detecting leukemias caused by chromosomal rearrangements.
In addition, PCR has made an impact on forensics and legal medicine. An individual DNA profile is highly distinctive because many genetic loci are highly variable within a population. For example, variations at one specific location determine a person’s HLA type (human leukocyte antigen type; Section 34.5); organ transplants are rejected when the HLA types of the donor and recipient are not sufficiently matched. PCR amplification of multiple genes is being used to establish biological parentage in disputed paternity and immigration cases. Analyses of blood stains and semen samples by PCR have helped establish guilt or innocence in numerous assault and rape cases (Figure 5.8). The root of a single shed hair found at a crime scene contains enough DNA for typing by PCR.
DNA is a remarkably stable molecule, particularly when shielded from air, light, and water. Under such circumstances, large fragments of DNA can remain intact for thousands of years or longer. PCR provides an ideal method for amplifying such ancient DNA molecules so that they can be detected and characterized (Section 6.5). PCR can also be used to amplify DNA from microorganisms that have not yet been isolated and cultured. As will be discussed in Chapter 6, sequences from these PCR products can be sources of considerable insight into evolutionary relationships between organisms.
The tools for recombinant DNA technology have been used to identify disease-causing mutations
Let us consider how the techniques just described have been utilized in concert to study ALS, introduced at the beginning of this chapter. Five percent of all patients suffering from ALS have family members who also have been diagnosed with the disease. A heritable disease pattern is indicative of a strong genetic component of disease causation. To identify these disease-causing genetic alterations, researchers identify polymorphisms (instances of genetic variation) within an affected family that correlate with the emergence of disease. These polymorphisms may themselves cause disease or be closely linked to the responsible genetic alteration. Restriction-fragment-length polymorphisms (RFLPs) are polymorphisms within restriction sites that change the sizes of DNA fragments produced by the appropriate restriction enzyme. Using restriction digests and Southern blots of the DNA from members of ALS-affected families, researchers identified RFLPs that were found preferentially in those family members diagnosed with the disease. For some of these families, strong evidence was obtained for the disease-causing mutation within a specific region of chromosome 21.
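How a polymorphism within a restriction site changes the pattern of fragments can be shown with a toy digest of a linear molecule. In the sketch below, a single base change destroys one of two recognition sites, so two fragments of the normal pattern are replaced by one longer fragment. The sequences, the choice of GAATTC as the site, and the cut position are hypothetical and are not the loci studied in the ALS families.

```python
def linear_fragment_lengths(dna: str, site: str, cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting linear `dna` at every exact copy of `site`.

    `cut_offset` is the number of bases into the site at which the strand is cut.
    """
    cuts = []
    i = dna.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = dna.find(site, i + 1)
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Hypothetical 76-bp allele with two GAATTC sites.
normal = "A" * 20 + "GAATTC" + "C" * 30 + "GAATTC" + "T" * 14
# Variant allele: one base change (A to G) destroys the second site.
variant = normal[:58] + "G" + normal[59:]

print(linear_fragment_lengths(normal, "GAATTC"))
print(linear_fragment_lengths(variant, "GAATTC"))
```

On a Southern blot, a probe that hybridizes near the altered site would reveal bands of different sizes for the two alleles, which is the signal used to follow the polymorphism through a family.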
After the probable location of one disease-causing gene had been identified, this same research group compared the locations of the ALS-associated RFLPs with the known sequence of chromosome 21. They noted that this chromosomal locus contains the SOD1 gene, which encodes the Cu/Zn superoxide dismutase protein SOD1, an enzyme important for the protection of cells against oxidative damage (Section 18.3). PCR amplification of regions of the SOD1 gene from the DNA of affected family members, followed by Sanger dideoxy sequencing of the targeted fragment, enabled the identification of 11 disease-causing mutations from 13 different families. This work was pivotal for focusing further inquiry into the roles that superoxide dismutase and its corresponding mutant forms play in the pathology of some forms of ALS.