Genomic DNA can be cut up before cloning
Genomic DNA is obtained directly from the chromosomes of the organism under study, usually by grinding up fresh tissue and purifying the DNA. Chromosomal DNA can be used as the starting point for both in vivo and PCR methods to isolate genes. For the in vivo method, genomic DNA needs to be cut up before cloning is possible. As described later in this section, genomic DNA does not have to be cut up to perform PCR because the specific short primers that anneal to it identify the start site for DNA polymerase that directs the replication of the intervening DNA.
The long chromosome-size DNA molecules of genomic DNA must be cut into fragments of much smaller size before they can be inserted into a vector. Most cutting is done with the use of bacterial restriction enzymes. These enzymes cut at specific DNA sequences, called restriction sites, and this property is one of the key features that make restriction enzymes suitable for DNA manipulation. These enzymes are examples of endonucleases, enzymes that cleave a phosphodiester bond between nucleotides in DNA. Purely by chance, any DNA molecule, from any organism, may contain restriction-enzyme recognition sites. Thus, a restriction enzyme will cut the DNA into a set of restriction fragments determined by the locations of the restriction sites, and will produce the same pattern of fragments every time that DNA is cut.
Another key property of restriction enzymes is that many create “sticky ends” in the fragments. Let’s look at an example. The restriction enzyme EcoRI (from E. coli) recognizes the following sequence of six nucleotide pairs in the DNA of any organism:
5′-GAATTC-3′
3′-CTTAAG-5′
This type of segment is called a DNA palindrome, which means that both strands have the same nucleotide sequence but in antiparallel orientation (reading 5′ to 3′ produces the same sequence on either strand). Different restriction enzymes cut at different palindromic sequences. Sometimes the cuts are in the same position on each of the two antiparallel strands, leaving blunt ends. However, the most useful restriction enzymes make cuts that are offset, or staggered. The enzyme EcoRI makes cuts only between the G and the A nucleotides on each strand of the palindrome:
5′-G↓AATTC-3′
3′-CTTAA↑G-5′
These staggered cuts leave a pair of ends that each have an identical four-nucleotide single-stranded overhang (AATT). The ends are called sticky because, being single stranded, they can base-pair (that is, stick) to a complementary sequence. Combining complementary single strands so that they pair is called hybridization. Figure 10-2 illustrates the restriction enzyme EcoRI making staggered double-strand cuts in a circular DNA molecule such as a plasmid; the cut opens up the circle, and the resulting linear molecule has two sticky ends. It can now hybridize with a fragment of a different DNA molecule having the same complementary sticky ends.
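The cutting behavior described above can be sketched in a few lines of Python. This is only an illustration on toy sequences: it tracks the top strand alone, takes the EcoRI site to be GAATTC with the cut between G and A, and the function names (`digest_linear`, `linearize_circular`) are invented for this example.

```python
ECORI_SITE = "GAATTC"   # recognition sequence, read 5'->3' on either strand
CUT_OFFSET = 1          # the top strand is cut after the G: G^AATTC

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def cut_positions(seq, site=ECORI_SITE, offset=CUT_OFFSET):
    """Indices on the top strand at which the enzyme cuts."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq, site=ECORI_SITE, offset=CUT_OFFSET):
    """Top-strand fragments produced by digesting a linear molecule."""
    fragments = []
    previous = 0
    for cut in cut_positions(seq, site, offset):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


def linearize_circular(seq, site=ECORI_SITE, offset=CUT_OFFSET):
    """Open a circular molecule that carries exactly one site."""
    # Append the first few bases so a site spanning the origin is found.
    wrapped = seq + seq[:len(site) - 1]
    cuts = [c % len(seq) for c in cut_positions(wrapped, site, offset)]
    if len(cuts) != 1:
        raise ValueError("expected exactly one restriction site")
    cut = cuts[0]
    return seq[cut:] + seq[:cut]


print(is_palindromic(ECORI_SITE))        # True
print(digest_linear("AAGAATTCTT"))       # ['AAG', 'AATTCTT']
print(linearize_circular("CCGAATTCGG"))  # AATTCGGCCG
```

Each top-strand fragment to the right of a cut begins with AATT. Because the bottom strand is cut at the mirror-image position, those four bases are left unpaired, which is the sticky end.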
Figure 10-2: Formation of a recombinant DNA molecule
Figure 10-2: To form a recombinant DNA molecule, the restriction enzyme EcoRI cuts a circular DNA molecule bearing one target sequence, resulting in a linear molecule with single-stranded sticky ends. Because of complementarity, other linear molecules with EcoRI-cut sticky ends can hybridize with the linearized circular DNA, forming a recombinant DNA molecule.
Digesting human genomic DNA with EcoRI generates approximately 500,000 fragments. You will see later in this section how scientists sift through all of these fragments to find the needle in the haystack—the one or two fragments out of the 500,000 that contain the DNA sequence of interest (in our example, the human insulin gene).
KEY CONCEPT
Genomic DNA can be used directly for cloning genes. As a first step, restriction enzymes cut DNA into fragments of manageable size, and many of them generate single-stranded sticky ends suitable for making recombinant DNA.
The polymerase chain reaction amplifies selected regions of DNA in vitro
Figure 10-3: Polymerase chain reaction
Figure 10-3: The polymerase chain reaction quickly copies a target DNA sequence. (a) Double-stranded DNA containing the target sequence. (b) Two chosen or synthesized primers have sequences complementing primer-binding sites at the 3′ ends of the target gene on the two strands. The strands are separated by heating, then cooled to allow the two primers to anneal to the primer-binding sites. Together, the primers thus flank the targeted sequence. (c) After the temperature is raised, Taq polymerase then synthesizes the first set of complementary strands by the addition of the four nucleotide triphosphates which are also in the reaction mixture. These first two strands are of varying length because they do not have a common stop signal. They extend beyond the ends of the target sequence as delineated by the primer-binding sites. (d) The two duplexes are heated again, exposing four binding sites. After cooling, the two primers again bind to their respective strands at the 3′ ends of the target region. (e) After the temperature is raised, Taq polymerase synthesizes four complementary strands. Although the template strands at this stage are variable in length, two of the four strands just synthesized from them are precisely the length of the target sequence desired. This precise length is achieved because each of these strands begins at the primer-binding site, at one end of the target sequence, and proceeds until it runs out of template, at the other end of the sequence. (f) The process is repeated for many cycles, each time creating more double-stranded DNA molecules identical with the target sequence.
If we endeavored to clone the human insulin gene today, armed with the human genome sequence, knowing the gene and flanking sequences would allow us to use a more direct method. Today, we can simply amplify the gene in vitro using the polymerase chain reaction (PCR). The basic strategy of PCR is outlined in Figure 10-3. The process uses multiple copies of a pair of short chemically synthesized DNA primers, approximately 20 bases long, designed so that each primer will bind to one end of the gene or region to be amplified. The two primers bind to opposite DNA strands surrounding the target sequence, with their 3′ ends pointing toward each other. DNA polymerases add bases to the 3′ ends of these primers and copy the target sequence. Repeating the polymerization process produces an exponentially growing number of double-stranded DNA molecules. The details are as follows.
We start with a solution containing the DNA template, the primers, the four deoxyribonucleotide triphosphates (required for DNA synthesis; see Figure 7-15), and a heat-tolerant DNA polymerase. The target DNA is denatured by heat (95°C), resulting in single-stranded DNA molecules. When the solution is cooled (to between 50 and 65°C) the primers hybridize (or anneal) to their complementary sequences in the single-stranded DNA molecules. After the temperature is raised to 72°C, the heat-tolerant DNA polymerase replicates the single-stranded DNA segments extending from a primer. The Taq DNA polymerase, from the bacterium Thermus aquaticus, is one such enzyme commonly used. (To survive in the extreme heat of hot springs, this bacterium has evolved proteins that are extremely heat resistant. Its DNA polymerase thus survives the high temperatures required to denature the DNA duplex, which would denature and inactivate DNA polymerase from most other species.) Complementary new strands are synthesized as in normal DNA replication in cells, forming two double-stranded DNA molecules identical with the parental double-stranded molecule. One cycle consists of these three steps, leading to a single replication of the segment between the two primers.
After the replication of the segment between the two primers is completed, the two new duplexes are again heat-denatured to generate single-stranded templates, and a second cycle of replication is carried out by lowering the temperature in the presence of all the components necessary for the polymerization to produce four identical duplexes. Repeated cycles of denaturation, annealing, and synthesis result in an exponential increase in the number of segments replicated. Because a typical cycle lasts five minutes, amplification by as much as a billion-fold can be readily achieved within 2.5 hours. As you will see later in this section, the PCR products can be further amplified by cloning them in bacterial cells.
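The arithmetic behind these figures can be checked with a short sketch. It assumes an idealized reaction in which every strand is copied in every cycle (real reactions fall short of this), starts from a single template duplex, and uses illustrative names such as `pcr_strand_counts`. It also tracks the point made in Figure 10-3 that strands of exactly the target length first appear in the second cycle.

```python
def pcr_strand_counts(cycles):
    """Count strand types after a number of idealized PCR cycles.

    Returns (original, long_products, target_length) strand counts,
    starting from one double-stranded template.
    """
    original = 2        # the two strands of the starting duplex
    long_products = 0   # copied from an original strand: no defined stop
    target_length = 0   # copied from a primer-started strand: exact length
    for _ in range(cycles):
        new_long = original                       # originals always give long copies
        new_target = long_products + target_length
        long_products += new_long
        target_length += new_target
    return original, long_products, target_length


def duplexes_after(cycles, starting_duplexes=1):
    """Number of duplexes if every duplex doubles in every cycle."""
    return starting_duplexes * 2 ** cycles


CYCLE_MINUTES = 5   # typical cycle length given in the text
CYCLES = 30

print(pcr_strand_counts(2))             # (2, 4, 2)
print(pcr_strand_counts(3))             # (2, 6, 8)
print(duplexes_after(CYCLES))           # 1073741824
print(CYCLES * CYCLE_MINUTES / 60)      # 2.5
```

Thirty doublings give just over a billion duplexes, and thirty five-minute cycles take 2.5 hours, which matches the estimate in the text. The long products grow only linearly (two per cycle), so target-length molecules quickly dominate the mixture.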
PCR is a powerful technique that is routinely used to isolate specific genes or DNA fragments when there is prior knowledge of the sequence to be amplified. In fact, if the sequences corresponding to the primers are each present only once in the genome and are sufficiently close together, the only DNA segment that can be amplified is the one between the two primers. PCR is a very sensitive technique with numerous applications in biology. It can amplify target sequences that are present in extremely low copy numbers in a sample as long as primers specific to this rare sequence are used. For example, crime investigators can amplify segments of human DNA from the few follicle cells surrounding a single pulled-out hair. If the investigators chose to do so, they could amplify the insulin gene from this DNA sample using its precise location on chromosome 11 to design flanking primers for direct PCR.
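The requirement that each primer site be unique can be illustrated with a toy "in silico PCR". This is a simplified sketch, not a primer-design tool: it demands exact matches, searches only the top strand for the forward primer and for the reverse primer's binding site, and uses 6-base primers on an invented template purely to keep the example short (real primers are approximately 20 bases long). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Start indices of every (possibly overlapping) exact match."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the amplified segment, or None if no single product is defined.

    The forward primer matches the top strand directly; the reverse primer
    anneals to the top strand, so its reverse complement is searched for.
    """
    forward_sites = find_all(template, forward_primer)
    reverse_sites = find_all(template, reverse_complement(reverse_primer))
    if len(forward_sites) != 1 or len(reverse_sites) != 1:
        return None  # a missing or repeated site gives no unique product
    start = forward_sites[0]
    if start + len(forward_primer) > reverse_sites[0]:
        return None  # primers must point toward each other
    end = reverse_sites[0] + len(reverse_primer)
    return template[start:end]


template = "TTTTACGTACGGGCCCTTGACAAAAA"
print(in_silico_pcr(template, "ACGTAC", "TGTCAA"))
# ACGTACGGGCCCTTGACA
```

The product begins with the forward primer and ends with the reverse complement of the reverse primer, so its length is fixed by where the two primers bind.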
It would not be an overstatement to say that PCR has revolutionized the study of many fields of biology where DNA analysis is required. In recognition of its importance to science, Kary Mullis was awarded the Nobel Prize in Chemistry in 1993 for developing the first viable PCR protocol.
KEY CONCEPT
The polymerase chain reaction uses specially designed primers for direct isolation and amplification of specific regions of DNA in a test tube.
DNA copies of mRNA can be synthesized
Figure 10-4: Double-stranded cDNA is synthesized from mRNA
Figure 10-4: The formation of cDNA for the insulin gene. The insulin gene (with its two introns) is transcribed in the pancreas into pre-mRNA. The introns are removed by splicing, and A residues are added to the 3′ end to form polyadenylated mRNA. In the laboratory, mRNAs are isolated from pancreatic cells and a short oligo-dT primer is hybridized to the poly(A) tail of all mRNAs to prime synthesis of complementary DNA from the RNA template by reverse transcriptase. Reverse transcriptase synthesizes a stem-loop structure that acts as a primer for synthesis of the second cDNA strand, after the mRNA strand has been degraded (by treatment with NaOH or with RNase H).
As we have seen in Chapter 8, eukaryotic genes often contain one or more introns that disrupt the coding regions. Further, as we will see in Chapters 14 and 15, protein-coding genes are often less than 5 percent of the genomic DNA of multicellular eukaryotes. As mentioned in the previous section, the human insulin gene contains two introns, a problem if the goal is to create bacteria that synthesize human insulin because bacteria do not have the ability to splice out introns present in natural genomic DNA. Instead, because we are interested only in the coding sequence, we can use insulin mRNA as a starting material for PCR. For insulin and other protein-coding genes in higher eukaryotes, collections of mRNA in which intron sequences were removed by spliceosomes are a more useful starting point than genomic DNA. The sequence of the mRNA can be virtually “translated” into the amino acid sequence of the protein by simply reading the triplet codons.
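Reading triplet codons from an mRNA is mechanical enough to sketch in code. The example below uses the standard genetic code, written in the compact form of a 64-letter amino acid string ordered by codon (bases in the order T, C, A, G). The input sequences are made-up toy examples, not insulin mRNA, and `translate` is an illustrative name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    dna = mrna.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUAAUU"))  # MA
print(translate("AUGUUUGGG"))      # MFG
```

The same function applied to genomic DNA that still contains introns would read through intron sequence and give a meaningless result, which is why spliced mRNA is the useful starting point.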
Complementary DNA (cDNA) is a DNA version of an mRNA molecule. Researchers use cDNA rather than mRNA itself because RNAs are inherently less stable than DNA. Moreover, RNA cannot be manipulated by the enzymes available for DNA cloning, and techniques for routinely amplifying and purifying individual RNA molecules do not exist. The cDNA is made from mRNA with the use of a special enzyme called reverse transcriptase, originally isolated from retroviruses (see Chapter 15). Retroviruses have RNA genomes that are copied into DNA that inserts into the host chromosome. Can you think of why it is called reverse transcriptase? To make cDNA, a researcher begins by purifying mRNA from a tissue that produces a large amount of the desired protein. Insulin is produced in the β-islet cells of the pancreas, so we would use that organ as our source for insulin mRNA. Next, the purified mRNA is added to a test tube containing reverse transcriptase, the four dNTPs, and a short primer of polymerized dTTP residues (called an oligo-dT primer). The oligo-dT primer anneals to the poly(A) tail of the mRNA molecule being copied. Using this mRNA molecule as a template, reverse transcriptase catalyzes the synthesis of a single-stranded DNA molecule starting from the oligo-dT primer. When it reaches the end of the RNA template, the reverse transcriptase doubles back and synthesizes a stem-loop. When the mRNA is removed (by treating with a basic solution), the stem-loop can serve as a natural primer for DNA polymerase to copy the cDNA into a double-stranded DNA molecule (Figure 10-4). Like fragments of genomic DNA or PCR products, this double-stranded cDNA can be inserted into recombinant DNA molecules for further amplification or used in any other DNA-based procedure, as described throughout this chapter.
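At the level of sequence, the two synthesis steps amount to taking complements. The sketch below is a simplification under stated assumptions: it ignores the stem-loop and any enzyme kinetics, treats a run of at least five terminal A residues as the poly(A) tail (an arbitrary threshold chosen for the example), and uses a toy mRNA. The function names are hypothetical.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def first_strand_cdna(mrna, min_tail=5):
    """DNA strand complementary to the mRNA, written 5'->3'.

    Synthesis is primed by oligo-dT, so the mRNA must end in a poly(A) tail.
    """
    if not mrna.endswith("A" * min_tail):
        raise ValueError("no poly(A) tail for the oligo-dT primer to anneal to")
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(first_strand):
    """DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return first_strand.translate(DNA_COMPLEMENT)[::-1]


mrna = "AUGGCCUAAAAAAA"
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)   # TTTTTTTAGGCCAT
print(second)  # ATGGCCTAAAAAAA
```

The first strand begins with a run of T residues contributed by the oligo-dT primer, and the second strand has the same sequence as the mRNA with T in place of U.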
KEY CONCEPT
mRNA is often a preferable starting point in the isolation of a gene. Enzymatic conversion of mRNA into cDNA allows for the isolation of a gene copy without introns.
Attaching donor and vector DNA
As described above, we have several options for obtaining the human insulin gene from the genome or from purified mRNA. These methods produce genomic DNA fragments, PCR products, or double-stranded cDNA. The next step is to construct recombinant DNA molecules by inserting donor DNA into vector DNA.
Cloning DNA fragments with sticky ends Recall that the original scientists who isolated the human insulin gene did not know the gene sequence, and so they needed to create a library of human genome DNA fragments from which to isolate the specific gene. To make recombinant DNA molecules containing donor genomic DNA fragments, both donor and vector DNAs are digested by a restriction enzyme that produces the same complementary sticky ends (see Figure 10-2). The resulting fragments are then mixed to allow the sticky ends of vector and donor DNA to hybridize with each other and form recombinant molecules. Figure 10-5a shows a bacterial plasmid DNA that carries a single EcoRI restriction site, so digestion with the restriction enzyme EcoRI converts the circular DNA into a single linear molecule with sticky ends. Donor DNA from any other source, such as human DNA, also is treated with the EcoRI enzyme to produce a population of fragments carrying the same sticky ends. When the two populations are mixed under the proper physiological conditions, DNA fragments from the two sources can hybridize because double helices form between their sticky ends (Figure 10-5b). In any cloning reaction, there are many linearized plasmid molecules in the solution, as well as many different EcoRI fragments of donor DNA, a tiny fraction of which will have the target DNA. Therefore, a diverse array of plasmids recombined with different donor fragments will be produced. At this stage, the hybridized molecules do not have covalently joined sugar-phosphate backbones and are likely to fall apart because eight hydrogen bonds provide only weak links between the sequences. However, the backbones can be covalently sealed by the addition of the enzyme DNA ligase, which creates phosphodiester linkages at the junctions (Figure 10-5c).
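The figure of eight hydrogen bonds follows from the AATT overhang: four A·T pairs with two hydrogen bonds each. A minimal sketch of that count, together with a check that two overhangs can pair, is given here. The function names are illustrative, and the GATC overhang is included only as an example of an end that does not match AATT.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hydrogen bonds formed when each base pairs with its partner.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(seq):
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a, overhang_b):
    """Two single-stranded ends can pair if they are reverse complements."""
    return overhang_a == reverse_complement(overhang_b)


def hydrogen_bonds(overhang):
    """Hydrogen bonds holding a fully paired overhang together."""
    return sum(HYDROGEN_BONDS[base] for base in overhang)


print(overhangs_compatible("AATT", "AATT"))  # True
print(overhangs_compatible("AATT", "GATC"))  # False
print(hydrogen_bonds("AATT"))                # 8
```

Because AATT is its own reverse complement, any EcoRI-cut end can pair with any other, including the two ends of the same vector molecule.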
Figure 10-5: Inserting a gene into a recombinant DNA plasmid
Figure 10-5: Method for generating a collection of recombinant DNA plasmids containing genes derived from restriction enzyme digestion of donor DNA.
Cloning DNA fragments with blunt ends Knowing the human insulin gene sequence helps us zero in on the gene, but it adds a small complication in the cloning reaction. Some restriction enzymes produce blunt ends rather than staggered cuts. In addition, cDNA and the DNA fragments that arise from PCR have blunt or near-blunt ends. While blunt end fragments from all these sources can be joined to the vector with the use of ligase alone, this is a very inefficient reaction because blunt ends cannot stick together. One alternative method is to create PCR products with sticky ends by using specially designed PCR primers that contain restriction endonuclease recognition sequences at their 5′ ends (Figure 10-6). Digestion of the final PCR product with the restriction enzyme (EcoRI in this case) produces a fragment that is ready to be inserted into a vector (see Figure 10-5b).
Figure 10-6: Producing PCR products with sticky ends
Figure 10-6: Adding EcoRI sites to the ends of PCR products. (a) A pair of PCR primers is designed so that their 3′ ends anneal to the target sequence while their 5′ ends contain sequences encoding the restriction enzyme site (EcoRI in this case). Two additional (random) nucleotides are added to the 5′ end because restriction enzymes require sequences on both sides of the recognition sequence for efficient cutting. The target DNA is denatured, and 5′ ends with the restriction sites remain single stranded while the rest of the primers anneal and are extended by DNA polymerase. (b) In the second round of PCR—only the newly synthesized strands are shown—the DNA primers anneal again, and this time DNA synthesis produces double-stranded DNA molecules just like conventional PCR, but these molecules have restriction sites at one end. (c) The products of the second round and all subsequent rounds have EcoRI sites at both ends. (d) When these are cut with EcoRI, sticky ends are produced.
Another method adds sticky ends to any double-stranded DNA fragment—including cDNAs (Figure 10-7). Short double-stranded oligonucleotides (called linkers or adapters) that contain a restriction site are added to a test tube containing cDNAs and ligase. The ligase joins the linkers to the ends of the cDNA strands. After ligation is complete, the DNA is incubated with the corresponding restriction enzyme to generate the sticky ends necessary for cloning into a plasmid vector (see Figure 10-5b). Note that in the examples shown, both the amplified DNA and the cDNA must not contain an internal EcoRI site or it too will be digested. If it does, the sequence for a restriction site that is not in the amplified DNA can be added to the primers or the linkers.
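Both precautions, adding a site to the 5′ end of a primer and checking the insert for an internal copy of that site, are easy to express in code. In this sketch the two padding nucleotides ("GC") are an arbitrary choice, the primer and insert sequences are invented, and the BamHI and HindIII recognition sequences are supplied from outside the text as examples of alternative sites. All function names are hypothetical.

```python
# Recognition sequences; EcoRI is the enzyme used in the text, the other
# two are common alternatives included only for illustration.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def add_site_tail(primer, site, padding="GC"):
    """Prefix a primer with padding nucleotides and a restriction site.

    The target-specific part stays at the 3' end so it can still anneal.
    """
    return padding + site + primer


def has_internal_site(insert, site):
    """True if digestion would also cut inside the insert."""
    return site in insert


def choose_site(insert, candidates=SITES):
    """Name of the first candidate enzyme that does not cut the insert."""
    for name, site in candidates.items():
        if not has_internal_site(insert, site):
            return name
    return None


primer = "ACGTACGTACGTACGTACGT"
print(add_site_tail(primer, SITES["EcoRI"]))  # GCGAATTCACGTACGTACGTACGTACGT
print(choose_site("TTACGGTCAA"))              # EcoRI
print(choose_site("TTGAATTCAA"))              # BamHI
```

Since every site listed is a palindrome, searching one strand of the insert is enough to detect it on either strand.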
Figure 10-7: Producing cDNA molecules with sticky ends
Figure 10-7: Adding EcoRI sites to the ends of cDNA molecules. The cDNA molecules come from the last step in Figure 10-4. Adapters (boxed region) are added at both ends of the cDNA molecules. These adapters are double-stranded oligonucleotides that contain a restriction site (EcoRI is shown in red) and random DNA sequence at both ends (represented by N).
KEY CONCEPT
Donor and vector DNAs with the same sticky ends can be joined efficiently and ligated. Alternatively, donor DNA that is the product of PCR or cDNA synthesis requires the addition of sticky ends prior to insertion into a vector.
Amplification of donor DNA inside a bacterial cell
Amplification of the recombinant DNA molecules takes advantage of prokaryotic genetic processes such as bacterial transformation, plasmid replication, and bacteriophage growth, all discussed in Chapter 5. Figure 10-8 illustrates the cloning of a donor DNA segment. A single recombinant vector enters a bacterial cell and is amplified by the same machinery that replicates the bacterial chromosome. One basic requirement is the presence of an origin of DNA replication recognized by the host replication proteins (as described in Chapter 7). There are soon many copies of each vector in each bacterial cell. Hence, after amplification, a colony of bacteria will typically contain billions of copies of the single-donor DNA insert fused to its vector. This set of amplified copies of the single-donor DNA fragment within the cloning vector is the recombinant DNA clone.
Figure 10-8: How in vivo amplification works
Figure 10-8: The general strategy used to clone a gene. Restriction-enzyme treatment of donor DNA and vector allows the insertion of single fragments into vectors. A single vector enters a bacterial host, where replication and cell division result in a large number of copies of the donor fragment.
The amplification of donor DNA inside a bacterial cell entails the following steps:
Choosing a cloning vector and introducing the insert (see the preceding section for a discussion of the latter)
Introducing the recombinant DNA molecule inside a bacterial cell
Recovering the amplified recombinant molecules
Choice of cloning vectors Vectors must be small molecules for convenient manipulation, but they may vary in many ways to suit the goal of the experiment. Some vectors need to be capable of prolific replication in a living cell in order to amplify the inserted donor fragment. In contrast, others are designed to be present in only a single copy to maintain the integrity of the inserted DNA (see below). All vectors must have convenient restriction sites at which the DNA to be cloned may be inserted (called a polylinker or multiple cloning site). Ideally, the restriction site should be present only once in the vector because then restriction fragments of donor DNA will insert only at that one location in the vector. Having a way to identify and recover the desired recombinant molecule quickly also is important. Numerous cloning vectors that meet a wide range of experimental needs are in current use. Some general classes of cloning vectors follow.
Plasmid vectors As described earlier, bacterial plasmids are small circular DNA molecules that commonly replicate their DNA independent of the bacterial chromosome. The plasmids routinely used as vectors carry a gene for drug resistance and a gene to distinguish plasmids with and without DNA inserts. The drug-resistance gene provides a convenient way to select for bacterial cells transformed by plasmids: those cells still alive after exposure to the drug must carry the plasmid vectors. However, not all the plasmids in these transformed cells will contain DNA inserts. For this reason, it is desirable to be able to identify bacterial colonies with plasmids containing DNA inserts. Such a feature is part of the pUC18 plasmid vector shown in Figure 10-9; DNA inserts disrupt a gene (lacZ) in the plasmid that encodes an enzyme (β-galactosidase) necessary to cleave a compound added to the petri plate agar (X-gal) so that it produces a blue pigment. Thus, the colonies that contain the plasmids with the DNA insert will be white rather than blue (they cannot cleave X-gal because they do not produce β-galactosidase).
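The selection and screening logic reduces to two yes-or-no questions, which can be written as a small decision function. This sketch assumes the plate contains both the drug and X-gal and that the host cell makes no functional β-galactosidase of its own. The function name is invented for illustration.

```python
def colony_phenotype(has_plasmid, has_insert):
    """Expected outcome on a plate containing the drug and X-gal.

    has_insert is only meaningful when the cell carries a plasmid.
    """
    if not has_plasmid:
        return "no growth"   # no drug-resistance gene, so the cell dies
    if has_insert:
        return "white"       # lacZ disrupted, X-gal is not cleaved
    return "blue"            # intact lacZ, X-gal is cleaved to a blue pigment


for has_plasmid, has_insert in [(False, False), (True, False), (True, True)]:
    print(has_plasmid, has_insert, colony_phenotype(has_plasmid, has_insert))
```

Drug resistance acts as a selection (cells without a plasmid never form colonies), whereas color is a screen (both kinds of colony grow and are told apart by eye).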
Figure 10-9: Use of a plasmid vector, pUC18
Figure 10-9: The plasmid vector pUC18 has been designed for use as a vector for DNA cloning. Insertion of DNA into pUC18 is detected by inactivation of the β-galactosidase function of lacZ, resulting in an inability to convert the artificial substrate X-gal into a blue dye. The polylinker has several alternative restriction sites into which donor DNA can be inserted.
[Photo: Dr. James M. Burnette III and Dr. Leslie Bañuelos.]
Bacteriophage vectors A bacteriophage vector harbors DNA as an insert “packaged” inside the phage particle. Different classes of bacteriophage vectors can carry different sizes of donor DNA insert. Bacteriophage λ (lambda; discussed in Chapters 5 and 11) is an effective cloning vector for double-stranded DNA inserts as long as 15 kb. The central part of the phage genome is not required for replication or packaging of λ DNA molecules in E. coli and so can be cut out by restriction enzymes and discarded. The deleted central part is then replaced by inserts of donor DNA.
Vectors for larger DNA inserts The standard plasmid and phage λ vectors just described can accept donor DNA of sizes as large as 10 to 15 kb. However, many experiments require inserts well in excess of this upper limit. To meet these needs, special vectors that require more sophisticated methods for transferring the DNA into the host cell have been engineered. In each case, the DNAs replicate as large plasmids after they have been delivered into the bacterium.
Fosmids are vectors that can carry 35- to 45-kb inserts (Figure 10-10). They are engineered hybrids of λ phage DNA and bacterial F plasmid DNA (see Chapter 5). Fosmids are packaged into λ phage particles, which act as the syringes that introduce these big pieces of recombinant DNA into recipient E. coli cells. After they are in the cell, these hybrids, just like the λ phage, form circular molecules that replicate extrachromosomally in a manner similar to plasmids. However, because of the presence of F plasmid origins of replication that couple plasmid replication to host cell chromosome duplication, very few copies of fosmids accumulate in a cell.
Figure 10-10: Fosmids and BACs are cloning vectors that carry large inserts
Figure 10-10: Features of some large-insert cloning vectors. The number of clones needed to cover the human genome once (1 ×) is based on a genome size of 3000 Mb (3 billion base pairs).
The most popular vector for cloning very large DNA inserts in bacteria is the bacterial artificial chromosome (BAC). Derived from the F plasmid, it can carry inserts ranging from 100 to 200 kb, although the vector itself is only ~7 kb (see Figure 10-10). The DNA to be cloned is inserted into the plasmid, and this large circular recombinant DNA is introduced into the bacterium. BACs were the “workhorse” vectors for the extensive cloning required by large-scale genome-sequencing projects, including the public project to sequence the human genome (discussed in Chapter 14).
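The number of clones needed to cover a genome once is simply genome size divided by insert size. The sketch below applies that division to the insert ranges quoted in the text and the 3000-Mb human genome size used in Figure 10-10. It does not reproduce the figure's own values, and `clones_for_coverage` is an illustrative name.

```python
import math

HUMAN_GENOME_KB = 3_000_000   # 3000 Mb, the genome size used in Figure 10-10

# Insert size ranges (kb) as quoted in the text.
INSERT_RANGES_KB = {
    "fosmid": (35, 45),
    "BAC": (100, 200),
}


def clones_for_coverage(genome_kb, insert_kb, coverage=1):
    """Minimum number of clones whose inserts add up to the given coverage."""
    return math.ceil(coverage * genome_kb / insert_kb)


for vector, (smallest, largest) in INSERT_RANGES_KB.items():
    most = clones_for_coverage(HUMAN_GENOME_KB, smallest)
    fewest = clones_for_coverage(HUMAN_GENOME_KB, largest)
    print(f"{vector}: {fewest} to {most} clones for 1x coverage")
```

The count is a lower bound, because it assumes the inserts tile the genome end to end with no overlap and no gaps.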
KEY CONCEPT
The genetic engineer’s toolkit contains a variety of cloning vectors that accept inserts ranging from small sizes for plasmids, to medium sizes for bacteriophage, to large sizes for fosmids and BACs.
Entry of recombinant molecules into the bacterial cell Three methods are used to introduce recombinant DNA molecules into bacterial cells: transformation, transduction, and infection (Figure 10-11; see Sections 5.3 and 5.4).
Figure 10-11: Modes of delivering recombinant DNA into bacterial cells
Figure 10-11: Recombinant DNA can be delivered into bacterial cells by transformation, transduction, or infection with a phage. (a) Plasmid and BAC vectors are delivered by DNA mediated transformation. (b) Certain vectors such as fosmids are delivered within bacteriophage heads (transduction); however, after having been injected into the bacterium, they form circles and replicate as large plasmids. (c) Bacteriophage vectors such as phage λ infect and lyse the bacterium, releasing a clone of progeny phages, all carrying the identical recombinant DNA molecule within the phage genome.
In transformation, bacteria are bathed in a solution containing the recombinant DNA molecule. Because bacterial cells used in research cannot take up DNA molecules as large as recombinant plasmids, they must be made competent (that is, able to take up the DNA from the surrounding media) by either incubation in a calcium solution or exposure to a high-voltage electrical pulse (electroporation). After entering a competent cell through membrane pores, the recombinant molecule becomes a plasmid chromosome (Figure 10-11a). Electroporation is the method of choice for introducing especially large DNAs such as BACs into bacterial cells.
In transduction, the recombinant molecule is combined with phage head and tail proteins to produce a virus that contains largely non-viral DNA. These engineered phages are then mixed with bacteria and they inject their DNA cargo into the bacterial cells, but new phages cannot form because they do not carry the viral genes necessary for phage replication. Fosmids are introduced into cells by transduction (Figure 10-11b).
In contrast to transduction, which produces plasmids and bacterial colonies but not new viruses, infection produces recombinant phage particles (Figure 10-11c). Through repeated rounds of re-infection, a plaque full of phage particles forms from each initial bacterium that was infected. Each phage particle in a plaque contains not only the recombinant DNA but also viral genes needed to create new infective phage particles.
Recovery of amplified recombinant molecules The recombinant DNA packaged into phage particles is easily obtained by collecting phage lysate and isolating the DNA that the particles contain. To obtain the recombinant DNA packaged in plasmids, fosmids, or BACs, the bacteria are chemically or mechanically broken apart. The recombinant DNA plasmid is separated from the much larger main bacterial chromosome by centrifugation, electrophoresis, or other selective techniques that distinguish the chromosome from the plasmid by size or shape.
KEY CONCEPT
Gene cloning is carried out through the introduction of single recombinant vectors into recipient bacterial cells, followed by the amplification of these molecules as either plasmid chromosomes or phages.
Making genomic and cDNA libraries
We have seen how to make and amplify individual recombinant DNA molecules such as our human insulin cDNA. Consider the task in 1982, when the human insulin gene had to be identified from a library of human genome fragments. To ensure that we have cloned the DNA segment of interest, we have to make large collections of DNA segments that are all-inclusive. For example, we take all the DNA from a genome, break it up into segments of the right size for our cloning vector, and insert each segment into a different copy of the vector, thereby creating a collection of recombinant DNA molecules that, taken together, represent the entire genome. We then transform or infect these molecules into separate bacterial recipient cells, where they are amplified. The resulting collection of recombinant-DNA-bearing bacteria or bacteriophages is called a genomic library. If we are using a cloning vector that accepts an average insert size of 10 kb and if the entire genome is 100,000 kb in size (the approximate size of the genome of the nematode Caenorhabditis elegans), then at least 10,000 independent recombinant clones would be required to represent one genome’s worth of DNA. To ensure that all sequences of the genome that can be cloned are contained within a collection, genomic libraries typically represent an average segment of the genome at least five times (and so, in our example, there will be 50,000 independent clones in the genomic library). This multifold representation makes it highly unlikely that, by chance, a sequence is not represented at least once in the library.
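The library-size arithmetic in this example can be checked directly, and the benefit of fivefold representation can be estimated. The probability calculation below is an assumption layered on top of the text: it treats each clone as an independent random draw that contains a given sequence with probability equal to insert size divided by genome size, a simplification often attributed to Clarke and Carbon. The function names are illustrative.

```python
import math


def minimum_clones(genome_kb, insert_kb):
    """Clones needed to represent one genome's worth of DNA."""
    return math.ceil(genome_kb / insert_kb)


def probability_represented(n_clones, genome_kb, insert_kb):
    """Chance that a given sequence is in at least one of n random clones."""
    fraction = insert_kb / genome_kb   # share of the genome in one insert
    return 1 - (1 - fraction) ** n_clones


GENOME_KB = 100_000   # approximate C. elegans genome size from the text
INSERT_KB = 10        # average insert size from the text

one_fold = minimum_clones(GENOME_KB, INSERT_KB)
five_fold = 5 * one_fold
print(one_fold, five_fold)   # 10000 50000

for n in (one_fold, five_fold):
    p = probability_represented(n, GENOME_KB, INSERT_KB)
    print(f"{n} clones: probability {p:.3f}")
```

Under these assumptions, one genome's worth of clones captures a given sequence only about 63 percent of the time, whereas the fivefold library raises this to about 99 percent.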
Similarly, representative collections of cDNA inserts require tens or hundreds of thousands of independent cDNA clones; these collections are cDNA libraries and represent only the protein-coding regions of the genome. A comprehensive cDNA library includes mRNA samples from different tissues, different developmental stages, or from organisms grown in different environmental conditions.
Whether we choose to construct a genomic DNA library or a cDNA library depends on the situation. If we are seeking a specific gene that is active in a specific type of tissue in a plant or animal, then it makes sense to construct a cDNA library from a sample of that tissue. For example, suppose we want to identify cDNAs corresponding to insulin mRNAs. The β-islet cells of the pancreas are the most abundant source of insulin, and so mRNAs from pancreas cells are the appropriate source for a cDNA library because these mRNAs should be enriched for the gene in question. A cDNA library represents a subset of the transcribed regions of the genome; so it will inevitably be smaller than a complete genomic library. Although genomic libraries are bigger, they do have the benefit of containing genes in their native form, including introns and untranscribed regulatory sequences. A genomic library is necessary at some stage as a prelude to cloning an entire gene or an entire genome.
KEY CONCEPT
The task of isolating a clone of a specific gene begins with making a library of genomic DNA or cDNA—if possible, enriched for sequences containing the gene in question.