19.3 Molecular Techniques Can Be Used to Find Genes of Interest

To analyze a gene or to transfer it to another organism, the gene must first be located and isolated. For instance, if we wanted to transfer a human gene for growth hormone to bacteria, we would first have to find the human gene that encodes growth hormone and separate it from the 3.2 billion bp of human DNA. So far, in our consideration of gene cloning we’ve glossed over the problem of finding the DNA sequence to be cloned; the solution to this problem has been purposely delayed until now because, paradoxically, researchers must often clone a gene to find it.

This approach—to clone first and search later—is called “shotgun cloning,” because it is like hunting with a shotgun: the pellets spray widely in the general direction of the quarry, with a good chance that one or more of the pellets will hit the intended target. In shotgun cloning, a researcher first clones a large number of DNA fragments, knowing that one or more contains the DNA of interest, and then searches for the fragment of interest among the clones.

Gene Libraries

A collection of clones containing all the DNA fragments from one source is called a DNA library. For example, we might isolate genomic DNA from human cells, break it into fragments, insert the fragments into vectors, and clone them in bacterial cells. The set of bacterial colonies or phages containing these fragments is a human genomic library, containing all the DNA sequences found in the human genome.

Creating a Genomic Library

To create a genomic library, cells are collected and disrupted, which causes them to release their DNA and other cellular contents into an aqueous solution, and the DNA is extracted from the solution. After the DNA has been isolated, it is cut into fragments by a restriction enzyme for a limited amount of time (a partial digestion) so that only some of the restriction sites in each DNA molecule are cut. Because the cutting of sites is random, different DNA molecules will be cut in different places and a set of overlapping fragments will be produced (Figure 19.14). The fragments are then joined to vectors, which can be transferred to bacteria. A few of the clones contain the entire gene of interest (if the gene is not too large) and a few contain parts of the gene, but most contain fragments that have no part of the gene of interest.

Figure 19.14: A genomic library contains all of the DNA sequences found in an organism’s genome.

A genomic library must contain a large number of clones to ensure that all DNA sequences in the genome are represented in the library. A library of the human genome formed by using cosmids, each carrying a random DNA fragment from 35,000 to 44,000 bp long, would require about 350,000 cosmid clones to provide a 99% chance that every sequence is included in the library.
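
The figure of about 350,000 clones comes from a standard probability argument, usually credited to Clarke and Carbon. If each clone carries a fragment that makes up a fraction f of the genome, the number of clones N needed for a probability P that any given sequence is present is N = ln(1 − P) / ln(1 − f). The Python sketch below applies that formula. It assumes that fragments are positioned randomly and independently and that the genome is 3.2 billion bp. Real libraries are biased by the distribution of restriction sites, so treat the result as an estimate.

```python
import math

def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Clones required so that any given sequence is present with the stated probability.

    Assumes every clone carries an independent, randomly positioned fragment,
    so a clone covers a particular sequence with probability f = insert / genome.
    """
    f = insert_bp / genome_bp
    # log1p(-x) computes ln(1 - x) accurately when x is tiny (here f is about 1e-5).
    return math.ceil(math.log1p(-probability) / math.log1p(-f))

HUMAN_GENOME_BP = 3.2e9
for insert in (35_000, 40_000, 44_000):
    print(insert, clones_needed(0.99, insert, HUMAN_GENOME_BP))
```

For cosmid inserts of 35,000 to 44,000 bp, the estimate runs from roughly 421,000 clones down to roughly 335,000. That range brackets the figure of about 350,000.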

cDNA Libraries

An alternative to creating a genomic library is to create a library consisting only of those DNA sequences that are transcribed into mRNA (called a cDNA library because all the DNA in this library is complementary to mRNA). Much of eukaryotic DNA consists of repetitive sequences and other sequences that are not transcribed into mRNA, and these sequences are not represented in a cDNA library.

A cDNA library has two additional advantages. First, it is enriched with fragments from actively transcribed genes. Second, introns do not interrupt the cloned sequences; introns would pose a problem when the goal is to produce a eukaryotic protein in bacteria, because most bacteria have no means of removing the introns.

One disadvantage of a cDNA library is that it contains only sequences that are present in mature mRNA. Sometimes, researchers are interested in sequences that are not transcribed, such as those in promoters and enhancers, which are important for transcription but are not themselves transcribed. These sequences are not present in a cDNA library. Furthermore, a cDNA library contains only those gene sequences expressed in the tissue from which the RNA was isolated, and the frequency of a particular DNA sequence in a cDNA library depends on the abundance of the corresponding mRNA in the given tissue. So, if a particular gene is not expressed, or is expressed only at low levels in a particular tissue, it may be absent from a cDNA library prepared from that tissue. In contrast, almost all genes are present at the same frequency in a genomic DNA library.
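
The dependence on mRNA abundance can be illustrated with a small simulation. In this sketch, each cDNA clone is drawn at random in proportion to its transcript's share of the tissue's mRNA. The gene names and abundance values are hypothetical, and the model ignores biases introduced during cloning.

```python
import random
from collections import Counter

def sample_cdna_library(mrna_abundance: dict[str, float], n_clones: int, seed: int = 0) -> Counter:
    """Draw clones in proportion to each transcript's share of the mRNA pool."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    genes = list(mrna_abundance)
    weights = [mrna_abundance[g] for g in genes]
    return Counter(rng.choices(genes, weights=weights, k=n_clones))

# Hypothetical mRNA pool for one tissue (fractions of total mRNA).
tissue_mrna = {
    "abundant_gene": 0.9,
    "moderate_gene": 0.0999,
    "rare_gene": 0.0001,
    "unexpressed_gene": 0.0,  # not transcribed in this tissue
}
library = sample_cdna_library(tissue_mrna, n_clones=10_000)
print(library)
```

An unexpressed gene never appears, however many clones are made. A rare transcript may show up once or not at all. A genomic library has no such dependence on expression.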

To create a cDNA library, messenger RNA must first be separated from other types of cellular RNA (tRNA, rRNA, snRNA, etc.). Most eukaryotic mRNAs possess a poly(A) tail, which provides a convenient hook for separating eukaryotic mRNA from the other types. Total cellular RNA is isolated from cells and poured through a column packed with short fragments of DNA consisting entirely of thymine nucleotides—that is, oligo(dT) chains (Figure 19.15a). As the RNA moves through the column, the poly(A) tails of mRNA molecules pair with the oligo(dT) chains and are retained in the column, whereas the rest of the RNA passes through it. The mRNA can then be washed from the column by the addition of a buffer that breaks the hydrogen bonds between poly(A) tails and oligo(dT) chains.
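
The oligo(dT) column sorts RNA molecules by whether they carry a 3′ poly(A) tail. The sketch below expresses that rule in Python, assuming that binding requires a tail of at least 15 A's. The threshold, the molecule names, and the sequences are all hypothetical.

```python
def poly_a_tail_length(rna: str) -> int:
    """Length of the run of A's at the 3' end (sequence written 5' to 3')."""
    rna = rna.upper()
    return len(rna) - len(rna.rstrip("A"))

def oligo_dt_column(rnas: dict[str, str], min_tail: int = 15) -> tuple[dict[str, str], dict[str, str]]:
    """Split RNAs into those retained by oligo(dT) pairing and those that flow through."""
    retained = {name: s for name, s in rnas.items() if poly_a_tail_length(s) >= min_tail}
    flow_through = {name: s for name, s in rnas.items() if name not in retained}
    return retained, flow_through

total_rna = {
    "mRNA_1": "AUGGCUUCAGGAUAA" + "A" * 40,       # hypothetical mRNA with a poly(A) tail
    "tRNA_like": "GCGGAUUUAGCUCAGUUGGGAGAGCCCA",  # tRNAs end in 3'-CCA
    "rRNA_fragment": "GGUCUAGCAAGGCUUGCG",         # hypothetical, no tail
}
retained, flow_through = oligo_dt_column(total_rna)
print(sorted(retained), sorted(flow_through))
```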

Figure 19.15: A cDNA library contains only those DNA sequences that are transcribed into mRNA.

The mRNA molecules are then copied into cDNA. Reverse transcriptase, an enzyme isolated from retroviruses (see Chapter 9), synthesizes single-stranded complementary DNA from the RNA template (Figure 19.15b). The RNA strand of the resulting RNA-DNA hybrid is then degraded, and DNA polymerase synthesizes a complementary DNA strand, producing a double-stranded cDNA molecule. TRY PROBLEM 35
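
The base-pairing logic of the two synthesis steps can be written out explicitly. In this sketch, reverse transcriptase produces a first strand that is the reverse complement of the mRNA, with T in place of U. DNA polymerase then produces a second strand that is the reverse complement of the first. Both strands are written 5′ to 3′. The mRNA is a hypothetical short example, and enzyme details such as primers and RNA removal are not modeled.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")  # DNA base -> complementary DNA base

def first_strand(mrna: str) -> str:
    """cDNA strand made by reverse transcriptase, written 5' to 3'."""
    return mrna.upper().translate(RNA_COMPLEMENT)[::-1]

def second_strand(first: str) -> str:
    """DNA strand complementary to the first strand, written 5' to 3'."""
    return first.translate(DNA_COMPLEMENT)[::-1]

mrna = "AUGGCCUAA" + "A" * 6  # hypothetical short mRNA with a poly(A) tail
cdna_1 = first_strand(mrna)
cdna_2 = second_strand(cdna_1)
print(cdna_1)  # TTTTTTTTAGGCCAT
print(cdna_2)  # ATGGCCTAAAAAAAA
```

The first strand begins with a run of T's, which is complementary to the poly(A) tail. This is why an oligo(dT) primer is commonly used to start reverse transcription. The second strand matches the mRNA, with T in place of U.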

CONCEPTS

One method of finding a gene is to create and screen a DNA library. A genomic library is created by cutting genomic DNA into overlapping fragments and cloning each fragment in a separate bacterial cell. A cDNA library is created from mRNA that is converted into cDNA and cloned in bacteria.

Screening DNA Libraries

Creating a genomic or cDNA library is relatively easy compared with screening the library to find clones that contain the gene of interest. The screening procedure used depends on what is known about the gene.

The first step in screening is to plate the clones of the library. If a plasmid or cosmid vector was used to construct the library, the cells are diluted and plated so that each bacterium grows into a distinct colony. If a phage vector was used, the phages are allowed to infect a lawn of bacteria on a petri plate. Each plaque or bacterial colony contains a single, cloned DNA fragment that must be screened for the gene of interest.

A common way to screen libraries is with probes. To use a probe, replicas of the plated colonies or plaques in the library must first be made. Figure 19.16 illustrates this procedure for a cosmid library.

Figure 19.16: Genomic and cDNA libraries can be screened with a probe to find the gene of interest.

How is a probe obtained when the gene has not yet been isolated? One option is to use a similar gene from another organism as the probe. For example, if we wanted to screen a human genomic library for the growth-hormone gene and the gene had already been isolated from rats, we could use a purified rat-gene sequence as the probe to find the human gene for growth hormone. Successful hybridization does not require perfect complementarity between the probe and the target sequence, so a related sequence can often be used as a probe.
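
The idea that a related probe can still find its target can be modeled as a search that tolerates mismatches. Because library inserts are double-stranded, the probe is compared with both strands of each clone. A higher mismatch tolerance loosely corresponds to less stringent hybridization conditions. This is a simplifying assumption: real hybridization also depends on temperature, salt concentration, and GC content. All sequences and clone names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def fewest_mismatches(probe: str, clone: str) -> int:
    """Smallest mismatch count between the probe and any same-length stretch of either strand."""
    best = len(probe)
    for strand in (clone, reverse_complement(clone)):
        for i in range(len(strand) - len(probe) + 1):
            window = strand[i:i + len(probe)]
            best = min(best, sum(a != b for a, b in zip(probe, window)))
    return best

def screen_library(probe: str, library: dict[str, str], max_mismatches: int) -> list[str]:
    """Names of clones whose insert 'hybridizes' within the mismatch tolerance."""
    return [name for name, insert in library.items()
            if fewest_mismatches(probe, insert) <= max_mismatches]

rat_probe = "ATGGCTTCAGGA"  # hypothetical rat sequence
human_library = {
    "clone_hit": "AAATGGCCTCAGGATT",      # one mismatch relative to the probe
    "clone_revcomp": "GGTCCTGAAGCCATGG",  # exact match on the opposite strand
    "clone_unrelated": "CCCCCCCCCCCCCCCC",
}
print(screen_library(rat_probe, human_library, max_mismatches=2))
```

With a tolerance of two mismatches, both related clones are detected. With a tolerance of zero, only the exact match is detected. This shows why cross-species probes are usually used under relaxed conditions.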

Alternatively, synthetic probes can be created if the protein produced by the gene has been isolated and its amino acid sequence has been determined. With the use of the genetic code and the amino acid sequence of the protein, possible nucleotide sequences of a small region of the gene can be deduced. Although only one sequence in the gene actually encodes a particular protein, the presence of synonymous codons means that the same protein could be produced by several different nucleotide sequences, and it is impossible to know which is correct. To overcome this problem, a mixture of all the possible nucleotide sequences is used as a probe. To minimize the number of sequences required in the mixture, a region of the protein is selected with relatively little degeneracy in its codons. After the possible nucleotide sequences have been deduced, the set of DNA probes can be synthesized chemically by an automated machine known as an oligonucleotide synthesizer.
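
The steps just described can be sketched with the standard genetic code:

1. Count how many codon combinations could encode each stretch of the protein.
2. Pick the stretch with the fewest combinations.
3. List every combination to form the probe mixture.

Methionine and tryptophan each have a single codon, while leucine, serine, and arginine each have six, so stretches rich in M and W are favored. The protein sequence is hypothetical, and the 4-residue window is kept short only for readability; real probes are usually longer.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

SYNONYMOUS: dict[str, list[str]] = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    SYNONYMOUS.setdefault(aa, []).append("".join(codon))

def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    count = 1
    for aa in peptide:
        count *= len(SYNONYMOUS[aa])
    return count

def probe_mixture(peptide: str) -> list[str]:
    """Every coding-strand sequence that could encode the peptide."""
    return ["".join(combo) for combo in product(*(SYNONYMOUS[aa] for aa in peptide))]

def least_degenerate_window(protein: str, length: int) -> tuple[int, str]:
    """Start index and peptide of the stretch needing the fewest probe sequences."""
    start = min(range(len(protein) - length + 1),
                key=lambda i: degeneracy(protein[i:i + length]))
    return start, protein[start:start + length]

protein = "LLSRMWFCLLS"  # hypothetical amino acid sequence
start, peptide = least_degenerate_window(protein, 4)
print(start, peptide, degeneracy(peptide))  # 4 MWFC 4
print(probe_mixture(peptide))
```

A stretch such as LLSR would need 6 × 6 × 6 × 6 = 1296 sequences in the mixture. MWFC needs only 4.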

Yet another method of screening a library is to look for the protein product of a gene. This method requires that the DNA library be cloned in an expression vector. The clones can be tested for the presence of the protein by using an antibody that recognizes the protein or by using a chemical test for the protein product. This method depends on the existence of a test for the protein produced by the gene. Gene libraries can also be screened using PCR or by sequencing.

CONCEPTS

A DNA library can be screened for a specific gene with the use of complementary probes that hybridize to the gene. Alternatively, the library can be cloned into an expression vector, and the gene can be located by examining the clones for the protein product of the gene.

CONCEPT CHECK 6

Briefly explain how synthetic probes are created to screen a DNA library when the protein encoded by the gene is known.

In Situ Hybridization

DNA probes can be used to determine the chromosomal location of a gene in a process called in situ hybridization. The name is derived from the fact that DNA (or RNA) is visualized while it is in the cell (in situ). This technique requires that the cells be fixed and the chromosomes be spread on a microscope slide and denatured. A labeled probe is then applied to the slide, just as it can be applied to a gel. Many probes carry attached fluorescent dyes that can be seen directly with the microscope (Figure 19.17a). Several probes with different colored dyes can be used simultaneously to investigate different sequences or chromosomes. Fluorescence in situ hybridization (FISH) has been widely used to identify the chromosomal location of human genes.

Figure 19.17: With in situ hybridization, DNA probes are used to determine the cellular or chromosomal location of a gene or its product. (a) A probe with green fluorescence is specific to chromosome 7, revealing a deletion on one copy of chromosome 7. (b) In situ hybridization is used to detect the presence of mRNA from the tailless gene in a Drosophila embryo.
[Part a: Addenbrookes Hospital/Science Source. Part b: Courtesy of L. Tsuda.]

In situ hybridization can also be used to determine the tissue distribution of specific mRNA molecules, serving as a source of insight into how gene expression differs among cell types (Figure 19.17b). A labeled DNA probe complementary to a specific mRNA molecule is added to tissue, and the location of the probe is determined with the use of radioactive or fluorescent tags. Determining where a gene is expressed often helps define its function. For example, finding that a gene is highly expressed only in brain tissue might suggest that the gene has a role in neural function.

Positional Cloning

For many genes with important functions, no associated protein product is yet known. The biochemical bases of many human genetic diseases, for example, are still unknown. How can these genes be isolated? One approach is to first determine the general location of the gene on the chromosome by using recombination frequencies derived from crosses or pedigrees (see Chapter 7). After the chromosomal region where the gene is found has been identified, genes in this region can be cloned and identified. Then other techniques can be used to identify which of the “candidate” genes might be the one that causes the disease. This approach—to isolate genes on the basis of their position on a gene map—is called positional cloning.

In the first step of positional cloning, geneticists use mapping studies (see Chapter 7) to establish linkage between molecular markers and a phenotype of interest, such as a human disease or a desirable physical trait in a plant or animal. Demonstration of linkage between the phenotype and one or more molecular markers would provide information about which chromosome carries the locus that codes for the phenotype and its general location on that chromosome.
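
Linkage in human pedigrees is commonly evaluated with a LOD score. This is the base-10 logarithm of the odds that a marker and a phenotype are linked at recombination fraction θ, compared with the odds that they are unlinked (θ = 0.5). By convention, a LOD score of 3 or more is taken as evidence of linkage. The sketch below assumes the simplest case: phase-known, fully informative meioses, each scored as recombinant or nonrecombinant. Real pedigree analyses rely on specialized software that handles unknown phase and missing data. The family data are hypothetical.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD for phase-known, fully informative meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    n = recombinants + nonrecombinants
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        log_linked += nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked

def max_lod(recombinants: int, nonrecombinants: int, step: float = 0.01) -> tuple[float, float]:
    """Scan theta from 0 to 0.5 and return (best_theta, best_lod)."""
    grid = [round(i * step, 10) for i in range(int(round(0.5 / step)) + 1)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])

# Hypothetical data: 2 recombinants among 20 informative meioses.
theta_hat, best = max_lod(2, 18)
print(theta_hat, round(best, 2))
```

Here the maximum LOD is about 3.2 at θ = 0.1. That is just above the conventional threshold, and it places the marker roughly 10 map units from the locus.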

The next step is to more precisely locate the locus by using additional molecular markers clustered in the chromosomal region where the locus resides. After the gene has been placed on a chromosome map, clones that cover the region can be isolated from a genomic library. With the use of a technique called chromosome walking (Figure 19.18), it is possible to progress from neighboring genes to linked clones, one of which might contain the gene of interest. The basis of chromosome walking is the fact that a genomic library consists of a set of overlapping DNA fragments (see Figure 19.14). We start with a cloned gene marker that is close to the new gene of interest so that the “walk” will be as short as possible. One end of the clone of a neighboring marker (clone A in Figure 19.18) is used to make a complementary probe. This probe is used to screen the genomic library to find a second clone (clone B) that overlaps with the first and extends in the direction of the gene of interest. This second clone is isolated and purified and a probe is prepared from its end. The second probe is used to screen the library for a third clone (clone C) that overlaps with the second. In this way, one can systematically “walk” toward the gene of interest, one clone at a time.
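
The logic of chromosome walking can be sketched by treating each clone as an interval of genomic coordinates. This is an abstraction, since in the laboratory overlap is detected by hybridization rather than by known coordinates. At each step, a "probe" from the end of the current clone that faces the target picks out the overlapping clone that extends farthest toward the target. The walk stops when a clone spans the target position. Clone names and coordinates are hypothetical.

```python
from typing import NamedTuple

class Clone(NamedTuple):
    name: str
    start: int  # first base covered
    end: int    # one past the last base covered

def chromosome_walk(clones: list[Clone], start_name: str, target: int) -> list[str]:
    """Names of the clones visited while walking from start_name to the target position."""
    current = next(c for c in clones if c.name == start_name)
    path = [current.name]
    while not current.start <= target < current.end:
        if target >= current.end:
            # Probe from the right-hand end; keep clones that overlap it and extend further right.
            probe = current.end - 1
            hits = [c for c in clones if c.start <= probe < c.end and c.end > current.end]
            step = max(hits, key=lambda c: c.end, default=None)
        else:
            # Probe from the left-hand end; keep clones that overlap it and extend further left.
            probe = current.start
            hits = [c for c in clones if c.start <= probe < c.end and c.start < current.start]
            step = min(hits, key=lambda c: c.start, default=None)
        if step is None:
            raise LookupError(f"gap in the library beyond clone {current.name}")
        current = step
        path.append(current.name)
    return path

library = [Clone("A", 0, 40), Clone("E", 35, 50), Clone("B", 30, 70),
           Clone("C", 60, 100), Clone("D", 90, 130)]
print(chromosome_walk(library, "A", 120))  # ['A', 'B', 'C', 'D']
```

Each step must extend past the previous clone, so the walk always moves toward the target. If no overlapping clone exists, the walk stops at a gap, much as a real walk stalls when the library does not cover a region.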

Figure 19.18: In chromosome walking, neighboring genes are used to locate a gene of interest.

A related technique called chromosome jumping allows one to move from more distantly linked markers to clones that contain a sequence of interest. After clones that cover the delineated region have been obtained by chromosome walking or jumping, all genes located within the region are identified. Genes can be distinguished from other sequences by the presence of characteristic features, such as consensus sequences in the promoter, and a start codon and a stop codon within the same reading frame. After “candidate” genes have been identified, they can be evaluated to determine which is most likely to be the gene of interest. The expression pattern of the gene—where and when it is transcribed—can often provide clues about its function. For example, genes for neurological disease would likely be expressed in the brain. Geneticists often look in the coding region of the gene for mutations among people with the disease. More will be said about determining the function of genes in sections that follow and in Chapter 20.

CONCEPTS

Positional cloning allows researchers to isolate a gene without having knowledge of its biochemical basis. Linkage studies are used to map the locus producing a phenotype of interest to a particular chromosome region. Chromosome walking and jumping can be used to progress from molecular markers to clones containing sequences that cover the chromosome region. Candidate genes within the region are then evaluated to determine if they encode the phenotype of interest.

CONCEPT CHECK 7

How are candidate genes that are identified by positional cloning evaluated to determine whether they encode the phenotype of interest?

Application: Isolating the Gene for Cystic Fibrosis

The first gene responsible for a human genetic disease that was isolated entirely by positional cloning was the gene for cystic fibrosis (CF). Cystic fibrosis is an autosomal recessive disorder characterized by chronic lung infections, insufficient production of pancreatic enzymes that are necessary for digestion, and increased salt concentration in sweat (Figure 19.19). It is among the most common genetic diseases in Caucasians, occurring with a frequency of about 1 in 2000 live births. Nearly 5% of all Caucasians are carriers of a CF-causing mutation.
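
The two figures in this paragraph are consistent with each other under the Hardy–Weinberg law. For a recessive disorder, the incidence equals q², where q is the frequency of disease alleles. The carrier frequency is then 2pq. The sketch below assumes Hardy–Weinberg equilibrium and treats all CF-causing alleles as a single allele class.

```python
import math

def carrier_frequency(affected_incidence: float) -> float:
    """Heterozygote frequency 2pq implied by the incidence q**2 of a recessive disorder."""
    q = math.sqrt(affected_incidence)  # frequency of disease alleles
    p = 1.0 - q
    return 2 * p * q

incidence = 1 / 2000
print(f"allele frequency q = {math.sqrt(incidence):.4f}")         # about 0.022
print(f"carrier frequency 2pq = {carrier_frequency(incidence):.3f}")  # about 0.044
```

An incidence of 1 in 2000 gives q ≈ 0.022 and a carrier frequency of about 4.4%, in line with the figure of nearly 5%.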

Figure 19.19: Cystic fibrosis was the first genetic disease for which the causative gene was isolated entirely by positional cloning. New treatments have greatly helped patients with cystic fibrosis. This girl wears a “smart vest,” which shakes her chest to help break up mucus in her lungs, and inhales from a nebulizer that contains enzymes and salt water, which also help break up mucus.
[Jeffrey Sauger Photography.]

Geneticists attempting to isolate the gene for CF faced a formidable task. The symptoms of the disease, especially the elevated salt concentration in sweat, suggested that the gene for CF somehow takes part in the movement of ions into and out of the cell, but no information was available about the protein encoded by the gene. At the time, the human genome had not yet been sequenced. Analyses of pedigrees showed that CF is inherited as an autosomal recessive trait, and so it might be located on any one of the 22 pairs of autosomal chromosomes. Thus, geneticists were seeking an unknown gene—probably encompassing a few thousand or tens of thousands of base pairs—among the 3.2 billion base pairs of the human genome.

Researchers began by looking for associations between the inheritance of CF and that of other traits (Figure 19.20). Early studies were limited by the scarcity of genetic traits that varied and could be used for gene-mapping studies, but in the 1980s, advances in molecular biology provided a large number of molecular markers that could be used for linkage analysis. Geneticists collected pedigrees of families in which several members had CF. They compared the inheritance of CF with that of molecular markers among the members of these families, looking for evidence of linkage. The gene for CF was found to be closely linked to two markers, MET and D7S8, located on the long arm of chromosome 7. MET and D7S8 are separated by about 1.5 map units (see Chapter 7). In the human genome, each map unit roughly corresponds to 1 million base pairs, so the gene for CF is located somewhere within a stretch of about 1.5 million base pairs of DNA, a huge expanse of sequence.

Figure 19.20: The gene for cystic fibrosis was located by positional cloning.

Further linkage studies with additional markers were carried out to more precisely delineate where in the 1.5-million-base-pair region the CF gene lies. Researchers selected additional molecular markers from the region surrounding MET and D7S8 and performed linkage studies between these new markers and CF (see Figure 19.20). These studies identified two additional markers, D7S122 and D7S340, which are closely linked to CF. Furthermore, they showed that the order of the four markers is MET-D7S340-D7S122-D7S8 and that the CF gene lies very close to D7S122 and D7S340. This finding narrowed the region in which the gene for CF lies to about 500,000 bp.

At this stage, geneticists began isolating clones of sequences from the delineated region. Starting from the molecular markers, they used a combination of chromosome walking and chromosome jumping to identify clones from human genomic libraries that completely covered the region of interest (see Figure 19.20). An examination of sequences within these clones revealed the presence of four genes in the region encompassed by the linked markers (see Figure 19.20). Additional studies were then carried out to better characterize these candidate genes. Three of the candidate genes were eventually eliminated, either because linkage studies suggested that they were not closely linked with the inheritance of CF or because analysis of the sequences or their expression patterns suggested they were not the gene for CF.

Hybridization studies were carried out with the one remaining gene to determine where it was expressed. Messenger RNA was isolated from different organ tissues and probed with sequences from the candidate gene. The gene showed high levels of expression in the pancreas, lungs, and sweat glands (Figure 19.21), tissues known to be affected by CF.

Figure 19.21: A candidate for the cystic fibrosis gene is expressed in pancreatic, respiratory, and sweat-gland tissues—tissues that are affected by the disease. Shown is a Northern blot of mRNA produced by the candidate gene in different tissues. These data provided evidence that the candidate gene is in fact the gene that causes cystic fibrosis.
[From J.R. Riordan et al., Science 245:1066-1073, 1989. Reprinted with permission from AAAS.]

Copies of the candidate gene from a healthy person and from a person with CF were then sequenced, and the sequence data were examined for differences that might be a mutation causing CF. The findings revealed that the person with CF had a 3-bp deletion in the coding region of the gene, whereas the healthy person did not have this deletion. The deletion removed a single phenylalanine from the protein encoded by the candidate gene. Then, for a large number of patients with CF, geneticists used PCR to amplify the region of the gene where the deletion was found; 68% of the CF patients had this deletion. Subsequent studies demonstrated that the remaining CF patients possessed mutations at other locations within the candidate gene, thus proving that the candidate gene was indeed the locus that caused CF.
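
A 3-bp deletion removes exactly one codon's worth of sequence, so the reading frame downstream is preserved. That is why the protein lost a single amino acid rather than being garbled from the mutation onward. The sketch below translates a short coding fragment with the standard genetic code and compares an in-frame 3-bp deletion with a 1-bp deletion at the same position. The fragment is hypothetical and is meant only to illustrate the reading-frame arithmetic. It is not presented as the actual CFTR sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate the complete codons of frame 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def delete(dna: str, start: int, length: int) -> str:
    return dna[:start] + dna[start + length:]

normal = "ATCATCTTTGGTGAA"          # hypothetical fragment: Ile-Ile-Phe-Gly-Glu
in_frame = delete(normal, 5, 3)     # removes CTT, which spans two codons
frameshift = delete(normal, 5, 1)   # removes a single C for comparison
print(translate(normal))      # IIFGE
print(translate(in_frame))    # IIGE
print(translate(frameshift))  # IILV
```

Even though the deleted CTT straddles two codons, ATC becomes ATT, which still encodes isoleucine. The net effect is the loss of one phenylalanine. A 1-bp deletion at the same site changes every codon downstream.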

Researchers eventually demonstrated that the gene for CF encodes a membrane protein that controls the movement of chloride into and out of cells; the gene is known today as the cystic fibrosis transmembrane conductance regulator (CFTR) gene. Patients with CF have two mutated copies of CFTR, which produce defective chloride channels. Because chloride ions are not transported normally across the membranes of epithelial cells, less water moves onto the surfaces of the airways and other ducts, leading to the formation of thick mucus and the symptoms of the disease.