14.1 MECHANISMS OF SITE-SPECIFIC RECOMBINATION
Recombination between specific sequences can result in insertion, deletion, or inversion of the DNA sequence between those sites. Recombination reactions of this type occur in virtually every cell, filling specialized roles that vary greatly from one species to another but sharing a common mechanism. Each site-specific recombination system consists of a short, unique DNA sequence (20 to 200 bp) and a recombinase, an enzyme that acts specifically at that sequence. In some systems, additional proteins are required to facilitate or regulate the process. The result of a site-specific recombination reaction can be similar to that of the crossovers that sometimes accompany homologous recombination, but the process does not require extensive homology at recombination sites.
The DNA rearrangements promoted by site-specific recombinases appear in numerous and sometimes surprising roles. Examples range from prescribed roles in the replication cycles of viral, plasmid, and bacterial DNAs, to key events in the life cycle of some viruses, to regulation of the expression of certain genes.
Precise DNA Rearrangements Are Promoted by Site-Specific Recombinases
The recombination sites recognized by site-specific recombinases often consist of two inverted repeats, separated by a short asymmetric (nonpalindromic) core sequence (Figure 14-2a). During site-specific recombination, the asymmetric cores of two recombination sites are aligned so that their sequences proceed in the same direction. The recombinase recognizes and binds specifically to the symmetric repeats on either side of each aligned core. Since it is not directly bound by the recombinase in most of these systems, the sequence in the core itself can often be varied without affecting its recognition by the recombinase. However, recombination occurs only if the sequences of the two cores are identical.
Figure 14-2: The structure and activity of site-specific recombination sites. (a) Shown here are both strands of the recombination sites from two well-studied recombination systems widely used in biotechnology, lox (loxP) and FRT; the inverted 13 bp repeats are binding sites for the recombinases, named Cre and Flp, respectively. (These two recombination sites and their recombinases are discussed later in the chapter.) The inverted repeats are separated by an asymmetric core sequence. The cleavage and exchange events described in Figure 14-3 occur at or near the ends of the core sequence. (b) The colored ribbons here represent double-stranded DNA. Two recombination sites flank a length of DNA to be recombined. A, B, and C are imaginary genes or genetic markers in the DNA separating the two FRT sites. Each orange arrow represents a complete FRT site, as illustrated in (a). Orientation (shown by the arrowheads) refers to the asymmetric nucleotide sequence in these recombination sites, not the 5′→3′ direction. Recombination can lead to inversion (left) or deletion and, in the reverse process, insertion (right).
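The layout described in Figure 14-2a can be checked directly on a sequence. The following Python sketch assumes the commonly cited 34 bp wild-type loxP sequence (the text itself shows the sequence only in the figure), and the helper names are hypothetical, chosen for illustration. It tests the two properties that matter: the 13 bp flanking repeats are inverted copies of each other, and the core is asymmetric.

```python
# Commonly cited wild-type loxP site, top strand 5'->3'. This sequence is an
# assumption here: the text shows it only in Figure 14-2a.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_site(site: str, repeat_len: int = 13):
    """Split a site into (left repeat, core, right repeat)."""
    return site[:repeat_len], site[repeat_len:-repeat_len], site[-repeat_len:]


def describe_site(site: str, repeat_len: int = 13) -> dict:
    """Summarize the features of a recombination site (hypothetical helper)."""
    left, core, right = split_site(site, repeat_len)
    return {
        "length": len(site),
        "core": core,
        # Inverted repeats: the right arm is the reverse complement of the left.
        "repeats_inverted": right == reverse_complement(left),
        # Asymmetric core: it does not read the same on both strands,
        # which is what gives the site an orientation.
        "core_asymmetric": core != reverse_complement(core),
    }


info = describe_site(LOXP)
print(info)
```

Because the recombinase binds the repeats rather than the core, a sketch like this would report the same repeat structure for a site whose core has been altered; only the `core` entry would change.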
The asymmetric core sequence in a recombination site gives each site an orientation within the surrounding DNA. The overall outcome of a site-specific recombination reaction depends on the location and relative orientation of the recombination sites within the genomic DNA in which they reside (Figure 14-2b). Recombination between two oppositely oriented sites on the same DNA molecule produces an inversion. Recombination between two sites with the same orientation on the same DNA molecule results in a deletion. If the sites are on different DNAs, the result is the insertion of one of the DNAs into the other. Some recombinase systems are highly specific for one of these reaction types and act only on sites with particular orientations.
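The dependence of the outcome on site orientation can be expressed as a small rule. The Python sketch below is a toy model only: it assumes a linear list of markers in which the two recombination sites are written ">" or "<" to show the direction of their asymmetric cores, and the function names are hypothetical. Insertion is not modeled separately, because it is the reverse of the deletion shown.

```python
def flip(marker: str) -> str:
    """Mark a gene as inverted by toggling a trailing apostrophe."""
    return marker[:-1] if marker.endswith("'") else marker + "'"


def recombine(molecule):
    """Recombine the two sites in a list of markers; return (outcome, products)."""
    positions = [i for i, item in enumerate(molecule) if item in (">", "<")]
    if len(positions) != 2:
        raise ValueError("expected exactly two recombination sites")
    i, j = positions
    left, between, right = molecule[:i], molecule[i + 1:j], molecule[j + 1:]
    if molecule[i] == molecule[j]:
        # Same orientation: the intervening DNA is excised as a circle that
        # carries one site, and one site stays behind in the parent molecule.
        site = molecule[i]
        return "deletion", {"parent": left + [site] + right,
                            "circle": [site] + between}
    # Opposite orientation: the intervening DNA is inverted in place, and
    # both sites are regenerated at its ends.
    flipped = [flip(m) for m in reversed(between)]
    return "inversion", {"parent": left + [molecule[i]] + flipped + [molecule[j]] + right}


same = ["x", ">", "A", "B", "C", ">", "y"]
opposite = ["x", ">", "A", "B", "C", "<", "y"]
print(recombine(same))
print(recombine(opposite))
```

Applying `recombine` a second time to the inverted product restores the original gene order, which reflects the reversibility of the reaction.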
Site-specific recombination systems use either a Tyr or a Ser residue as the key nucleophile in the active site. (The Cre and Flp recombinases discussed here and illustrated in Figures 14-2 and 14-3 both utilize Tyr residues.) In vitro studies of many such systems have clarified the fundamental reaction pathway. A pair of recombinases recognizes and binds to each of two recombination sites. The two recombination sites are brought together by their bound recombinases to form a synaptic complex that incorporates a total of four recombinase subunits. Within this complex, the core sequences of the two recombination sites are aligned. If the two recombination sites are on the same DNA molecule, the intervening DNA is bent into a loop as the sites are brought together (see Figure 14-2b, left).
The site-specific recombination reaction is best understood for a family of tyrosine-class recombinases that includes the enzymes Cre and Flp (Figure 14-3a). The active-site tyrosines of two of the four recombinases in the complex each attack a specific phosphodiester bond, each located in one DNA strand near the end of the core sequence of each recombination site. The reactions proceed as in Figure 14-1, with a new phosphodiester bond formed between the DNA and the tyrosyl oxygen (forming a phosphotyrosine bond) at each active site as the DNA itself is cleaved. One recombinase subunit thus becomes covalently linked to the DNA at each cleavage site (see Figure 14-3a, step 1). These phosphoryl transfers do not occur at random within the complex. Instead, the reaction is structurally choreographed so that two opposing recombinase subunits (shown in light gray in Figure 14-3) are active while the other two are not. The transient protein-DNA linkage ensures that the overall reaction proceeds with a minimal free-energy change, so high-energy cofactors such as ATP are unnecessary. Each of the free 5′ hydroxyls of the cleaved DNA strands (the tyrosine is attached to the 3′ phosphate) now becomes the nucleophile in step 2, attacking the phosphotyrosine linkage to free the Tyr residue and form a new phosphodiester bond. However, the reaction involves new DNA partners and leads to the formation of a Holliday intermediate closely related to the Holliday intermediates described in Chapter 13 (albeit formed via a different pathway). An isomerization then occurs in the protein complex (step 3). This step includes a branch migration through the core sequence (a step that is blocked if the cores are not identical), coupled to a conformational change such that the active sites of the two recombinase subunits that did not participate in the first two steps become properly positioned relative to the phosphodiester bonds they must act on to complete the reaction.
The sequence of two phosphoryl transfer reactions is then repeated (steps 4 and 5), one to create a new set of covalent phosphotyrosine bonds linking the DNA to the protein, and the other to resolve this protein-DNA complex and create new phosphodiester bonds. Each of these new reactions occurs on the opposite strand and at the opposite end of the two recombination sites relative to the reactions in the first two steps.
Figure 14-3: A site-specific recombination reaction. (a) The reaction proceeds within a tetramer of identical recombinase subunits. The subunits bind to the recombination site and catalyze the recombination in several steps, as described in the text. The light gray subunits are the active subunits, with the active-site Tyr residue either poised to react or covalently linked to the DNA. The darker gray subunits are in an inactive conformation in which the active-site Tyr residues are too distant from their DNA substrates to function. Isomerization (step 3) switches the conformations of both sets of subunits so that the inactive subunits (dark gray) become active (light gray) and the active-site Tyr residues are now properly positioned to promote the reaction. (b) A surface contour model of Flp recombinase, showing the four subunits bound to a Holliday intermediate; this is equivalent to the product of step 2 in (a). The protein is made transparent so that the bound DNA is visible.
In systems that use an active-site Ser residue (see this chapter’s Moment of Discovery), both strands of each recombination site are cut concurrently and rejoined to new partners, without the Holliday intermediate. All four recombinase subunits participate, each forming a phosphoserine covalent intermediate at the cleavage sites. In both types of system (serine and tyrosine), the exchange is always reciprocal and precise, regenerating a new pair of fully functional recombination sites when the reaction is complete.
Many mechanistic details of these reactions have become clear with the structural elucidation of recombinases caught at different steps of the process. The four recombinase subunits and the four DNA arms in the synaptic complex take up a square planar arrangement (Figure 14-3b). As shown by the crystal structure, the tyrosine-class recombinases are not in perfect fourfold symmetry. Instead, alternating subunits are in slightly different conformations—two active (with the active-site Tyr residues positioned near the phosphodiester bonds to be cleaved) and two inactive. The isomerization step described above (see Figure 14-3a, step 3), coupled with subtle conformational changes, converts the active recombinase subunits to the inactive state, and inactive subunits to active.
The overall process is closely related to the reaction mechanism promoted by topoisomerases (see Figure 9-19). For both topoisomerases and site-specific recombinases, the reaction begins with the formation of a protein-DNA phosphotyrosine or phosphoserine linkage at the expense of a phosphodiester bond in the DNA. In the case of topoisomerases, the same phosphodiester bond is re-created after the DNA topology has been changed. In the case of site-specific recombinases, each end of the cleaved phosphodiester bond is joined to a new partner. The different outcomes are brought about by the different architectures of the proteins promoting the two reactions, resulting in different and very precise movements of DNA segments between the phosphodiester cleavage and re-formation steps.
Site-Specific Recombination Complements Replication
Replication of the circular chromosomes of viruses, plasmids, and many bacteria poses a unique set of challenges. As noted in Chapter 13, recombinational DNA repair of stalled replication forks can give rise to contiguous dimeric chromosomes (see Figure 13-14). A specialized site-specific recombination system in E. coli converts the dimeric chromosomes to monomeric chromosomes so that cell division can proceed. The reaction is a site-specific deletion reaction catalyzed by a tyrosine-class recombinase, XerCD. As described here, site-specific recombination can also be used as an elegant mechanism to generate more than two copies of a chromosome during one replication cycle.
A common plasmid in Saccharomyces cerevisiae, the 2μ (2 micron) plasmid, has a site-specific recombination system. The recombinase, known as Flp (a shortened version of flippase, an early and somewhat whimsical name for the enzyme), is encoded by the plasmid. In this system, site-specific recombination is used to amplify the number of plasmids in the cell (the plasmid copy number) whenever necessary. When the copy number falls too low, Flp is activated to promote the recombination reaction. The key to copy-number amplification is the timing of the recombination. The replication origin of the plasmid is situated such that one Flp recombination target (FRT) site is replicated well before the other (Figure 14-4). If recombination occurs when only one FRT has been replicated, the result is the inversion not just of one segment of DNA but of one replication fork relative to the other. Instead of meeting at the opposite side of the circle, the two forks begin to follow each other around the circle, promoting an extended rolling-circle replication. This generates multiple tandem copies of the plasmid, instead of just two, from one replication initiation. The multimeric plasmid is then broken down into plasmid monomers by subsequent Flp-mediated recombination events carried out between FRT sites in the same orientation.
Figure 14-4: Coupling site-specific recombination to extensive replication in a yeast plasmid. The yeast 2μ plasmid, a circular DNA, has two FRT sites (orange) on opposite sides of the circular DNA molecule and inverted relative to each other. FRT sites are targeted by the plasmid-encoded Flp recombinase. The recombination reaction inverts the DNA in the sequences of about one half of the plasmid relative to those of the other half. The inversion also changes the direction of one replication fork relative to the other. Inversion thus leads to a double rolling-circle replication that can produce multiple copies of the plasmid in one replication cycle, increasing the plasmid copy number in the cell.
Site-Specific Recombination Can Be a Stage in a Viral Infection Cycle
Bacteriophages such as P1 and λ have played important roles in the development of molecular biology and biotechnology (see the How We Know section at the end of this chapter). When it enters a cell, the DNA of these phages has two potential fates (Figure 14-5). A lysogenic pathway involves incorporation of the phage genome as part of the host genome, either by integration into the host chromosome or as an autonomously replicating plasmid. In either case, phage genes are largely repressed, and the phage DNA is replicated passively by host enzymes. Lysogenized bacteriophage genomes are referred to as prophages, and the parasitic infection is benign as long as the phage remains in this state. In the alternative, lytic pathway, the bacteriophage DNA is replicated and packaged into new phage heads, and the host cell is destroyed by lysis to disperse the progeny. The specific mechanisms used in the P1 life cycle feature site-specific recombination in some key steps.
Figure 14-5: Two possible fates for a phage-infected host cell. Several types of bacteriophage introduce their DNA into cells in a linear form, which is circularized inside the cell. The lysogenic pathway involves either integration of the DNA (now referred to as a prophage) into the host chromosome or its passive replication as a plasmid. The alternative, lytic pathway eventually destroys the host cell and releases phage progeny.
P1 enters a bacterial cell as a linear DNA molecule containing multiple contiguous copies of the 90 kbp genome. In the host, the DNA is rapidly circularized to produce multiple genome-length circular DNAs (Figure 14-6a). The circularization can occur by homologous recombination, or it can be promoted by a phage-encoded site-specific recombination system. The latter system, known as Cre-lox, involves recombination sites called loxP sites (locus of crossover (x) in P1), more often known simply as lox sites, and the recombinase Cre (cyclization recombination). The circularized DNA can enter a lysogenic state, maintaining the same copy number in the host cell. In addition to circularization, the Cre-lox recombination system aids in the orderly dispersal of the P1 genomes to daughter cells at cell division by resolving any circular P1 dimers to monomers. In the lytic pathway, P1 replicates in a rolling-circle mode in which the replication fork travels unidirectionally around the circularized chromosome (Figure 14-6b). This generates long, linear DNAs with many contiguous copies of the P1 genome. The DNAs are cut and the genomes incorporated into phage heads before cell lysis. Occasionally, large pieces of host DNA are also incorporated into phage heads. This low-frequency event allows P1 to be used as an experimental vehicle to move bacterial genes from one cell to another in a process known as bacterial transduction.
Figure 14-6: Circularization of P1 DNA. (a) P1 phage contains about 1.4 copies of its genome, packaged in its head. In a host cell, circularization generates monomeric, circular genomes. The orange arrows are loxP sites and serve to define one monomeric genome equivalent. Circularization may involve Cre-mediated recombination or homologous recombination at other locations. (b) Following circularization, the P1 genome can undergo rolling-circle replication, producing many copies from one template.
Site-Specific Recombination Systems Are Used in Biotechnology
As noted earlier, the Flp recombination system of the yeast 2μ plasmid and the Cre-lox system of bacteriophage P1 both rely on tyrosine-class recombinases. These are relatively simple systems in which the recombination sites are short (about 34 bp long) and the recombinases (Flp and Cre) are the only enzymes required. The recombination can be adapted to produce inversion, deletion, or insertion, depending on the placement and orientation of the recombination sites. The overall reactions promoted by Flp and Cre are isoenergetic, occurring without the input of ATP. The Flp and Cre reactions thus tend to approach an equilibrium in which substrates and products are in equal concentrations. In this case, simplicity gives rise to practical application.
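The link between an isoenergetic reaction and equal concentrations of substrate and product follows from the standard relationship ΔG′° = −RT ln K′eq. The Python sketch below works through the arithmetic under a simplifying assumption: a single substrate converted to a single product, as in an inversion. The function names and the example free-energy values are illustrative, not taken from the text.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def equilibrium_constant(delta_g_j_per_mol: float, temp_k: float = 298.0) -> float:
    """K_eq from the standard free-energy change, using dG = -RT ln K."""
    return math.exp(-delta_g_j_per_mol / (R * temp_k))


def product_fraction(k_eq: float) -> float:
    """Fraction present as product at equilibrium for a simple S <-> P reaction."""
    return k_eq / (1.0 + k_eq)


# Isoenergetic case: no net free-energy change, so K_eq = 1 and half the
# molecules are product at equilibrium.
k_iso = equilibrium_constant(0.0)
print(k_iso, product_fraction(k_iso))

# For contrast, a hypothetical strongly favorable reaction (-30 kJ/mol)
# would lie almost entirely on the product side.
k_fav = equilibrium_constant(-30_000.0)
print(product_fraction(k_fav))
```

In practice this means that an experimenter who wants a recombination product to persist usually has to arrange for the reverse reaction to be disfavored by some other means.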
The Cre and Flp systems will function when engineered into the cells of any organism and thus are highly useful in a wide range of biotechnological applications. A few such applications are shown in Figure 14-7. If the requisite lox or FRT sites are engineered into plasmids or chromosomes in the proper locations, these systems can be (and have been) used to activate a particular gene, insert a new gene into a cell at a chosen location, replace one gene with another gene or an altered version of the same gene, delete a gene, or alter the linear structure of an entire chromosome. The sequence specificity of the recombinases allows all of these transactions to be promoted with extraordinary precision. Even more elaborate manipulations are possible. For example, if you tied expression of the recombinase to a promoter expressed only in a particular tissue, you could limit the recombination event to that tissue. Deleting a gene in a certain tissue at a specified time can be a powerful tool for exploring the function of that gene. A vivid example of the application of site-specific recombination in biotechnology is described in Highlight 14-1.
Figure 14-7: Some biotechnology applications of site-specific recombination. (a) Gene activation (by deletion of a transcription stop signal). (b) Gene insertion. (c) Allele replacement, using two variants of the same recombination target site at each end of the gene (or other sequence) to be replaced. In each case, the site-specific recombination target site is engineered into the chromosome or into the introduced DNA, or both, with the orientation indicated by the arrows.
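Gene activation by deletion of a transcription stop signal, combined with tissue-specific expression of the recombinase, can be sketched as follows. This Python example is a toy model: the cassette is a list of labels, the two lox sites are assumed to be in the same orientation, and the tissue names and function names are hypothetical.

```python
def cre_excise(cassette):
    """Delete the DNA between two same-orientation lox sites, leaving one site."""
    first = cassette.index("lox")
    second = cassette.index("lox", first + 1)
    return cassette[:first + 1] + cassette[second + 1:]


def is_expressed(cassette):
    """The gene is transcribed only if no STOP lies between promoter and gene."""
    start = cassette.index("promoter") + 1
    end = cassette.index("gene")
    return "STOP" not in cassette[start:end]


def tissue_outcomes(cassette, cre_tissues, tissues):
    """Report gene expression per tissue; Cre acts only where it is expressed."""
    return {
        tissue: is_expressed(cre_excise(cassette) if tissue in cre_tissues else cassette)
        for tissue in tissues
    }


cassette = ["promoter", "lox", "STOP", "lox", "gene"]
outcomes = tissue_outcomes(cassette, cre_tissues={"neuron"}, tissues=["neuron", "liver"])
print(outcomes)
```

The same cassette is present in every cell of the animal; only the tissue in which the recombinase is made loses the stop signal and turns the gene on.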
Gene Expression Can Be Regulated by Site-Specific Recombination
The biological uses of site-specific recombination are varied and extend even to the regulation of genes. Salmonella typhimurium, which inhabits the mammalian intestine, moves by rotating the flagella on its cell surface. The many copies of the protein flagellin (Mr 53,000) that make up the flagella are prominent targets of mammalian immune systems. But Salmonella cells have a mechanism that evades the immune response: they switch between two distinct flagellin proteins (FljB and FliC) roughly once every 1,000 generations, using the process of phase variation.
The switch is accomplished by periodic inversion of a segment of DNA containing the promoter for a flagellin gene. The inversion is a site-specific recombination reaction mediated by the Hin (H DNA invertase) recombinase at specific 14 bp sequences (hix sequences) at either end of the DNA segment. When the DNA segment is in one orientation, the gene for FljB flagellin and the gene for a repressor protein (FljA) are expressed; the repressor shuts down expression of the gene for FliC flagellin (Figure 14-8a). When the DNA segment is inverted, the fljA and fljB genes are no longer transcribed, and the fliC gene is induced as the repressor becomes depleted (Figure 14-8b). The Hin recombinase, encoded by the hin gene in the DNA segment that undergoes inversion, is expressed when the DNA segment is in either orientation, so the cell can always switch from one state to the other.
Figure 14-8: Phase variation in Salmonella flagellin genes. (a) In one orientation, fljB is expressed along with a repressor protein (product of the fljA gene) that turns off transcription of the fliC gene; the result is production of the flagellin FljB. (b) In the opposite orientation, the fljA and fljB genes cannot be transcribed, and only the fliC gene is expressed, producing flagellin FliC. Salmonella can flip between these two flagellin-producing systems.
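The switch can be treated as a two-state system in which the orientation of the invertible segment determines which flagellin is made. The Python sketch below is a simplified simulation: it assumes a fixed inversion probability per generation (the default reflects the rough figure of once per 1,000 generations given above) and ignores the lag while the FljA repressor is depleted. The labels "on" and "off" and the function names are hypothetical.

```python
import random


def expressed_flagellin(orientation: str) -> str:
    """Map the orientation of the invertible segment to the flagellin made."""
    # "on": the promoter reads into fljB and fljA; FljA represses fliC.
    # "off": fljB and fljA are not transcribed, so fliC is expressed.
    return "FljB" if orientation == "on" else "FliC"


def simulate_lineage(generations: int, switch_prob: float = 1e-3, seed: int = 0):
    """Follow one lineage; Hin inverts the segment with switch_prob per generation."""
    rng = random.Random(seed)
    orientation = "on"
    history = []
    for _ in range(generations):
        if rng.random() < switch_prob:
            # Hin is expressed in both orientations, so the switch is reversible.
            orientation = "off" if orientation == "on" else "on"
        history.append(expressed_flagellin(orientation))
    return history


history = simulate_lineage(10_000)
print(history.count("FljB"), history.count("FliC"))
```

Raising `switch_prob` in this sketch shows why a low inversion frequency matters: a population switches often enough to escape an immune response but rarely enough that most cells express a single flagellin for many generations.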
The Salmonella system is by no means unique. Similar regulatory systems occur in other bacteria and in some bacteriophages. Gene regulation by DNA rearrangements that move genes and/or promoters is particularly common in pathogens that benefit by changing their surface proteins, thereby staying ahead of host immune systems.
Hin belongs to a family of recombinases that are distinct from the Flp and Cre enzymes: the serine-class recombinases. Unlike Flp and Cre, Hin promotes reactions that are highly constrained (inversions only). The reactions make use of auxiliary proteins called Fis that participate in the site-specific recombination reaction promoted by Hin.
The Hin recombinase catalyzes recombination only when its hix recombination sites are on the same supercoiled DNA molecule and are inverted relative to each other. This specificity is accomplished through the formation of an elaborate structure involving both Hin and the Fis proteins, which acts as a topological filter (Figure 14-9). The structure cannot form unless the DNA is supercoiled and the sites are properly oriented. The basic site-specific recombination reaction can thus be exquisitely adapted to the particular needs of a virus or cell.
Figure 14-9: A topological filter in Hin-hix recombination. The ribbons represent double-stranded DNA. The Hin recombination complex consists of four Hin subunits bound in pairs to two hix sites, along with an enhancer sequence (green) bound by a host-encoded protein called Fis. Recombination cannot occur unless the entire complex comes together with the topology shown here. This particular alignment of hix sites occurs only if the sites are present in the same supercoiled DNA molecule and have the opposite orientation.
HIGHLIGHT 14-1 TECHNOLOGY: Using Site-Specific Recombination to Trace Neurons
Site-specific recombination offers an opportunity to modify genomic DNA with exquisite precision. A vivid example, and one that illustrates its biotechnological potential, is the brainbow technology introduced by Jeff Lichtman, Jean Livet, and Joshua Sanes. Combining transgenic technology and site-specific recombination, they created a system to map neurons in the mouse brain.
The use of site-specific recombination to make genomic alterations always requires some genetic engineering. Recombination target sites must be installed where a biotechnologist wants them to be. The corresponding recombinase must also be present in the same cell and at the right time, so the systems have to be adapted to a particular purpose or experiment. The most widely used recombination systems are the Cre-lox system of bacteriophage P1 and the Flp-FRT system of the yeast 2μ plasmid. Both have relatively short recombination target sites (34 to 36 bp; see Figure 14-2a), and both rely on a single recombinase enzyme (Cre or Flp) that requires no auxiliary proteins.
Mario Capecchi, Oliver Smithies, and Martin Evans developed powerful procedures to generate transgenic mice with designed genomic alterations (Figure 1; see Chapter 7 for discussion of such procedures). DNA containing a gene alteration is introduced into mouse embryonic stem cells. The DNA is introduced in the form of an engineered plasmid DNA that sandwiches the desired alteration between a set of selectable markers that will allow identification of the rare cells with the desired alteration inserted at the correct location. The altered stem cells (from brown mice) are introduced into a very early embryonic stage (blastocyst) of black mice, and the embryos are implanted in a surrogate mother. The altered brown-mouse cells become part of the developing embryo and can appear in almost any tissue. Progeny with the altered brown-mouse cells in their germ line are crossed to produce all-brown progeny, signifying the presence of the desired genetic change in the germ line.
FIGURE 1 Transgenic mice are engineered by insertion of a targeting vector into embryonic stem cells. The vector contains a selectable marker, X, and the desired chromosomal alteration (such as a lox site to be introduced) sandwiched between two DNA segments homologous to the chromosomal site where the alteration is to be integrated. These chromosomal sequences can direct homologous recombination. A second selectable marker, Y, is included in the cassette, outside these homologous sequences, and is generally introduced to the chromosome only if the DNA is integrated at an incorrect (nonhomologous) chromosomal site. The targeted cells are subjected to selection in vitro, using drugs to select “for” cells that have selectable marker X and “against” cells that also have selectable marker Y. The surviving cells are then introduced into an early-stage embryo and become part of the tissue of the developing mouse. Genetic crosses allow the selection of mice expressing the desired alterations in their germ line.
The brainbow method makes use of green fluorescent protein (GFP), the procedures for generating transgenic mice, and site-specific recombination to trace the path of the countless neurons that make up the brain (Figure 2). Inserted into the mouse genome is a gene cassette (a structured set of genes arranged for a particular biotechnological purpose) with several copies of GFP variants that encode proteins fluorescing with different colors, such as red (RFP), orange (OFP), yellow (YFP), and cyan (CFP). Variants of the loxP target site for the Cre recombinase are engineered between the genes. The loxP sites are arranged so that only one of the three possible recombination events can occur in a particular cassette, and each event will result in gene expression of one of the four GFP variants. The cassette also includes a promoter that directs gene expression only in neurons. Transgenic mice have been engineered that have several of these cassettes, with the potential for expressing one GFP variant from each cassette in a given neuron.
FIGURE 2 Neuronal networks can be traced by turning the network into a brainbow. Transgenic mice with genes encoding GFP variants and different (modified) intervening lox sites are mated to transgenic mice with the gene encoding the Cre recombinase, leading to heterozygous progeny that carry out Cre-mediated recombination in developing neurons. (a) The different lox sites differ in their core sequences, so lox1 reacts only with lox1, and lox2 only with lox2. (b) In this case, the cassette utilizes three or more different lox sites. Cre-mediated recombination results in different patterns of GFP variant expression, and thus a different color, in each neuron. (c) The resulting brainbow of neurons, visualized by a specialized form of light microscopy (epifluorescence).
The engineered mice with GFP cassettes are homozygous for these cassettes, and they pass them on to all their progeny. Separately, a second population of homozygous transgenic mice is engineered to express the Cre recombinase, again from a promoter directing gene expression transiently and only in developing neurons. When a mouse with the GFP cassettes is mated to a mouse expressing the Cre recombinase, all progeny are heterozygous for both the cassette and the recombinase genes. As these mouse embryos develop, the Cre recombinase is expressed early in the development of each neuron. Recombination events occur in some or all of the cassettes in a given neuron, leading to the expression of a particular set of GFP variants. Mixing several different GFP variants in a cell increases the number of potential colors. The outcome is unpredictable for each cell. However, only one set of recombination reactions occurs in each cell, and the end result imparts a distinctive color that is expressed for the life of that neuron. Neighboring developing neurons go through the same recombination processes, but the number of possible outcomes is large and neighboring cells rarely acquire the same color. The result is a rainbowlike array of fluorescent colors in the neural network—a brainbow! Researchers use the brainbow to trace the paths of the axons through the brain. Site-specific recombinases have been used broadly in similar schemes to trace cell lineages in many organisms.
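The gain in color diversity from combining cassettes can be estimated by simple counting. The Python sketch below assumes that each cassette ends up expressing exactly one of the four GFP variants named above and that a neuron's color depends only on how many cassettes express each variant, not on which cassette does so. Both are simplifying assumptions, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

VARIANTS = ("RFP", "OFP", "YFP", "CFP")  # the four GFP variants named in the text


def color_mixtures(n_cassettes: int, variants=VARIANTS):
    """All distinct mixtures of variants a neuron could express, one per cassette."""
    return list(combinations_with_replacement(variants, n_cassettes))


for n in (1, 2, 3):
    mixtures = color_mixtures(n)
    # The count of multisets of size n drawn from k variants is C(n + k - 1, n).
    assert len(mixtures) == comb(n + len(VARIANTS) - 1, n)
    print(n, "cassette(s):", len(mixtures), "possible mixtures")
```

Under these assumptions, three cassettes already allow 20 distinguishable mixtures, which suggests why neighboring neurons rarely end up with the same color.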
SECTION 14.1 SUMMARY
Site-specific recombination entails the precise cleavage and rejoining of DNA ends at specific and reproducible sites in the DNA.
There are two classes of site-specific recombinases, defined by the key nucleophilic amino acid residue at their active sites: tyrosine or serine.
Site-specific recombination can be coupled to replication, for resolution of chromosomal dimers to monomers before cell division or for amplification of plasmid copy number.
The mechanism of site-specific recombination can be used to facilitate DNA transactions critical to a viral life cycle.
Biotechnologists have adapted site-specific recombination systems to manipulate DNA segments ranging from plasmids to genomes.
Some organisms use site-specific recombination to regulate gene expression, as in phase variation in Salmonella.
Site-specific recombination can be rendered specific for one particular reaction outcome (integration, deletion, or inversion) by coupling the reaction to the formation of a larger, structured complex in which only one reaction outcome is possible.