14.3 THE EVOLUTIONARY INTERPLAY OF TRANSPOSONS AND THEIR HOSTS

As we noted in the introduction to this chapter, transposition provides an alternative survival strategy that almost certainly has ancient roots. Evolution has given rise to many types of transposons, which have colonized the genomes of all extant organisms, and it has also given rise to other, more complex pathogens, such as viruses. Although the need to adapt to their hosts has affected transposon evolution, transposons have not always remained separate entities. Some transposable elements, and the enzymes they encode, have been appropriated by host cells and adapted to new biological tasks. The movements of transposons have driven genomic changes that have contributed in important ways to evolution.

Viruses, Transposons, and Introns Have an Interwoven Evolutionary History

Many well-characterized eukaryotic retrotransposons, from sources as diverse as yeast and fruit flies, have a structure very similar to that of retroviruses. Based on studies of reverse transcriptase genes, researchers hypothesize that retrotransposons probably gave rise to retroviruses.

The evolution of virtually all retrotransposons and retroviruses can be traced through their reverse transcriptase genes. Similarly, the evolution of an even broader range of transposable elements can be linked to the evolution of retrotransposons and retroviruses through their integrase and transposase genes. Integrases and transposases catalyze very similar reactions (Figure 14-20). The most widespread class of both types of enzymes uses a set of three active-site amino acids, two aspartates and one glutamate, to promote the hydrolytic cleavage and transesterification of phosphodiester bonds. These three residues (D, D, and E) are not adjacent in the primary sequence of these enzymes, but they come together in the active site and constitute a well-recognized motif called the DDE motif (Figure 14-21). The close relationship between these enzymes, regardless of source, can have practical benefits. For example, the transposase of the bacterial transposon Tn5 can be used as a model to study the function of the HIV integrase, and even as a rapid test for drugs that might be used to inhibit the HIV enzyme.

Figure 14-20: Transposases and integrases. (a) Transposases promote the nucleophilic attack of the 3′ end of a DNA strand on a phosphodiester bond—a strand transfer reaction. (b) An integrase carries out the same reaction. (Transposases can also use a water molecule as the nucleophile.) The two types of enzyme have similar active sites.
Figure 14-21: The DDE motif. Consensus sequences are shown for the catalytic domain of two families of transposases: the Tn3 family and retroviral integrases. The DDE motif consists of three residues (Asp, Asp, and Glu, shown in red) that are generally not adjacent in the primary sequence but come together at the active site when the protein is folded. The motif is found in most transposases and integrases. Alternative residues at any position are shown above or below the line; those shown in blue (up to two at a given position) are residues found at that position in more than 75% of the enzymes of this family. Dashes indicate any amino acid residue. Two peptide segments, with the residue numbers indicated, are omitted here.

Transposons have a long and complex history. Their dispersal may include rare events in which DNA was transferred by some means (such as bacterial conjugation, cellular fusion, viral infection, or accidental DNA uptake) among the cells of different species. When a transposon is introduced into a new species’ genome, there is often a period of many host generations during which the element transposes more or less freely. The number of inserted transposons may increase, with the resulting genomic changes being passed on whenever they do not have a deleterious effect on the host. As time passes, the transposons become subject to silencing processes, including the introduction of mutations in their transposase or integrase genes that inactivate the gene products. Alternatively, the host may find a way to shut down transposition. One common silencing mechanism involves RNA interference (RNAi, a process described in Chapter 22). In brief, the cell produces short RNA molecules that are complementary to the transcripts of the transposase-encoding gene. The RNA hybridizes to the gene transcripts, preventing their translation and effectively blocking the synthesis of an enzyme required for transposition of an entire class of transposons.
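The base-pairing logic behind this silencing mechanism can be sketched in a few lines of Python. This is a toy illustration only: the sequences and function names are hypothetical, the small RNA is much shorter than real interfering RNAs, and perfect Watson-Crick pairing is assumed (G-U wobble pairs and the protein machinery of RNAi are ignored).

```python
# Watson-Crick pairing rules for RNA (wobble pairs deliberately ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the RNA strand that would pair antiparallel with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna))


def target_sites(transcript, small_rna):
    """List 0-based positions where `small_rna` can pair fully with `transcript`."""
    site = reverse_complement(small_rna)
    last_start = len(transcript) - len(site)
    return [i for i in range(last_start + 1) if transcript[i:i + len(site)] == site]


# Hypothetical fragment of a transposase transcript (not a real sequence).
transposase_mrna = "AUGGCUACGUUAGCCGAUAA"

# A short RNA complementary to positions 6-13 of that transcript.
small_rna = reverse_complement(transposase_mrna[6:14])

# An unrelated hypothetical host transcript, used as a negative control.
host_mrna = "AUGCCCGGGAAA"

print("small RNA:", small_rna)
print("sites in transposase transcript:", target_sites(transposase_mrna, small_rna))
print("sites in host transcript:", target_sites(host_mrna, small_rna))
```

Because the small RNA pairs only with transcripts that contain its complement, one such RNA could in principle suppress every transposon copy whose transposase transcript shares that sequence, while leaving unrelated host transcripts alone.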


Linear transmission of transposons from one host generation to the next is predominant, with transfer between species occurring rarely. Thus, many transposon families are found only in certain classes of organisms. In eukaryotes, ongoing genomic sequencing efforts have revealed 12 superfamilies of DNA transposons, including Tc1/mariner (Table 14-1). Many of these superfamilies are found in more than one eukaryotic type. Seven are closely related to transposons found in bacteria, suggesting that they appeared before the divergence of bacteria and eukaryotes.

Table 14-1: DNA Transposons in Eukaryotes

Sometimes transposons benefit their hosts. As we have seen, the antibiotic-resistance genes encoded by the transposon Tn5 have contributed greatly to the development of bacterial pathogens that are resistant to those antibiotics. In human cells, there are more than 1 million copies of the Alu transposon (a 300 bp SINE element) in the DNA, accounting for nearly 10% of the genome. These elements are so widespread that a typical human gene includes several copies in the introns of its primary transcript. Host cells use these elements as target sites for RNA editing (see Chapter 16). Other transposon genes are appropriated by the host for other purposes. Efforts to trace the evolution of mammalian genes have identified several dozen that are derived from transposons. A dramatic case of transposon domestication—occurring in immunoglobulin formation—is described shortly.

Perhaps more important is the overall impact of transposons on the evolution of the host. Genomic changes promoted by transposons come in many forms. Transposons are set up to bring their ends together in a complex prior to any cleavage event, but this control mechanism can go awry. If transposase subunits form a complex involving two ends derived from different copies of the same transposon, on the same or different chromosomes, large genomic rearrangements can result. Genes may be captured between two transposable elements and moved to different genomic locations. If the genes are duplicated in the process, the new gene copies may evolve and acquire new functions. Transposition is not always precise; the insertion of a transposon into a gene, followed by its later excision, can add or subtract base pairs in the gene and create new alleles. Also, the insertion or excision of transposons at particular genomic sites can alter the expression of genes or sets of genes.


The transposons that seem to clutter mammalian genomes have been referred to as “selfish” or “junk” DNA, but these labels are being shed as our understanding broadens. Transposon DNA may play a key role in chromosomal structure and packaging. And far from being dormant, transposon DNA is actively transcribed in at least some cells. As new classes of functional RNAs are being discovered at a rapid pace, the RNAs produced by transposons may prove to have unexpected cellular roles.

A Hybrid Recombination Process Assembles Immunoglobulin Genes

Humans have a complex immune system capable of generating millions of different immunoglobulins (antibody proteins) with distinct binding specificities. But the human genome contains only about 25,000 genes, and just a few hundred of these are immune system genes. Somehow, the millions of different immunoglobulins are generated from these several hundred genes. As B lymphocytes (B cells) differentiate, their immunoglobulin genes recombine so that each cell will express an antibody with a unique binding specificity. Studies of the recombination mechanism reveal a close relationship to DNA transposition and suggest that this system for generating antibody diversity evolved from an ancient cellular invasion by transposons.

Immunoglobulins consist of two heavy and two light polypeptide chains (Figure 14-22 shows the general structure of the IgG class of immunoglobulins). Each chain has two regions: a variable region, with a sequence that differs greatly from one immunoglobulin to another, and a constant region, which is virtually unchanging within a class of immunoglobulins. There are two distinct families of light chains, kappa and lambda, which differ somewhat in the sequence of their constant regions. For all three types of polypeptide chain (heavy chain, and kappa and lambda light chains), diversity in the variable regions is generated by a similar mechanism. The genes for these polypeptides are divided into segments, and the genome contains clusters with multiple versions of each segment. The joining of one version of each gene segment creates a complete gene.

Figure 14-22: Immunoglobulin G (IgG). Pairs of heavy and light chains combine to form a Y-shaped molecule. Two antigen-binding sites are formed by the combination of variable domains from one light and one heavy chain. The light chains include V (variable) and J (joining) segments. The heavy chains have V (variable), D (diverse), and J (joining) segments, brought together by mechanisms similar to those for light chains, as described in the text. For heavy chains in the human genome, there are 44 V segments, 27 D segments, and 6 J segments, with one of each brought together at random in a given immunoglobulin.

Figure 14-23 depicts the organization of the DNA encoding the kappa light chain and shows how a mature kappa light chain is generated. In undifferentiated cells, the coding information for this polypeptide is separated into three segments. The V (variable) segment encodes the first 95 amino acid residues of the variable region, the J (joining) segment encodes the remaining 12 residues of the variable region, and the C (constant) segment encodes the constant region. For kappa light chains, the genome contains ~300 different V segments, 5 different J segments, and 1 type of C segment.

Figure 14-23: Recombination of the V and J gene segments of the human IgG kappa light chain. This process results in considerable antibody diversity. Shown at the top is the arrangement of IgG-coding sequences in a bone marrow stem cell. Recombination deletes the DNA between specific V and J segments. The RNA transcript is processed by RNA splicing; translation produces the light-chain polypeptide. The light chain can combine with any of several thousand possible heavy chains to produce an antibody molecule.


As a stem cell in the bone marrow differentiates to form a mature B cell, one V segment and one J segment are brought together by a specialized recombination system (Figure 14-24). During this programmed DNA deletion, the intervening DNA is discarded. There are about 300 × 5 = 1,500 possible V-J combinations. Additional variation in the sequence at the V-J junction is introduced by imprecision in the recombination reaction, as an enzyme called terminal deoxynucleotidyl transferase adds a few nucleotides at random to 3′ ends exposed during the recombination process. This increases the overall variation considerably. The final joining of the V-J combination to the C region is accomplished by an RNA-splicing reaction after transcription (see Chapter 16). The assembly of light chains with similarly randomized heavy chains increases diversity still further.

Figure 14-24: Immunoglobulin gene rearrangement. Proteins RAG1 and RAG2 bind to RSS (recombination signal sequences) and cleave one DNA strand between the RSS and the V (or J) segments that are to be joined. The liberated 3′ hydroxyl acts as a nucleophile, attacking a phosphodiester bond in the other strand to create a double-strand break. The resulting hairpins on the V and J segments are cleaved, and the ends are covalently linked by a complex of proteins specialized for end-joining repair of double-strand breaks, as described in Chapter 13. The steps in the generation of the double-strand break, catalyzed by RAG1 and RAG2, are chemically related to steps in transposition reactions.

The recombination mechanism for joining the V and J segments is facilitated by recombination signal sequences (RSS) that lie just downstream of each V segment and just before each J segment. These sequences are bound by proteins called RAG1 and RAG2 (products of the recombination activating gene). The RAG proteins catalyze the formation of a double-strand break between the RSS and the V (or J) segments to be joined. The V and J segments are then joined with the aid of a second complex of proteins.

The genes for the heavy chains and the lambda light chains form by similar processes. Heavy chains have more gene segments than light chains, with thousands of possible combinations. Because any heavy chain can combine with any light chain to generate an immunoglobulin, each human can produce at least 10⁷ possible immunoglobulins. Additional diversity is generated by high mutation rates (of unknown mechanism) in the V segments during B-cell differentiation. Each individual mature B cell produces only one type of antibody, but the range of antibodies produced by the many B cells of an individual organism is clearly enormous.
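The combinatorial arithmetic behind this estimate can be checked with a short Python sketch. It uses only the segment counts quoted in the text (about 300 V and 5 J segments for kappa light chains) and in the caption of Figure 14-22 (44 V, 27 D, and 6 J segments for heavy chains). The variable and function names are illustrative, and the calculation is a lower bound under simplifying assumptions: lambda light chains, junctional imprecision, and mutation in the V segments are all left out.

```python
from math import prod

# Segment counts taken from the text and the Figure 14-22 caption.
KAPPA_LIGHT_SEGMENTS = {"V": 300, "J": 5}
HEAVY_SEGMENTS = {"V": 44, "D": 27, "J": 6}


def segment_combinations(segments):
    """Number of ways to choose one version of each gene segment."""
    return prod(segments.values())


kappa_combinations = segment_combinations(KAPPA_LIGHT_SEGMENTS)
heavy_combinations = segment_combinations(HEAVY_SEGMENTS)

# Any heavy chain can pair with any light chain, so the counts multiply.
antibody_combinations = kappa_combinations * heavy_combinations

print(f"kappa V-J combinations:    {kappa_combinations:,}")
print(f"heavy V-D-J combinations:  {heavy_combinations:,}")
print(f"heavy x kappa pairings:    {antibody_combinations:,}")
```

With these counts, the kappa light chains alone account for 1,500 combinations and the heavy chains for 7,128, and their pairing gives about 1.07 × 10⁷ antibodies, roughly 10 million, before any of the additional sources of diversity are considered.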

The mechanism of the DNA recombination events required to generate an expressed immunoglobulin gene suggests that the immune system evolved, in part, from ancient transposons. The mechanism for generation of the double-strand breaks by RAG1 and RAG2 closely resembles several reaction steps in transposition (see Figure 14-24). The double-strand breaks that initiate the process are generated by single protein active sites and feature transient hairpin intermediates at each of the ends to be joined, as in the reaction promoted by the Tn5 transposase (see Figure 14-11). In addition, the deleted DNA, with its terminal RSS, has a sequence structure found in most transposons. In the test tube, RAG1 and RAG2 can associate with this deleted DNA and insert it, transposon-like, into other DNA molecules (probably a rare reaction in vivo). In fact, a subtle rearrangement of the RSS, coupled with placement of the genes encoding the RAG proteins between the RSS ends, creates a DNA element that functions exactly like a transposon. The RAG1 protein is closely related in sequence to the transposases encoded by the Transib superfamily of eukaryotic transposons (see Table 14-1). The properties of the immunoglobulin gene rearrangement system point to an intriguing origin, in which the distinction between host and parasite has become blurred by evolution.

SECTION 14.3 SUMMARY

  • Transposons, retrotransposons, and retroviruses have a shared evolutionary history, evident in the phylogenies of the key enzymes—reverse transcriptases, transposases, and integrases—that promote these processes.


  • Important elements of the vertebrate immune system, the enzymes that promote immunoglobulin gene rearrangements and thus immunoglobulin diversity, evolved from the transposase/integrase family of enzymes. RAG1 is related to the transposases of the Transib transposons.

UNANSWERED QUESTIONS

For any organism, the information required for creating a new generation is passed on through its DNA. Stable transmission is needed, yet genomes are surprisingly dynamic. Recombination processes contribute to repair and facilitate key steps of replication and cell division. A hidden world of transposons makes a home in each genome, replicating passively yet contributing to evolution in important ways. This dynamic genome still holds some secrets to unlock.

  1. What is the origin of reverse transcriptase? How does it relate to the origin of retroviruses? What is its impact on genome development and diversity? For researchers interested in the origin of life, reverse transcriptase is potentially a very old enzyme that played a key role in the transition from RNA- to DNA-based life.

  2. Why do the types of transposons vary so much from one class of organism to another? Each class of transposon present in a given genome represents an invasion that occurred sometime in the lineage of that organism. The study of genomic transposons may provide a rich harvest of information about the evolutionary past of all organisms.

  3. How many proteins and other factors are involved in controlling transposition? Exploration of the elaborate interface between transposons and their hosts is only just beginning. The processes that silence a transposon often involve genes found in both the transposon and the host. The extent of host gene involvement has not been fully explored in most cases; functional RNA molecules may do part of the work. Similarly, elaborate processes that prevent integration of transposons into other transposons are only partially understood. AIDS is, as yet, an almost intractable disease, in part because of the capacity of HIV to integrate into a host genome and remain there, replicating passively. A permanent cure for HIV cannot occur as long as these silent HIV genomes provide a potential source of new infection. A better understanding of how this integration is regulated may eventually lead to genomic clean-up therapies to eliminate or permanently inactivate these pathogens. That understanding must come from work on a wide range of viruses and transposons to fully sample the variety of mechanisms they use, as well as to unearth host functions that play subtle roles.

  4. What do retroviruses and transposons contribute to their hosts? The evolutionary history of these pathogens is clearly not entirely shaped by their own requirements. Obvious contributions to host survival have already been described, but the sheer bulk of transposon DNA in the human genome inspires new questions about function. How do all these repeated transposon sequences affect the structure and function of chromosomes? New reports suggest that much of the genomic DNA previously labeled as junk is in fact transcribed. What are all these RNA molecules doing? For example, a newly discovered class of RNAs called piwi RNAs (piRNAs) are abundant in germ-line cells (especially during spermatogenesis). They play a role in the silencing of transposon genes, but their origin and detailed function are still a mystery.


HOW WE KNOW: Bacteriophage λ Provided the First Example of Site-Specific Recombination

Echols, H. 2001. Operators and Promoters: The Story of Molecular Biology and Its Creators. Berkeley: University of California Press.

Gottesman, M.E., and R.A. Weisberg. 2004. Little lambda, who made thee? Microbiol. Mol. Biol. Rev. 68:796–813.

Nash, H.A. 1975. Integrative recombination of bacteriophage lambda DNA in vitro. Proc. Natl. Acad. Sci. USA 72:1072–1076.

Howard Nash, 1937–2011

Since the 1950s, scientists have known that the DNA of bacteriophage λ (λ phage) is linked to its bacterial host chromosome at a specific chromosomal location. The correct explanation for how the λ DNA enters the chromosome appeared in 1962, before anyone knew that the linear λ DNA is circularized on entering a bacterial cell. Allan Campbell, then at the University of Rochester, had the novel insight that circularization, followed by recombination into the host chromosome, could explain many observations associated with λ lysogeny. Clearly, a uniquely precise recombination process was at work, one that used defined DNA sequences.

A molecular understanding of this, as yet unprecedented, reaction mechanism required an in vitro system—which came in a breakthrough reported separately by Howard Nash (integration) and by Max Gottesman and Susan Gottesman (excision) in 1975. The researchers were working in competing laboratories just a few buildings apart at the National Institutes of Health in Bethesda, Maryland. The Nash system was the more successful of the two, rapidly leading to a detailed biochemical definition of the λ integration reaction and its components.

In his in vitro system, Nash constructed an altered bacteriophage λ chromosome that included both recombination sites, by then defined and named attB and attP (B for bacterium and P for phage), separated by about 15% of the chromosome’s length (Figure 1). As a source of the required enzymes, Nash used a concentrated extract derived from cells in which λ proteins were being produced. He then showed that integrative recombination between the two recombination sites would occur in cells to produce phage chromosomes 15% smaller than normal.

FIGURE 1 (a) The structure of the altered bacteriophage λ substrate developed by Nash (λ attB-attP), and the reaction promoted by the recombination system. A recombination reaction between the attP and attB sites produces two slightly different sites called attL and attR (products λ attL and λ attR). (b) The number of recombinant λ phages produced (as percentage of total) as a function of incubation time with the enzyme extract.

Phage with the shortened chromosome had the useful property that they were infectious in the presence of metal-chelating agents (molecules that bind to and sequester metals, effectively making them unavailable), whereas phage with the larger chromosome were not. After an in vitro reaction, phage introduced into bacterial cells and plated on agar containing a metal chelator started an infection cycle only after successful recombination. The infection created plaques (clear spots of killed cells in the bacterial lawn) that could be counted. After years of optimizing his system, Nash reported the first in vitro site-specific recombination system in 1975 (the Gottesmans’ report appeared three months later).

A successful in vitro system is a powerful thing in molecular biology. Nash, soon joined at the NIH by Kiyoshi Mizuuchi, used his system to purify the λ Int protein, discover a required host protein (IHF, for integration host factor), and define the reaction requirements. Following further work in other labs, notably that of Art Landy at Brown University, the λ integration system gradually yielded its secrets and stimulated the search for other site-specific recombination systems, such as those described in this chapter.


If You Leave out the Polyvinyl Alcohol, Transposition Gets Stuck

Craigie, R., and K. Mizuuchi. 1985. Mechanism of transposition of bacteriophage Mu: Structure of a transposition intermediate. Cell 41:867–876.

Kiyoshi Mizuuchi

Understanding how a process works often starts with focusing on the reactions that are most efficient and easiest to detect and study—which is why bacteriophage Mu was chosen as a model system for studying transposition. Although Mu is a complicated transposon, its status as a preferred research subject was based on its capacity to transpose often. When the transposon moves, it replicates itself, leaving behind a copy at the original chromosomal site and depositing a new one in the target. A few base pairs of chromosomal DNA in the target are also replicated, creating a short repeated sequence at both ends of the insertion site. The entire process seemed a little bit like magic.

In 1979, James Shapiro, at the University of Chicago, proposed a mechanism for Mu transposition that laid out the main features of the process that we now know to occur (Figure 2). It involved nicking DNA strands to expose both 3′ ends of the transposon, then making a staggered break in the target DNA, leaving 5′ overhangs on the resulting ends. The transposon 3′ ends were then joined to the target 5′ ends. The remaining 3′ ends of the target would prime replication, creating two copies of the transposon and a cointegrate intermediate. This intermediate could be resolved by homologous or site-specific recombination to yield the final products. Other researchers proposed alternatives, but Shapiro’s model eventually proved to be largely correct, with the key exception that the direct attack of transposon 3′ ends on target phosphodiester bonds (the transposase-catalyzed strand transfer) was not yet known, or predicted. The key was to find the postulated reaction intermediates.

FIGURE 2 The Shapiro model for transposition.

Kiyoshi Mizuuchi, working with his associate Bob Craigie, found the intermediates. In the early 1980s, they developed an in vitro system that supported Mu transposition, using a plasmid that included a much-abbreviated copy of Mu with both ends of the transposon (the binding sites for the Mu transposase). The Mu proteins required (MuA and MuB), as well as the E. coli host proteins, were made available as extracts from cells in which they were being expressed. The target DNA was another circular DNA, derived from bacteriophage ϕX174 (a phage that has no transposition properties). With this system, the researchers could detect both the cointegrate and the simple insertion products where a new copy of Mu was integrated in the target DNA. Although the results supported Shapiro’s model, the researchers did not detect the predicted strand-transfer intermediates before replication. Either these intermediates did not exist or they were converted to the final products of transposition too quickly for detection.


There are two major steps in the process postulated by Shapiro: cleaving DNA and rejoining the ends to new partners, followed by replication (see Figure 2). We now know that the initial DNA cleavage and strand-transfer events need MuA and MuB proteins. The replication steps are more complex, requiring multiple proteins from the host cell. Completing the entire reaction required a concentrated cell extract and addition of the polymer polyvinyl alcohol. The polymer acted as a solvent-exclusion agent, further concentrating the reaction components. In controls, Craigie and Mizuuchi left out different reaction components to demonstrate that each was needed. When they left out polyvinyl alcohol, the reaction did not generate products, but a prominent new DNA band appeared on the agarose gels that they were using to analyze the reaction. Craigie and Mizuuchi knew how to capitalize on this bit of serendipity. Analysis of this new DNA species soon revealed that it was a strand-transfer intermediate, essentially equivalent to the intermediate created by end-joining as proposed by Shapiro.

What had happened in the polyvinyl alcohol control? The absence of the polymer led to a partial reaction in which strand transfer became a dead end, with the resulting transposition intermediate building up to concentrations that made it much easier to detect and study. To confirm that this species was indeed a normal reaction intermediate (and not simply produced by unusual reaction conditions), Craigie and Mizuuchi isolated the putative intermediate DNA species, added back cell extract without MuA and MuB but with replication enzymes now aided by polyvinyl alcohol, and showed that the predicted transposition products were generated. The result was a definitive study establishing key facts about the pathway of Mu transposition. More broadly, the study played a major role in developing our current understanding of replicational transposition mechanisms.