Non-LTR Retrotransposons Transpose by a Distinct Mechanism

The most abundant mobile elements in mammals are retrotransposons that lack LTRs, sometimes called nonviral retrotransposons. These moderately repeated DNA sequences form two classes in mammalian genomes: long interspersed elements (LINEs) and short interspersed elements (SINEs). In humans, full-length LINEs are about 6 kb long, and SINEs are about 300 bp long (see Table 8-1). Repeated sequences with the characteristics of LINEs have been observed in protozoans, insects, and plants, but for unknown reasons, they are particularly abundant in the genomes of mammals. SINEs too are found primarily in mammalian DNA. Large numbers of LINEs and SINEs in higher eukaryotes have accumulated over evolutionary time by repeated copying of sequences at a few positions in the genome and insertion of the copies into new positions.

LINEs

Human DNA contains three major families of LINEs that are similar in their mechanism of transposition but differ in their sequences: L1, L2, and L3. Only members of the L1 family transpose in the contemporary human genome; apparently there are no remaining functional copies of L2 or L3. LINE sequences are present at roughly 900,000 sites in the human genome, accounting for a staggering 21 percent of total human DNA. The general structure of a complete LINE is diagrammed in Figure 8-16. LINEs are usually flanked by short direct repeats, the hallmark of mobile elements, and contain two long open reading frames (ORFs, which are protein-coding regions; see Section 8.4). ORF1, about 1 kb long, encodes an RNA-binding protein. ORF2, about 4 kb long, encodes a protein that has a long region of homology with the reverse transcriptases of retroviruses and LTR retrotransposons, but also exhibits DNA endonuclease activity.

FIGURE 8-16 General structure of a LINE. The length of the target-site direct repeats varies among the copies of a LINE at different sites in the genome. Although the full-length L1 element is about 6 kb long, variable amounts of the left end are absent at over 90 percent of the sites where this mobile element is found. The shorter open reading frame (ORF1), about 1 kb in length, encodes an RNA-binding protein. The longer ORF2, about 4 kb in length, encodes a bifunctional protein with reverse transcriptase and DNA endonuclease activity. Note that LINEs lack the long terminal repeats found in LTR retrotransposons.

Evidence for the mobility of L1 elements first came from analysis of DNA cloned from patients with certain genetic diseases such as hemophilia and myotonic dystrophy. DNA from these patients was found to carry mutations resulting from insertion of an L1 element into a gene, whereas no such element occurred within that gene in either parent. About 1 in 600 mutations that cause significant disease in humans are due to L1 transpositions or SINE transpositions that are catalyzed by L1-encoded proteins. Later experiments similar to those just described with yeast Ty elements (see Figure 8-15) confirmed that L1 elements transpose through an RNA intermediate. In these experiments, an intron was introduced into a cloned mouse L1 element, and the recombinant L1 element was stably transfected into cultured hamster cells. After several cell doublings, a DNA fragment corresponding to the L1 element but lacking the inserted intron was detected in the cells. This finding strongly suggests that, over time, the recombinant L1 element containing the inserted intron had transposed to new sites in the hamster genome through an RNA intermediate that underwent RNA splicing to remove the intron.

Since LINEs do not contain LTRs, their mechanism of transposition through an RNA intermediate differs from that of LTR retrotransposons. The proteins encoded by ORF1 and ORF2 are translated from a LINE RNA. In vitro studies indicate that transcription by RNA polymerase is directed by promoter sequences at the left end of integrated LINE DNA. LINE RNA is polyadenylated by the same post-transcriptional mechanism that polyadenylates other mRNAs. The LINE RNA is then exported into the cytosol, where it is translated into ORF1 and ORF2 proteins. Multiple copies of ORF1 protein then bind to the LINE RNA, and ORF2 protein binds to the poly(A) tail. The LINE RNA is then transported back into the nucleus as a complex with ORF1 and ORF2 proteins, where it is reverse-transcribed into LINE DNA by ORF2. The mechanism involves staggered cleavage of cellular DNA at the insertion site, followed by priming of reverse transcription by the resulting cleaved cellular DNA, as detailed in Figure 8-17. The complete process results in insertion of a copy of the original LINE retrotransposon into a new site in chromosomal DNA. A short direct repeat is generated at the insertion site because of the initial staggered cleavage of the two chromosomal DNA strands.

FIGURE 8-17 Proposed mechanism of LINE reverse transcription and integration. Only ORF2 protein is represented here. Newly synthesized LINE DNA is shown in black. ORF1 and ORF2 proteins, produced by translation of LINE RNA in the cytoplasm, bind to LINE RNA and transport it into the nucleus. Step 1: In the nucleus, ORF2 makes staggered cuts in AT-rich target-site DNA, generating the DNA 3′-OH ends indicated by blue arrowheads. Step 2: The 3′ end of the T-rich DNA strand hybridizes to the poly(A) tail of the LINE RNA and primes DNA synthesis by ORF2. Step 3: ORF2 extends the DNA strand using the LINE RNA as a template. Steps 4 and 5: When synthesis of the LINE DNA bottom strand reaches the 5′ end of the LINE RNA template, ORF2 extends the newly synthesized LINE DNA using as a template the top-strand cellular DNA generated by the initial ORF2 staggered cleavage. Step 6: A cellular DNA polymerase extends the 3′ end of the top strand generated by the initial ORF2 staggered cut, using the newly synthesized bottom-strand LINE DNA as a template. The LINE RNA is digested as the DNA polymerase extends the upper-strand DNA, just as occurs during removal of lagging-strand primer RNA during cellular DNA synthesis (see Figure 5-29). The 3′ ends of the newly synthesized DNA strands are ligated to the 5′ ends of the cellular DNA strands as in lagging-strand cellular DNA synthesis. See D. D. Luan et al., 1993, Cell 72:595.
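The origin of the short direct repeats that flank an inserted LINE can be illustrated with a string-level toy model. The sketch below is not a simulation of ORF2 or of strand chemistry; it only assumes that the two target strands are cut a few bases apart and that both single-stranded overhangs are later filled in. The function name, the sequences, and the cut positions are hypothetical and chosen purely for illustration.

```python
def insert_at_staggered_cut(target: str, nick: int, stagger: int, element: str) -> str:
    """Return the top-strand sequence after a toy insertion.

    The two strands are cut `stagger` bases apart, starting at index `nick`.
    Filling in both single-stranded overhangs copies
    target[nick:nick + stagger] onto each side of the inserted element.
    """
    if nick < 0 or stagger < 0 or nick + stagger > len(target):
        raise ValueError("cut positions must lie within the target sequence")
    left_flank = target[:nick + stagger]   # ends with the staggered segment
    right_flank = target[nick:]            # begins with the same segment
    return left_flank + element + right_flank


# Hypothetical sequences, not taken from any real genome.
TARGET = "GGCCTTTTAAGGCC"        # target site with an AT-rich stretch
ELEMENT = "CACGTG" + "A" * 8     # toy element ending in a poly(A)-derived run
RESULT = insert_at_staggered_cut(TARGET, nick=4, stagger=6, element=ELEMENT)
REPEAT = TARGET[4:10]            # segment lying between the two cuts

print(RESULT)
print("direct repeat:", REPEAT)
```

In this toy case the six bases between the two cuts appear once on each side of the element, which is the pattern of target-site direct repeats described in the caption. A blunt cut (`stagger=0`) would produce no repeat at all.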

As noted already, the DNA form of an LTR retrotransposon is synthesized from its RNA form in the cytosol using a cellular tRNA as a primer for reverse transcription of the first strand of DNA (see Figure 8-14). The resulting double-stranded DNA with long terminal repeats is then transported into the nucleus, where it is integrated into chromosomal DNA by a retrotransposon-encoded integrase. In contrast, the DNA form of a non-LTR retrotransposon is synthesized in the nucleus. The synthesis of the first strand of the non-LTR retrotransposon DNA by ORF2, a reverse transcriptase, is primed by the 3′ end of cleaved chromosomal DNA, which base-pairs with the poly(A) tail of the non-LTR retrotransposon RNA (see Figure 8-17, step 2). Since synthesis of the first strand is primed by the cut end of a cleaved chromosome, and since synthesis of the other strand of the non-LTR retrotransposon DNA is primed by the 3′ end of chromosomal DNA on the other side of the initial cut (step 6), the mechanism of synthesis itself results in integration of the non-LTR retrotransposon DNA. There is no need for an integrase to insert the non-LTR retrotransposon DNA. Because its synthesis begins with reverse transcription of a poly(A) tail on the LINE RNA, one end of a non-LTR retrotransposon is AT-rich.

The vast majority of LINEs in the human genome are truncated at their 5′ end, suggesting that reverse transcription was terminated before completion and that the resulting fragments, extending variable distances from the poly(A) tail, were inserted. Because of this shortening, the average size of LINE elements is only about 900 bp, even though the full-length sequence is about 6 kb long. Truncated LINE elements, once formed, probably are not further transposed because they lack a promoter for formation of the RNA intermediate. In addition to the fact that most L1 insertions are truncated, nearly all the full-length elements contain stop codons and frameshift mutations in ORF1 and ORF2; these mutations have probably accumulated in most LINE sequences over evolutionary time. As a result, only about 0.01 percent of the LINE sequences in the human genome, or about 60 in total number, are full-length, with intact open reading frames for ORF1 and ORF2.
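The figures quoted for LINEs can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded values given in the text (about 900,000 sites, a 6-kb full-length element, a mean length of about 900 bp, and about 60 intact copies); the function and variable names are illustrative, and the results are only as precise as those rounded inputs.

```python
LINE_SITES = 900_000       # approximate number of LINE sites (from the text)
FULL_LENGTH_BP = 6_000     # approximate full-length L1 size (from the text)
MEAN_LENGTH_BP = 900       # approximate mean LINE size (from the text)
INTACT_COPIES = 60         # approximate number of intact copies (from the text)


def line_summary():
    """Return (total LINE bp, mean fraction of full length, percent intact)."""
    total_bp = LINE_SITES * MEAN_LENGTH_BP
    mean_fraction_retained = MEAN_LENGTH_BP / FULL_LENGTH_BP
    percent_intact = 100 * INTACT_COPIES / LINE_SITES
    return total_bp, mean_fraction_retained, percent_intact


total_bp, mean_fraction_retained, percent_intact = line_summary()
print(f"total LINE DNA: about {total_bp / 1e6:.0f} Mb")
print(f"mean copy retains about {mean_fraction_retained:.0%} of full length")
print(f"intact copies: about {percent_intact:.2f} percent of all LINE sites")
```

On these inputs, the average copy retains roughly 15 percent of the full-length element, and 60 intact copies out of 900,000 rounds to the 0.01 percent quoted above.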

SINEs

The second most abundant class of mobile elements in the human genome, SINEs constitute about 13 percent of total human DNA. Varying in length from about 100 to 400 base pairs, these retrotransposons do not encode protein, but most contain a 3′ AT-rich sequence similar to that in LINEs. SINEs are transcribed by the same nuclear RNA polymerase that transcribes genes encoding tRNAs, 5S rRNAs, and other small stable RNAs. Most likely, the ORF1 and ORF2 proteins expressed from full-length LINEs mediate reverse transcription and integration of SINEs by the mechanism depicted in Figure 8-17. Consequently, SINEs can be viewed as parasites of the LINE symbionts, competing with LINE RNAs for binding, reverse transcription, and integration by LINE-encoded ORF1 and ORF2.

SINEs occur at about 1.6 million sites in the human genome. Of these, about 1.1 million are Alu elements, so named because most of them contain a single recognition site for the restriction enzyme AluI. Alu elements exhibit considerable sequence homology with, and probably evolved from, 7SL RNA, a cytosolic RNA in a ribonucleoprotein complex called the signal recognition particle. This abundant cytosolic ribonucleoprotein particle aids in targeting certain polypeptides to the membranes of the endoplasmic reticulum (see Chapter 13). Alu elements are scattered throughout the human genome at sites where their insertion has not disrupted gene expression: between genes, within introns, and in the 3′ untranslated regions of some mRNAs. For instance, nine Alu elements are located within the human β-globin gene cluster (see Figure 8-4a). Of the new germ-line non-LTR retrotranspositions that are estimated to occur about once in every eight individuals, about 40 percent involve L1 elements and 60 percent involve SINEs, of which about 90 percent are Alu elements.
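The percentages in the preceding paragraph can be combined into per-element estimates. The sketch below makes one assumption that the text does not state explicitly: that "about once in every eight individuals" can be read as an average of one-eighth of a new insertion per individual. All variable names are illustrative, and exact fractions are used only to avoid floating-point noise, not to imply precision in the underlying estimates.

```python
from fractions import Fraction

# Rounded values from the text; the per-individual reading is an assumption.
NEW_EVENTS_PER_INDIVIDUAL = Fraction(1, 8)
L1_SHARE = Fraction(40, 100)
SINE_SHARE = Fraction(60, 100)
ALU_SHARE_OF_NEW_SINES = Fraction(90, 100)
SINE_SITES = 1_600_000
ALU_SITES = 1_100_000

l1_rate = NEW_EVENTS_PER_INDIVIDUAL * L1_SHARE
alu_rate = NEW_EVENTS_PER_INDIVIDUAL * SINE_SHARE * ALU_SHARE_OF_NEW_SINES
other_sine_rate = NEW_EVENTS_PER_INDIVIDUAL * SINE_SHARE * (1 - ALU_SHARE_OF_NEW_SINES)
alu_share_of_sites = Fraction(ALU_SITES, SINE_SITES)

for name, rate in [("L1", l1_rate), ("Alu", alu_rate), ("other SINE", other_sine_rate)]:
    print(f"new {name} insertion: about 1 in {round(1 / rate)} individuals")
print(f"Alu share of existing SINE sites: {float(alu_share_of_sites):.0%}")
```

Read this way, a new Alu insertion arises in roughly 1 in 15 individuals and a new L1 insertion in about 1 in 20. Alu elements make up about 69 percent of existing SINE sites but about 90 percent of new SINE insertions.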

Like other mobile elements, most SINEs have accumulated mutations from the time of their insertion in the germ line of an ancient ancestor of modern humans. Like LINEs, many SINEs are truncated at their 5′ ends.