15.4 The Dynamic Genome: More Transposable Elements Than Ever Imagined

Transposable elements were first discovered with the use of genetic approaches. In these studies, the elements made their presence known when they transposed into a gene or were sites of chromosome breakage or rearrangement. After the DNA of transposable elements was isolated from unstable mutations, scientists could use that DNA as molecular probes to determine if there were more related copies in the genome. In all cases, at least several copies—and in some cases as many as several hundred copies—of the element were always present in the genome.

Scientists wondered about the prevalence of transposable elements in genome s. Were there other transposable elements in the genome that remained unknown because they had not caused a mutation that could be studied in the laboratory? Were there transposable elements in the vast majority of organisms that were not amenable to genetic analysis? Asked another way, do organisms without mutations induced by transposable elements nonetheless have transposable elements in their genomes? These questions are reminiscent of the philosophical thought experiment, If a tree falls in the forest and no one is around to hear it, does it make a sound?

567

Large genomes are largely transposable elements

Long before the advent of DNA-sequencing projects, scientists using a variety of biochemical techniques discovered that DNA content (called C-value) varied dramatically in eukaryotes and did not correlate with biological complexity. For example, the genomes of salamanders are 20 times as large as the human genome, whereas the genome of barley is more than 10 times as large as the genome of rice, a related grass. The lack of correlation between genome size and the biological complexity of an organism is known as the C-value paradox.

Barley and rice are both cereal grasses, and, as such, their gene content should be similar. However, if genes are a relatively constant component of the genomes of multicellular organisms, what is responsible for the additional DNA in the larger genomes? On the basis of the results of additional experiments, scientists were able to determine that DNA sequences that are repeated thousands, even hundreds of thousands, of times make up a large fraction of eukaryotic genomes and that some genomes contain much more repetitive DNA than others.

Thanks to many recent projects to sequence the genomes of a wide variety of organisms (including Drosophila, humans, the mouse, Arabidopsis, and rice), we now know that there are many classes of repetitive sequences in the genomes of higher organisms and that some are similar to the DNA transposons and retrotransposons shown to be responsible for mutations in plants, yeast, and insects. Most remarkably, these sequences make up most of the DNA in the genomes of multicellular eukaryotes.

Rather than correlating with gene content, genome size frequently correlates with the amount of DNA in the genome that is derived from transposable elements. Organisms with big genomes have lots of sequences that resemble transposable elements, whereas organisms with small genomes have many fewer. Two examples, one from the human genome and the other from a comparison of the grass genomes, illustrate this point. The structural features of the transposable elements that are found in human genomes are summarized in Figure 15-21 and will be referred to in the next section.

KEY CONCEPT

The C-value paradox is the lack of correlation between genome size and biological complexity. Genes make up only a small proportion of the genomes of multicellular organisms. Genome size usually corresponds to the amount of transposable-element sequences rather than to gene content.
Figure 15-21: Types of transposable elements in the human genome
Figure 15-21: Several general classes of transposable elements are found in the human genome.
[Data from Nature 409, 880 (15 February 2001), “Initial Sequencing and Analysis of the Human Genome,” The International Human Genome Sequencing Consortium.]

568

Transposable elements in the human genome

Almost half of the human genome is derived from transposable elements. The vast majority of these transposable elements are two types of retrotransposons called long interspersed elements, or LINEs, and short interspersed elements, or SINEs (see Figure 15-21). LINEs move like a retrotransposon with the help of an element-encoded reverse transcriptase but lack some structural features of retrovirus-like elements, including LTRs (see Figure 15-12d). SINEs can be best described as nonautonomous LINEs because they have the structural features of LINEs but do not encode their own reverse transcriptase. Presumably, they are mobilized by reverse transcriptase enzymes that are encoded by LINEs residing in the genome.

The most abundant SINE in humans is called Alu, so named because it contains a target site for the Alu restriction enzyme. The human genome contains more than 1 million whole and partial Alu sequences, scattered between genes and within introns. These Alu sequences make up more than 10 percent of the human genome. The full Alu sequence is about 200 nucleotides long and bears remarkable resemblance to 7SL RNA, an RNA that is part of a complex by which newly synthesized polypeptides are secreted through the endoplasmic reticulum. Presumably, the Alu sequences originated as reverse transcripts of these RNA molecules.

There is about 20 times as much DNA in the human genome derived from transposable elements as there is DNA encoding all human proteins. Figure 15-22 illustrates the number and diversity of transposable elements present in the human genome, using as an example the positions of individual Alus, other SINEs, and LINEs in the vicinity of a typical human gene.

Figure 15-22: Human genes contain many transposable elements
Figure 15-22: Numerous repetitive elements are found in the human gene (HGO) encoding homogentisate 1,2-dioxygenase, the enzyme whose deficiency causes alkaptonuria. The upper row diagrams the positions of the HGO exons. The locations of Alus (blue), other SINEs (purple), and LINEs (yellow) in the HGO sequence are indicated in the lower row.
[Data from B. Granadino, D. Beltrán-Valero de Bernabé, J. M. Fernández-Cañón, M. A. Peñalva, and S. Rodríguez de Córdoba, “The Human Homogentisate 1,2-dioxygenase (HGO) Gene,” Genomics 43, 1997, 115]

The human genome seems to be typical for a multicellular organism in the abundance and distribution of transposable elements. Thus, an obvious question is, How do plants and animals survive and thrive with so many insertions in genes and so much mobile DNA in the genome? First, with regard to gene function, all of the elements shown in Figure 15-22 are inserted into introns. Thus, the mRNA produced by this gene will not include any sequences from transposable elements because they will have been spliced out of the pre-mRNA with the surrounding intron. Presumably, transposable elements insert into both exons and introns, but only the insertions into introns will remain in the population because they are less likely to cause a deleterious mutation. Insertions into exons are said to be subjected to negative selection. Second, humans, as well as all other multicellular organisms, can survive with so much mobile DNA in the genome because the vast majority is inactive and cannot move or increase in copy number. Most transposable-element sequences in a genome are relics that have accumulated inactivating mutations through evolutionary time. Others are still capable of movement but are rendered inactive by host regulatory mechanisms (see Section 15.5). There are, however, a few active LINEs and Alus that have managed to escape host control and have inserted into important genes, causing several human diseases. Three separate insertions of LINEs have disrupted the factor VIII gene, causing hemophilia A. At least 11 Alu insertions into human genes have been shown to cause several diseases, including hemophilia B (in the factor IX gene), neurofibromatosis (in the NF1 gene), and breast cancer (in the BRCA2 gene).

569

The overall frequency of spontaneous mutation due to the insertion of class 2 elements in humans is quite low, accounting for less than 0.2 percent (1 in 500) of all characterized spontaneous mutations. Surprisingly, retrotransposon insertions account for about 10 percent of spontaneous mutations in another mammal, the mouse. The approximately 50-fold increase in this type of mutation in the mouse most likely corresponds to the much higher activity of these elements in the mouse genome than in the human genome.

KEY CONCEPT

Transposable elements compose the largest fraction of the human genome, with LINEs and SINEs being the most abundant. The vast majority of transposable elements are ancient relics that can no longer move or increase their copy number. A few elements remain active, and their movement into genes can cause disease.

The grasses: LTR-retrotransposons thrive in large genomes

As already mentioned, the C-value paradox is the lack of correlation between genome size and biological complexity. How can organisms have very similar gene content but differ dramatically in the size of their genomes? This situation has been investigated in the cereal grasses. Differences in the genome sizes of these grasses have been shown to correlate primarily with the number of one class of elements, the LTR-retrotransposons. The cereal grasses are evolutionary relatives that have arisen from a common ancestor in the past 70 million years. As such, their genomes are still very similar with respect to gene content and organization (called synteny; see Chapter 14), and regions can be compared directly. These comparisons reveal that linked genes in the small rice genome are physically closer together than are the same genes in the larger maize and barley genomes. In the maize and barley genomes, genes are separated by large clusters of retrotransposons (Figure 15-23).

Figure 15-23: Transposable elements in grasses are responsible for differences in genome size
Figure 15-23: The grasses, including barley, rice, sorghum, and maize, diverged from a common ancestor about 70 million years ago. Since that time, the transposable elements have accumulated to different levels in each species. Chromosomes are larger in maize and barley, whose genomes contain large amounts of LTR-retrotransposons. Green in the partial genome at the bottom represents a cluster of transposons, whereas orange represents genes.

Safe havens

The abundance of transposable elements in the genomes of multicellular organisms led some investigators to postulate that successful transposable elements (those that are able to attain very high copy numbers) have evolved mechanisms to prevent harm to their hosts by not inserting into host genes. Instead, successful transposable elements insert into so-called safe havens in the genome. For the grasses, a safe haven for new insertions appears to be into other retrotransposons. Another safe haven is the heterochromatin of centromeres, where there are very few genes but lots of repetitive DNA (see Chapter 12 for more on heterochromatin). Many classes of transposable elements in both plant and animal species tend to insert into the centric heterochromatin.

Safe havens in small genomes: targeted insertions In contrast with the genomes of multicellular eukaryotes, the genome of unicellular yeast is very compact, with closely spaced genes and very few introns. With almost 70 percent of its genome as exons, there is a high probability that new insertions of transposable elements will disrupt a coding sequence. Yet, as we have seen earlier in this chapter, the yeast genome supports a collection of LTR-retrotransposons called Ty elements.

570

How are transposable elements able to spread to new sites in genomes with few safe havens? Investigators have identified hundreds of Ty elements in the sequenced yeast genome and have determined that they are not randomly distributed. Instead, each family of Ty elements inserts into a particular genomic region. For example, the Ty3 family inserts almost exclusively near but not in tRNA genes, at sites where they do not interfere with the production of tRNAs and, presumably, do not harm their hosts. Ty elements have evolved a mechanism that allows them to insert into particular regions of the genome: Ty proteins necessary for integration interact with specific yeast proteins bound to genomic DNA. Ty3 proteins, for example, recognize and bind to subunits of the RNA polymerase complex that have assembled at tRNA promoters (Figure 15-24a).

Figure 15-24: Transposable elements insert into safe havens
Figure 15-24: Some transposable elements are targeted to specific safe havens. (a) The yeast Ty3 retrotransposon inserts into the promoter region of transfer RNA genes. (b) The Drosophila R1 and R2 non-LTR-retrotransposons (LINEs) insert into the genes encoding ribosomal RNA that are found in long tandem arrays on the chromosome. Only the reverse transcriptase (RT) genes of R1 and R2 are noted.

The ability of some transposons to insert preferentially into certain sequences or genomic regions is called targeting. A remarkable example of targeting is illustrated by the R1 and R2 elements of arthropods, including Drosophila. R1 and R2 are LINEs (see Figure 15-21) that insert only into the genes that produce ribosomal RNA. In arthropods, several hundred rRNA genes are organized in tandem arrays (Figure 15-24b). With so many genes encoding the same product, the host tolerates insertion into a subset. However, too many insertions of R1 and R2 have been shown to decrease insect viability, presumably by interfering with ribosome assembly.

Gene therapy revisited This chapter began with a description of a recessive genetic disorder called SCID (severe combined immunodeficiency disease). The immune systems of persons afflicted with SCID are severely compromised owing to a mutation in a gene encoding the enzyme adenosine deaminase. To correct this genetic defect, bone-marrow cells from SCID patients were collected and treated with a retrovirus vector containing a good ADA gene. The transformed cells were then infused back into the patients. The immune systems of most of the patients showed significant improvement. However, the therapy had a very serious side effect: two of the patients developed leukemia. In both patients, the retroviral vector had inserted (integrated) near a cellular gene whose aberrant expression is associated with leukemia. A likely scenario is that insertion of the retroviral vector near the cellular gene altered its expression and, indirectly, caused the leukemia.

571

Clearly, the serious risk associated with this form of gene therapy might be greatly improved if doctors were able to control where the retroviral vector integrates into the human genome. We have already seen that there are many similarities between LTR-retrotransposons and retroviruses. It is hoped that, by understanding Ty targeting in yeast, we can learn how to construct retroviral vectors that insert themselves and their transgene cargo into safe havens in the human genome.

KEY CONCEPT

A successful transposable element increases copy number without harming its host. One way in which an element safely increases copy number is to target new insertions into safe havens, regions of the genome where there are few genes.