4.7 Most Eukaryotic Genes Are Mosaics of Introns and Exons
Figure 4.37: Structure of the β-globin gene.
In bacteria, polypeptide chains are encoded by a continuous array of triplet codons in DNA. For many years, genes in higher organisms were assumed to be organized in the same manner. This view was unexpectedly shattered in 1977, when investigators discovered that most eukaryotic genes are discontinuous. The mosaic nature of eukaryotic genes was revealed by electron microscopic studies of hybrids formed between mRNA and a segment of DNA containing the corresponding gene (Figure 4.36). For example, the gene for the β chain of hemoglobin is interrupted within its amino acid-coding sequence by a long stretch of 550 non-coding base pairs and a short one of 120 base pairs. Thus, the β-globin gene is split into three coding sequences (Figure 4.37). Non-coding regions are called introns (for intervening sequences), whereas coding regions are called exons (for expressed sequences). The average human gene has 8 introns, and some have more than 100. Introns range in size from 50 to 10,000 nucleotides.
Figure 4.36: Detection of introns by electron microscopy. An mRNA molecule (shown in red) is hybridized to genomic DNA containing the corresponding gene. (A) A single loop of single-stranded DNA (shown in blue) is seen if the gene is continuous. (B) Two loops of single-stranded DNA (blue) and a loop of double-stranded DNA (blue and green) are seen if the gene contains an intron. Additional loops are evident if more than one intron is present.
RNA processing generates mature RNA
Figure 4.38: Transcription and processing of the β-globin gene. The gene is transcribed to yield the primary transcript, which is modified by cap and poly(A) addition. The introns in the primary RNA transcript are removed to form the mRNA.
At what stage in gene expression are introns removed? Newly synthesized RNA molecules (pre-mRNA or primary transcript) isolated from nuclei are much larger than the mRNA molecules derived from them; for β-globin RNA, the former consists of approximately 1600 nucleotides and the latter approximately 900 nucleotides. In fact, the primary transcript of the β-globin gene contains two regions that are not present in the mRNA. These regions in the primary transcript are excised, and the coding sequences are simultaneously linked by a precise splicing complex to form the mature mRNA (Figure 4.38). A common feature in the expression of discontinuous, or split, genes is that their exons are ordered in the same sequence in mRNA as in DNA. Thus, the codons in split genes, like those in continuous genes, are in the same linear order as the amino acids in the polypeptide products.
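The excision-and-joining step described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a model of the splicing machinery: the function name `splice`, the toy sequence, and the intron coordinates are all invented for the example, and the introns are assumed to be supplied as half-open index ranges.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the remaining exons in their original order.

    introns: list of (start, end) half-open index pairs (an assumed convention).
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end > len(pre_mrna) or start >= end:
            raise ValueError("introns must be non-overlapping and inside the transcript")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


# Invented toy transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuucag" + "UUCGAA" + "guccuuuuag" + "GGGUAA"
introns = [(6, 17), (23, 33)]
mrna = splice(pre_mrna, introns)
print(mrna)  # AUGGCCUUCGAAGGGUAA
```

Because the exons are concatenated in the order in which they occur in the transcript, the codons of the product stay in the same linear order as in the gene, which is the point made in the text.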
Splicing is a complex operation that is carried out by spliceosomes, which are assemblies of proteins and small nuclear RNA molecules (snRNAs). RNA plays the catalytic role (Section 29.3). Spliceosomes recognize signals in the nascent RNA that specify the splice sites. Introns nearly always begin with GU and end with an AG that is preceded by a pyrimidine-rich tract (Figure 4.39). This consensus sequence is part of the signal for splicing.
Figure 4.39: Consensus sequence for the splicing of mRNA precursors.
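The consensus features named in the text (GU at the start, AG at the end, a pyrimidine-rich tract just before the AG) can be expressed as a simple check. The sketch below is a rough illustration only: the function name, the tract length of 10 nucleotides, and the 70% pyrimidine threshold are assumptions chosen for the example, not values given in the text, and real splice-site recognition depends on more than these signals.

```python
def looks_like_intron(seq, tract_length=10, min_pyrimidine_fraction=0.7):
    """Return True if seq starts with GU, ends with AG, and has a
    pyrimidine-rich tract immediately upstream of the terminal AG.

    tract_length and min_pyrimidine_fraction are illustrative assumptions.
    """
    seq = seq.upper()
    if len(seq) < tract_length + 4:       # room for GU + tract + AG
        return False
    if not (seq.startswith("GU") and seq.endswith("AG")):
        return False
    tract = seq[-(tract_length + 2):-2]   # bases just before the final AG
    pyrimidines = sum(base in "CU" for base in tract)
    return pyrimidines / tract_length >= min_pyrimidine_fraction


# Invented test sequences.
candidate = "GUAAGUACUUUCUCUUUCCAG"   # GU ... pyrimidine-rich tract ... AG
purine_rich = "GUAAGAAGGAGAAGGAAGAG"  # GU ... AG, but no pyrimidine tract
print(looks_like_intron(candidate), looks_like_intron(purine_rich))
```

A check like this would flag many sequences that are never spliced, which is consistent with the statement that the consensus sequence is only part of the signal.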
Many exons encode protein domains
Most genes of higher eukaryotes, such as birds and mammals, are split. Lower eukaryotes, such as yeast, have a much higher proportion of continuous genes. In prokaryotes, split genes are extremely rare. Have introns been inserted into genes in the evolution of higher organisms? Or have introns been removed from genes to form the streamlined genomes of prokaryotes and simple eukaryotes? Comparisons of the DNA sequences of genes encoding evolutionarily conserved proteins suggest that introns were present in ancestral genes and were lost in the evolution of organisms that have become optimized for very rapid growth, such as prokaryotes. The positions of introns in some genes are at least 1 billion years old. Furthermore, a common mechanism of splicing developed before the divergence of fungi, plants, and vertebrates, as shown by the finding that mammalian cell extracts can splice yeast RNA.
What advantages might split genes confer? Many exons encode discrete structural and functional domains of proteins. An attractive hypothesis is that new proteins arose in evolution by the rearrangement of exons encoding discrete structural elements, binding sites, and catalytic sites, a process called exon shuffling. Because it preserves functional units but allows them to interact in new ways, exon shuffling is a rapid and efficient means of generating novel genes (Figure 4.40). Figure 4.41 shows the composition of a gene that was formed in part by exon shuffling. DNA can break and recombine in introns with no deleterious effect on encoded proteins. In contrast, the exchange of sequences within different exons usually leads to loss of function.
Figure 4.40: Exon shuffling. Exons can be readily shuffled by recombination of DNA to expand the genetic repertoire.
Figure 4.41: The tissue plasminogen activator (TPA) gene was generated by exon shuffling. The gene for TPA encodes an enzyme that functions in hemostasis (Section 10.4). This gene consists of 4 exons: one (F) derived from the fibronectin gene, which encodes an extracellular matrix protein; one (EGF) from the epidermal growth factor gene; and two (K) from the plasminogen gene, which encodes the substrate of the TPA protein (Section 10.4). The K domain appears to have arrived by exon shuffling and then been duplicated to generate the TPA gene that exists today.
[Information from: www.ehu.es/ehusfera/genetica/2012/10/02/demostracion-molecular-de-microevolucion/]
Another advantage of split genes is the potential for generating a series of related proteins by alternative splicing of the primary transcript. For example, a precursor of an antibody-producing cell forms an antibody that is anchored in the cell’s plasma membrane (Figure 4.42). The attached antibody recognizes a specific foreign antigen, an event that leads to cell differentiation and proliferation. The activated antibody-producing cells then splice their primary transcript in an alternative manner to form soluble antibody molecules that are secreted rather than retained on the cell surface. Alternative splicing is a facile means of forming a set of proteins that are variations of a basic motif without requiring a gene for each protein. Because of alternative splicing, the proteome is more diverse than the genome in eukaryotes.
Figure 4.42: Alternative splicing. Alternative splicing generates mRNAs that are templates for different forms of a protein: (A) a membrane-bound antibody on the surface of a lymphocyte and (B) its soluble counterpart, exported from the cell. The membrane-bound antibody is anchored to the plasma membrane by a helical segment (highlighted in yellow) that is encoded by its own exon.
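The idea that one primary transcript can yield more than one mRNA can be sketched as a choice of which exons to join. The exon names and sequences below are hypothetical and invented for illustration, and the sketch deliberately ignores how the cell decides between the two forms; it shows only that the two products share the same leading exons and differ in the exon that encodes the membrane anchor.

```python
def assemble_mrna(exons, included):
    """Join the included exons in genomic order; skipped exons are left out."""
    return "".join(seq for name, seq in exons if name in included)


# Hypothetical exons of one gene, listed in genomic order (invented sequences).
exons = [
    ("constant", "AUGGCUGGA"),
    ("secreted_tail", "CCAUAA"),
    ("membrane_anchor", "CUGCUGGUGUAA"),
]

membrane_form = assemble_mrna(exons, {"constant", "membrane_anchor"})
secreted_form = assemble_mrna(exons, {"constant", "secreted_tail"})
print(membrane_form)  # AUGGCUGGACUGCUGGUGUAA
print(secreted_form)  # AUGGCUGGACCAUAA
```

Two distinct messages arise here from a single list of exons, which mirrors the statement that alternative splicing produces variations of a basic motif without requiring a separate gene for each protein.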