Simple and Complex Transcription Units Are Found in Eukaryotic Genomes

The cluster of genes that forms a bacterial operon constitutes a single transcription unit, which is transcribed from a specific promoter in the DNA sequence to a termination site, producing a single primary transcript. Thus, genes and transcription units are often distinguishable in prokaryotes, because a single transcription unit contains several genes when those genes are part of an operon. In contrast, most eukaryotic genes are expressed from separate transcription units, and each mRNA is translated into a single protein.

Eukaryotic transcription units can be classified into two types, depending on the fate of the primary transcript. The primary transcript produced from a simple transcription unit, such as the one encoding β-globin (see Figure 5-15), is processed to yield a single type of mRNA, encoding a single protein. Mutations in exons, introns, and transcription-control regions may all influence expression of the protein encoded by a simple transcription unit (Figure 8-3a). In humans, simple transcription units such as the one encoding β-globin are rare; approximately 90 percent of human transcription units are complex. The primary RNA transcript of a complex transcription unit can be processed in more than one way, leading to the formation of mRNAs containing different exons. Each alternative mRNA, however, is monocistronic: it is translated into a single polypeptide, with translation usually initiating at the first AUG in the mRNA.
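The two processing steps described above, removing introns and then starting translation at the first AUG of the resulting monocistronic mRNA, can be sketched in a few lines of Python. This is a toy illustration, not a model of real splicing: the sequence, the exon coordinates, and the helper names `splice` and `first_orf` are all hypothetical, exon boundaries are supplied by hand rather than recognized from splice-site signals, and the sketch simply reads codons from the first AUG to the first in-frame stop codon.

```python
# Toy sketch: process a simple transcription unit into one mRNA and one reading frame.
# The sequence, coordinates, and function names are hypothetical illustrations.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(primary: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (half-open [start, end) intervals) in order, dropping introns."""
    return "".join(primary[start:end] for start, end in exons)


def first_orf(mrna: str) -> list[str]:
    """Codons read from the first AUG up to and including the first in-frame stop codon.

    Mirrors the monocistronic rule of thumb: one mRNA, one start site, one polypeptide.
    Returns an empty list if the mRNA contains no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


# Exon 1 + intron (begins GU, ends AG) + exon 2; all made up for illustration.
primary = "CCAUGGCU" + "GUAAGUCCAG" + "GGAUAACC"
exons = [(0, 8), (18, 26)]

mrna = splice(primary, exons)
print(mrna)             # CCAUGGCUGGAUAACC
print(first_orf(mrna))  # ['AUG', 'GCU', 'GGA', 'UAA']
```

Running `first_orf` on the unspliced `primary` instead reads into the intron and never reaches the stop codon, which is one way to see why intron removal matters for producing the intended protein.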

FIGURE 8-3 Simple and complex eukaryotic transcription units. (a) A simple transcription unit includes a region that encodes one protein, extending from the 5′ cap site to the 3′ poly(A) site, and associated control regions. Introns lie between exons (light blue rectangles) and are removed during processing of the primary transcripts (dashed red lines); thus they do not occur in the functional monocistronic mRNA. Mutations in a transcription-control region (a, b) may reduce or prevent transcription, thus reducing or eliminating synthesis of the encoded protein. A mutation within an exon (c) may result in an abnormal protein with diminished activity. A mutation within an intron (d) that introduces a new splice site results in an abnormally spliced mRNA encoding a nonfunctional protein. (b) Complex transcription units produce primary transcripts that can be processed in alternative ways. (Top) If a primary transcript contains alternative splice sites, it can be processed into mRNAs with the same 5′ and 3′ exons but different internal exons. (Middle) If a primary transcript has two poly(A) sites, it can be processed into mRNAs with alternative 3′ exons. (Bottom) If alternative promoters (f or g) are active in different cell types, mRNA1, produced in a cell type in which f is activated, has a different first exon (1A) than mRNA2, which is produced in a cell type in which g is activated (and in which exon 1B is used). Mutations in control regions (a and b) and in regions within exons shared by the alternative mRNAs (designated c) affect the proteins encoded by both alternatively processed mRNAs. In contrast, mutations (designated d and e) within exons unique to one of the alternatively processed mRNAs affect only the protein translated from that mRNA. For genes that are transcribed from different promoters in different cell types (bottom), mutations in different control regions (f and g) affect expression only in the cell type in which that control region is active.


Multiple mRNAs can arise from a primary transcript in three ways, as shown in Figure 8-3b. Examples of all three types of alternative RNA processing occur in the genes that regulate sexual differentiation in Drosophila (see Figure 10-18). Commonly, one mRNA is produced from a complex transcription unit in some cell types, and a different mRNA is made in other cell types. For example, alternative splicing of the primary fibronectin transcript in fibroblasts and hepatocytes determines whether or not the secreted protein includes domains that adhere to cell surfaces (see Figure 5-16). The phenomenon of alternative splicing greatly expands the number of proteins encoded in the genomes of higher organisms. It is estimated that about 90 percent of human genes are contained within complex transcription units that give rise to alternatively spliced mRNAs encoding proteins with distinct functions, as for the fibroblast and hepatocyte forms of fibronectin.
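The cell-type-specific use of alternative exons, and the way a mutation's effect depends on which exons it falls in (Figure 8-3b), can be captured with a small data model. The sketch below is a hypothetical illustration: the exon labels `E1`, `X`, `E3`, the cell-type names, and the helper functions are invented, and it assumes that any mutation in an exon alters every protein whose mRNA retains that exon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """One processed mRNA from a complex transcription unit (toy model)."""
    name: str
    exons: tuple[str, ...]        # exons retained in this mRNA, 5' to 3'
    cell_types: frozenset[str]    # cell types in which this mRNA is made


# Hypothetical unit: both mRNAs share exons E1 and E3; only mRNA1 keeps exon X.
UNIT = [
    Isoform("mRNA1", ("E1", "X", "E3"), frozenset({"cell type A"})),
    Isoform("mRNA2", ("E1", "E3"), frozenset({"cell type B"})),
]


def proteins_altered(unit: list[Isoform], mutated_exon: str) -> list[str]:
    """Isoforms whose encoded protein changes when `mutated_exon` carries a mutation."""
    return sorted(iso.name for iso in unit if mutated_exon in iso.exons)


def cell_types_affected(unit: list[Isoform], mutated_exon: str) -> list[str]:
    """Cell types that make at least one altered isoform."""
    return sorted({ct for iso in unit if mutated_exon in iso.exons for ct in iso.cell_types})


for exon in ("E1", "X"):
    print(exon, proteins_altered(UNIT, exon), cell_types_affected(UNIT, exon))
# E1 ['mRNA1', 'mRNA2'] ['cell type A', 'cell type B']
# X ['mRNA1'] ['cell type A']
```

A mutation in the shared exon `E1` behaves like mutation c in Figure 8-3b and affects both proteins in both cell types, whereas a mutation in the unique exon `X` behaves like mutation d or e and affects only one protein, in only the cell type that makes it.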

The relationship between a mutation and a gene is not always straightforward when it comes to complex transcription units. A mutation in the control region or in an exon shared by alternatively spliced mRNAs will affect all the alternative proteins encoded by a given complex transcription unit. On the other hand, a mutation in an exon present in only one of the alternative mRNAs will affect only the protein encoded by that mRNA. As explained in Chapter 6, genetic complementation tests are commonly used to determine whether two mutations are in the same gene or in different genes (see Figure 6-7). However, in the complex transcription unit shown in Figure 8-3b (middle), mutations d and e would complement each other in a genetic complementation test, even though they occur in the same gene, because a chromosome with mutation d can express a normal protein encoded by mRNA2 and a chromosome with mutation e can express a normal protein encoded by mRNA1. Both mRNAs produced from this gene would be present in a diploid cell carrying both mutations, generating both protein products and hence a wild-type phenotype. However, a chromosome with mutation c in an exon common to both mRNAs would not complement either mutation d or mutation e. In other words, mutation c would fall in the same complementation group as mutation d and in the same complementation group as mutation e, even though d and e themselves would not be in the same complementation group! Given these complications with the genetic definition of a gene, the genomic definition outlined at the beginning of this section is commonly used. In the case of protein-coding genes, a gene is the DNA sequence transcribed into a pre-mRNA precursor, equivalent to a transcription unit, plus any other regulatory elements required for synthesis of the primary transcript. The various proteins encoded by the alternatively spliced mRNAs expressed from one gene are called isoforms.
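The complementation logic above can be checked with a short sketch. It rests on simplifying assumptions that are ours, not the text's: each isoform needs every exon it contains to be intact, each chromosome carries exactly one mutation, and the wild-type phenotype requires that the two chromosomes together supply a functional version of every isoform. The exon labels and function names are hypothetical.

```python
# Toy complementation test for the complex transcription unit in Figure 8-3b (middle).
# Exon labels and helper names are hypothetical; only the logic follows the text.

ISOFORMS = {
    "mRNA1": {"shared", "unique1"},
    "mRNA2": {"shared", "unique2"},
}

# Exon in which each mutation lies: c is shared, d is unique to mRNA1, e to mRNA2.
MUTATION_SITE = {"c": "shared", "d": "unique1", "e": "unique2"}


def functional_isoforms(mutation: str) -> set[str]:
    """Isoforms that a chromosome carrying `mutation` can still make in normal form."""
    site = MUTATION_SITE[mutation]
    return {name for name, exons in ISOFORMS.items() if site not in exons}


def complements(m1: str, m2: str) -> bool:
    """True if a diploid with m1 on one chromosome and m2 on the other is wild type."""
    return (functional_isoforms(m1) | functional_isoforms(m2)) == set(ISOFORMS)


for pair in [("d", "e"), ("c", "d"), ("c", "e")]:
    print(pair, complements(*pair))
# ('d', 'e') True
# ('c', 'd') False
# ('c', 'e') False
```

The output reproduces the non-transitive result described in the text: c fails to complement d and fails to complement e, yet d and e complement each other even though all three mutations lie in one gene.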
