29.2 Transcription in Eukaryotes Is Highly Regulated

We turn now to transcription in eukaryotes, a much more complex process than in bacteria. Eukaryotic cells have a remarkable ability to regulate precisely the time at which each gene is transcribed and how much RNA is produced. This ability has allowed some eukaryotes to evolve into multicellular organisms, with distinct tissues. That is, multicellular eukaryotes use differential transcriptional regulation to create different cell types. Gene expression is influenced by three important characteristics unique to eukaryotes: the nuclear membrane, complex transcriptional regulation, and RNA processing.

1. The Nuclear Membrane. In eukaryotes, transcription and translation take place in different cellular compartments: transcription takes place in the membrane-bounded nucleus, whereas translation takes place outside the nucleus in the cytoplasm. In bacteria, the two processes are closely coupled (Figure 29.21). Indeed, the translation of bacterial mRNA begins while the transcript is still being synthesized. The spatial and temporal separation of transcription and translation enables eukaryotes to regulate gene expression in much more intricate ways, contributing to the richness of eukaryotic form and function.

Figure 29.21: Transcription and translation. These two processes are closely coupled in prokaryotes whereas they are spatially and temporally separate in eukaryotes. (A) In prokaryotes, the primary transcript serves as mRNA and is used immediately as the template for protein synthesis. (B) In eukaryotes, mRNA precursors are processed and spliced in the nucleus before being transported to the cytoplasm for translation into protein.
[Information from J. Darnell, H. Lodish, and D. Baltimore. Molecular Cell Biology, 2d ed. (Scientific American Books, 1990), p. 230.]

2. Complex Transcriptional Regulation. Like bacteria, eukaryotes rely on conserved sequences in DNA to regulate the initiation of transcription. But bacteria have only three promoter elements (the −10, −35, and UP elements), whereas eukaryotes use a variety of types of promoter elements, each identified by its own conserved sequence. Not all possible types will be present together in the same promoter. In eukaryotes, elements that regulate transcription can be found at a variety of locations in DNA, upstream or downstream of the start site and sometimes at distances much farther from the start site than in prokaryotes. For example, enhancer elements located on DNA far from the start site increase the promoter activity of specific genes.

3. RNA Processing. Although both bacteria and eukaryotes modify RNA, eukaryotes very extensively process nascent RNA destined to become mRNA. This processing includes modifications to both ends and, most significantly, splicing out segments of the primary transcript. RNA processing is described in Section 29.3.

Three types of RNA polymerase synthesize RNA in eukaryotic cells

In bacteria, RNA is synthesized by a single kind of polymerase. In contrast, the nucleus of a typical eukaryotic cell contains three types of RNA polymerase differing in template specificity and location in the nucleus (Table 29.2). All these polymerases are large proteins, containing from 8 to 14 subunits and having total molecular masses greater than 500 kDa. RNA polymerase I is located in specialized structures within the nucleus called nucleoli, where it transcribes the tandem array of genes for 18S, 5.8S, and 28S rRNA. The other rRNA molecule (5S rRNA) and all the tRNA molecules are synthesized by RNA polymerase III, which is located in the nucleoplasm rather than in nucleoli. RNA polymerase II, which also is located in the nucleoplasm, synthesizes the precursors of mRNA as well as several small RNA molecules, such as those of the splicing apparatus and many of the precursors to small regulatory RNAs.

Although all eukaryotic RNA polymerases are homologous to one another and to prokaryotic RNA polymerases, RNA polymerase II contains a unique carboxyl-terminal domain on the 220-kDa subunit called the CTD; this domain is unusual because it contains multiple repeats of a YSPTSPS consensus sequence. The activity of RNA polymerase II is regulated by phosphorylation mainly on the serine residues of the CTD.
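The repeat structure of the CTD lends itself to a short computational illustration. The following Python sketch counts exact copies of the YSPTSPS consensus in a sequence and lists the serine positions within one heptad. The toy sequence and the function names are hypothetical, chosen only for illustration; real CTDs are longer and include repeats that deviate from the consensus.

```python
import re
from typing import List

HEPTAD = "YSPTSPS"  # consensus repeat named in the text


def count_consensus_repeats(ctd: str) -> int:
    """Count non-overlapping exact copies of the heptad consensus."""
    return len(re.findall(HEPTAD, ctd))


def serine_positions(seq: str) -> List[int]:
    """Return 1-based positions of serine residues in the sequence."""
    return [i + 1 for i, residue in enumerate(seq) if residue == "S"]


# Hypothetical toy CTD built from three perfect repeats (illustration only).
toy_ctd = HEPTAD * 3

print(count_consensus_repeats(toy_ctd))
print(serine_positions(HEPTAD))
```

Within a single heptad, the serines fall at positions 2, 5, and 7, so each repeat offers three candidate serine phosphorylation sites.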

Another major distinction among the polymerases lies in their responses to the toxin α-amanitin, a cyclic octapeptide that contains several modified amino acids.

Table 29.2: Eukaryotic RNA polymerases

Type  Location     Cellular transcripts       Effects of α-amanitin
----  -----------  -------------------------  --------------------------------
I     Nucleolus    18S, 5.8S, and 28S rRNA    Insensitive
II    Nucleoplasm  mRNA precursors and snRNA  Strongly inhibited
III   Nucleoplasm  tRNA and 5S rRNA           Inhibited by high concentrations

Amanita phalloides, also called the death cap.
[Jacana/Science Source.]

Figure 29.22: Common eukaryotic promoter elements. Each eukaryotic RNA polymerase recognizes a set of promoter elements, which are sequences in DNA that promote transcription. The RNA polymerase I promoter consists of a ribosomal initiator (rInr) and an upstream promoter element (UPE). The RNA polymerase II promoter likewise includes an initiator element (Inr) and may also include either a TATA box or a downstream promoter element (DPE). Separate from the promoter region, enhancer elements bind specific transcription factors. RNA polymerase III promoters consist of conserved sequences that lie within the transcribed genes.

α-Amanitin is produced by the poisonous mushroom Amanita phalloides, which is also called the death cap or the destroying angel. More than a hundred deaths result worldwide each year from the ingestion of poisonous mushrooms. α-Amanitin binds very tightly (Kd = 10 nM) to RNA polymerase II and thereby blocks the elongation phase of RNA synthesis. Higher concentrations of α-amanitin (1 μM) inhibit polymerase III, whereas polymerase I is insensitive to this toxin. This pattern of sensitivity is highly conserved throughout the animal and plant kingdoms.
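The meaning of a 10 nM dissociation constant can be made concrete with a simple occupancy calculation. The sketch below assumes the simplest case: a single binding site per polymerase, equilibrium binding, and free toxin in large excess over enzyme, so that the fraction of polymerase II bound is [L]/([L] + Kd). The function name and the chosen concentrations are illustrative only.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of binding sites occupied for simple one-site binding.

    Assumes equilibrium and that free ligand is approximately total ligand.
    """
    return ligand_nM / (ligand_nM + kd_nM)


KD_POL_II_NM = 10.0  # Kd of alpha-amanitin for RNA polymerase II, from the text

# Illustrative toxin concentrations spanning 1 nM to 1 uM.
for conc_nM in (1.0, 10.0, 100.0, 1000.0):
    occupancy = fraction_bound(conc_nM, KD_POL_II_NM)
    print(f"{conc_nM:7.1f} nM -> fraction of polymerase II bound = {occupancy:.3f}")
```

Under these assumptions, polymerase II is half-occupied at 10 nM toxin and about 99% occupied at 1 μM, the concentration at which polymerase III begins to be inhibited.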

Eukaryotic polymerases also differ from each other in the promoters to which they bind. Eukaryotic genes, like prokaryotic genes, require promoters for transcription initiation. Like prokaryotic promoters, eukaryotic promoters consist of conserved sequences that attract the polymerase to the start site. However, eukaryotic promoters differ distinctly in sequence and position, depending on the type of RNA polymerase to which they bind (Figure 29.22).

1. RNA Polymerase I. The ribosomal DNA (rDNA) transcribed by polymerase I is arranged in several hundred tandem repeats, each containing a copy of each of three rRNA genes. The promoter sequences are located in stretches of DNA separating the genes. At the transcriptional start site lies a TATA-like sequence called the ribosomal initiator element (rInr). Farther upstream, 150 to 200 bp from the start site, is the upstream promoter element (UPE). Both elements aid transcription by binding proteins that recruit RNA polymerase I.

2. RNA Polymerase II. Promoters for RNA polymerase II, like prokaryotic promoters, include a set of consensus sequences that define the start site and recruit the polymerase. However, the promoter can contain any combination of a number of possible consensus sequences. Unique to eukaryotes, they also include enhancer elements that can be very distant (more than 1 kb) from the start site.

3. RNA Polymerase III. Promoters for RNA polymerase III are within the transcribed sequence, downstream of the start site. There are two types of intragenic promoters for RNA polymerase III. Type I promoters, found in the 5S rRNA gene, contain two short conserved sequences known as the A block and the C block. Type II promoters, found in tRNA genes, consist of two 11-bp sequences, the A block and the B block, situated about 15 bp from either end of the gene.

Three common elements can be found in the RNA polymerase II promoter region

Figure 29.23: TATA box. Comparisons of the sequences of more than 100 eukaryotic promoters led to the consensus sequence shown. The subscripts denote the frequency (%) of the base at that position.

RNA polymerase II transcribes all of the protein-coding genes in eukaryotic cells. Promoters for RNA polymerase II, like those for bacterial polymerases, are generally located on the 5′ side of the start site for transcription. Because these sequences are on the same molecule of DNA as the genes being transcribed, they are called cis-acting elements. The most commonly recognized cis-acting element for genes transcribed by RNA polymerase II is called the TATA box on the basis of its consensus sequence (Figure 29.23). The TATA box is usually found between positions −30 and −100. Note that the eukaryotic TATA box closely resembles the prokaryotic −10 sequence (TATAAT) but is farther from the start site. The mutation of a single base in the TATA box markedly impairs promoter activity. Thus, the precise sequence, not just a high content of AT pairs, is essential.

Figure 29.24: CAAT box and GC box. Consensus sequences for the CAAT and GC boxes of eukaryotic promoters for mRNA precursors.

The TATA box is often paired with an initiator element (Inr), a sequence found at the transcriptional start site, between positions −3 and +5. This sequence defines the start site because the other promoter elements are at variable distances from that site. Its presence increases transcriptional activity.

A third element, the downstream core promoter element (DPE), is commonly found in conjunction with the Inr in promoters that lack the TATA box. In contrast with the TATA box, the DPE is found downstream of the start site, between positions +28 and +32.

Additional regulatory sequences are located between −40 and −150. Many promoters contain a CAAT box, and some contain a GC box (Figure 29.24). Constitutive genes (genes that are continuously expressed rather than regulated) tend to have GC boxes in their promoters. The positions of these upstream sequences vary from one promoter to another, in contrast with the quite constant location of the −35 region in prokaryotes. Another difference is that the CAAT box and the GC box can be effective when present on the template (antisense) strand, unlike the −35 region, which must be present on the coding (sense) strand. These differences between prokaryotes and eukaryotes correspond to fundamentally different mechanisms for the recognition of cis-acting elements. The −10 and −35 sequences in prokaryotic promoters are binding sites for RNA polymerase and its associated σ factor. In contrast, the TATA, CAAT, and GC boxes and other cis-acting elements in eukaryotic promoters are recognized by proteins other than RNA polymerase itself.
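The position ranges quoted in this section for RNA polymerase II promoter elements can be gathered into a small lookup table. The Python sketch below is only a bookkeeping aid: the dictionary name and function are hypothetical, the windows are the typical ranges given in the text rather than strict limits, and the CAAT and GC boxes are grouped together because the text gives one shared range for them.

```python
from typing import Dict, List, Tuple

# Typical position windows relative to the +1 transcription start site,
# as quoted in the text. Real promoters vary.
ELEMENT_WINDOWS: Dict[str, Tuple[int, int]] = {
    "TATA box": (-100, -30),
    "Inr": (-3, 5),
    "DPE": (28, 32),
    "CAAT or GC box": (-150, -40),
}


def elements_at(position: int) -> List[str]:
    """List the elements whose typical window contains the given position.

    Follows the usual numbering convention in which there is no position 0:
    -1 lies immediately upstream of +1.
    """
    if position == 0:
        raise ValueError("promoter numbering has no position 0")
    return [
        name
        for name, (low, high) in ELEMENT_WINDOWS.items()
        if low <= position <= high
    ]


print(elements_at(-50))  # overlap of the TATA and CAAT/GC windows
print(elements_at(1))    # the start site itself
print(elements_at(30))   # downstream of the start site
```

The overlapping windows at position −50 show why location alone cannot identify an element. The conserved sequence must be recognized as well.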

The TFIID protein complex initiates the assembly of the active transcription complex

Figure 29.25: Transcription initiation. Transcription factors TFIIA, B, D, E, F, and H are essential in initiating transcription by RNA polymerase II. The step-by-step assembly of these general transcription factors begins with the binding of TFIID (purple) to the TATA box. [The TATA-box-binding protein (TBP), a component of TFIID, recognizes the TATA box.] After assembly, TFIIH opens the DNA double helix and phosphorylates the carboxyl-terminal domain (CTD), allowing the polymerase to leave the promoter and begin transcription. The red arrow marks the transcription start site.

Cis-acting elements constitute only part of the puzzle of eukaryotic gene expression. Transcription factors that bind to these elements also are required. For example, RNA polymerase II is guided to the start site by a set of transcription factors known collectively as TFII (TF stands for transcription factor, and II refers to RNA polymerase II). Individual TFII factors are called TFIIA, TFIIB, and so on.

In TATA-box promoters, the key initial event is the recognition of the TATA box by the TATA-box-binding protein (TBP), a 30-kDa component of the 700-kDa TFIID complex (Figure 29.25). In TATA-less promoters, other proteins in the TFIID complex bind the core promoter elements but, because less is known about these interactions, we will consider only the TATA-box–TBP binding interaction. TBP binds 10⁵ times as tightly to the TATA box as to nonconsensus sequences; the dissociation constant of the TBP–TATA-box complex is approximately 1 nM. TBP is a saddle-shaped protein consisting of two similar domains (Figure 29.26). The TATA box of DNA binds to the concave surface of TBP. This binding induces large conformational changes in the bound DNA. The double helix is substantially unwound to widen its minor groove, enabling it to make extensive contact with the antiparallel β strands on the concave side of TBP. Hydrophobic interactions are prominent at this interface. Four phenylalanine residues, for example, are intercalated between base pairs of the TATA box. The flexibility of AT-rich sequences is generally exploited here in bending the DNA. Immediately outside the TATA box, classical B-DNA resumes. The TBP–TATA-box complex is distinctly asymmetric. The asymmetry is crucial for specifying a unique start site and ensuring that transcription proceeds unidirectionally.
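The binding constants quoted for TBP can be translated into free energies. The sketch below assumes a temperature of 25 °C (298.15 K) and a 1 M standard state, neither of which is specified in the text; the function names are illustrative. It uses ΔG° = RT ln Kd for the TBP–TATA-box complex and ΔΔG = RT ln(ratio) for the preference of five orders of magnitude (10^5) over nonconsensus sequences.

```python
import math

R_J_PER_MOL_K = 8.314  # gas constant in J/(mol K)


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding, RT ln(Kd), in kJ/mol (1 M standard state)."""
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0


def selectivity_free_energy_kj(affinity_ratio: float, temp_k: float = 298.15) -> float:
    """Free-energy difference, RT ln(ratio), for a given ratio of affinities."""
    return R_J_PER_MOL_K * temp_k * math.log(affinity_ratio) / 1000.0


# TBP-TATA-box complex, Kd of about 1 nM (from the text).
print(round(binding_free_energy_kj(1e-9), 1))

# Preference for the TATA box over nonconsensus DNA, about 10^5-fold.
print(round(selectivity_free_energy_kj(1e5), 1))
```

Under these assumptions, the complex is stabilized by roughly −51 kJ/mol, of which about 28.5 kJ/mol reflects sequence-specific recognition rather than general affinity for DNA.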

Figure 29.26: Complex formed by TATA-box-binding protein and DNA. The saddlelike structure of the protein sits atop a DNA fragment. Notice that the DNA is significantly unwound and bent.
[Drawn from 1CDW.pdb.]

TBP bound to the TATA box is the heart of the initiation complex (Figure 29.25). The surface of the TBP saddle provides docking sites for the binding of other components. Additional transcription factors assemble on this nucleus in a defined sequence. TFIIA is recruited, followed by TFIIB; then TFIIF, RNA polymerase II, TFIIE, and TFIIH join the other factors to form a complex called the basal transcription apparatus. During the formation of the basal transcription apparatus, the carboxyl-terminal domain (CTD) is unphosphorylated and plays a role in transcription regulation through its binding to an enhancer-associated complex called mediator (Section 32.2). The phosphorylated CTD stabilizes transcription elongation by RNA polymerase II and recruits RNA-processing enzymes that act in the course of elongation. Phosphorylation of the CTD by TFIIH marks the transition from initiation to elongation. The importance of the carboxyl-terminal domain is highlighted by the finding that yeast containing mutant polymerase II with fewer than 10 repeats in the CTD are not viable. Most of the factors are released before the polymerase leaves the promoter and can then participate in another round of initiation.
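The ordered assembly just described can be summarized as a simple sequence. The Python sketch below treats assembly as strictly linear, which is a simplifying assumption made for illustration; the list and function names are hypothetical, and the model ignores regulation, factor release, and alternative pathways.

```python
from typing import List, Optional

# Order of recruitment as described in the text, starting with TFIID
# (whose TBP subunit binds the TATA box).
ASSEMBLY_ORDER: List[str] = [
    "TFIID",
    "TFIIA",
    "TFIIB",
    "TFIIF",
    "RNA polymerase II",
    "TFIIE",
    "TFIIH",
]


def next_component(bound: List[str]) -> Optional[str]:
    """Return the next component expected to join, or None when assembly is complete.

    Raises ValueError if `bound` does not follow the assumed linear order.
    """
    if bound != ASSEMBLY_ORDER[: len(bound)]:
        raise ValueError("components are not in the assumed assembly order")
    if len(bound) == len(ASSEMBLY_ORDER):
        return None
    return ASSEMBLY_ORDER[len(bound)]


print(next_component([]))
print(next_component(["TFIID", "TFIIA", "TFIIB"]))
print(next_component(ASSEMBLY_ORDER))
```

In this simplified picture, TFIIH joins last, consistent with its role in phosphorylating the CTD at the transition from initiation to elongation.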

Multiple transcription factors interact with eukaryotic promoters

The basal transcription complex described in the preceding section initiates transcription at a low frequency. Additional transcription factors that bind to other sites are required to achieve a high rate of mRNA synthesis. Their role is to selectively stimulate specific genes. Upstream stimulatory sites in eukaryotic genes are diverse in sequence and variable in position. Their variety suggests that they are recognized by many different specific proteins. Indeed, many transcription factors have been isolated, and their binding sites have been identified by footprinting experiments. For example, heat-shock transcription factor (HSTF) is expressed in Drosophila after an abrupt increase in temperature. This 93-kDa DNA-binding protein binds to the following consensus sequence:

Several copies of this sequence, known as the heat-shock response element, are present starting at a site 15 bp upstream of the TATA box.

HSTF differs from σ³², a heat-shock protein of E. coli, in binding directly to response elements in heat-shock promoters rather than first becoming associated with RNA polymerase.

Enhancer sequences can stimulate transcription at start sites thousands of bases away

The activities of many promoters in higher eukaryotes are greatly increased by another type of cis-acting element called an enhancer. Enhancer sequences have no promoter activity of their own yet can exert their stimulatory actions over distances of several thousand base pairs. They can be upstream, downstream, or even in the midst of a transcribed gene. Moreover, enhancers are effective when present on either the coding or non-coding DNA strand.

A particular enhancer is effective only in certain cells. For example, the immunoglobulin enhancer functions in B lymphocytes but not elsewhere. Cancer can result if the relation between genes and enhancers is disrupted. In Burkitt lymphoma and B-cell leukemia, a chromosomal translocation brings the proto-oncogene myc (which itself encodes a transcription factor) under the control of a powerful immunoglobulin enhancer. The consequent dysregulation of the myc gene is believed to play a role in the progression of the cancer.

Transcription factors and other proteins that bind to regulatory sites on DNA can be regarded as passwords that cooperatively open multiple locks, giving RNA polymerase access to specific genes. The discovery of promoters and enhancers has allowed us to gain a better understanding of how genes are selectively expressed in eukaryotic cells. The regulation of gene transcription, discussed in Chapter 32, is the fundamental means of controlling gene expression.

Although bacteria lack TBP, archaea utilize a TBP molecule that is structurally quite similar to the eukaryotic protein. In fact, transcriptional control processes in archaea are, in general, much more similar to those in eukaryotes than are the processes in bacteria. Many components of the eukaryotic transcriptional machinery evolved from those in an ancestor of archaea.