20.1 TRANSCRIPTIONAL REGULATION

Cells need to maintain control of their growth and must be able to adapt quickly to a changing environment, but they also strive for energy efficiency. Transcription is therefore a common site of regulation: stopping gene expression at its first step avoids spending energy on unnecessary downstream steps. Turning transcription off, or down, can alter protein levels without involving the cell’s protein biosynthetic machinery at all. The control of transcription also permits the synchronized regulation of multiple genes encoding products with interdependent activities; for example, when their DNA is heavily damaged, bacterial cells require a coordinated increase in the levels of many DNA repair enzymes. Interactions between proteins and DNA are the key to transcriptional regulation, and much is now known about the specificity of transcriptional control.

As we discussed in Chapters 15 and 19, bacterial genes contain relatively simple promoters with sequences that allow RNA polymerase to bind and initiate transcription at specific sites. Variations in promoter sequences, or in the space between promoters and the gene(s) they control, affect transcriptional efficiency. In addition, bacteria have mechanisms for regulating groups of genes. For example, functionally related genes frequently cluster together in operons (see Figure 19-11), where they can be controlled by a single promoter. Sigma factors, which bind and regulate RNA polymerase, also contribute to the global control of transcription (see Table 15-2). In some cases, activator and repressor proteins confer further levels of regulation by altering transcription in response to metabolites. The activities of multiple activators and repressors can converge on a single promoter to fine-tune transcription in response to various stimuli.

In this section we discuss the regulation of two bacterial operons for which mechanistic details are particularly well established. The lactose (lac) and tryptophan (trp) operons both require multiple regulatory proteins, but the overall mechanisms of regulation are distinct. We then consider the SOS response in E. coli, illustrating how genes scattered throughout the genome can be coordinately regulated. Throughout this discussion we cover some of the experimental approaches that have provided insights into these regulatory pathways.

The lac Operon Is Subject to Negative Regulation

Many of the principles of bacterial gene expression were first discovered in studies of sugar metabolism in Escherichia coli. This bacterium can use a variety of sugars as an energy source, depending on what is available in its environment. Metabolism of each sugar type requires a unique set of enzymes. The genes encoding this set of enzymes are often grouped together into an operon, which allows the genes to be coordinately regulated. In the 1960s, two French scientists, François Jacob and Jacques Monod, examined the E. coli genes involved in metabolizing the sugar lactose. Through genetic experiments, they determined how expression of these genes is coordinately regulated in response to the presence or absence of lactose. This work, for which they won the 1965 Nobel Prize in Physiology or Medicine, uncovered one of the central themes in molecular biology: some genes encode proteins with the sole function of regulating the expression of other genes.

The lac operon (Figure 20-1) includes the genes for β-galactosidase (lacZ), galactoside permease (lacY), and thiogalactoside transacetylase (lacA)—sometimes referred to collectively as the lac genes. Although the operon is transcribed as a single unit (i.e., the mRNA is polycistronic), the transcript contains three ribosome-binding sites, one preceding each open reading frame, that allow independent translation of each protein product. Each resulting protein functions in the metabolism of lactose. β-Galactosidase catalyzes cleavage of lactose into its components, glucose and galactose (Figure 20-2), which can then be metabolized further to generate ATP. The galactoside permease protein inserts into the bacterial plasma membrane and imports lactose into the cell. Thiogalactoside transacetylase modifies toxic galactosides that are imported along with lactose, facilitating their removal from the cell. When lactose is available, wild-type E. coli expresses these three genes. When lactose is unavailable, transcription from the lac operon is greatly reduced (Figure 20-3a).

Figure 20-1: The lactose (lac) operon of E. coli. The three genes of the lac operon are transcribed as a single unit from a single promoter. The operator region regulates transcription through interaction with the Lac repressor protein, encoded by lacI. The lacI gene is transcribed separately from the operon (i.e., from its own promoter), and the repressor is constitutively expressed.
Figure 20-2: Lactose metabolism in E. coli. Galactoside permease, encoded by lacY, is a membrane protein that permits entry of lactose into the cell. β-Galactosidase, encoded by lacZ, converts lactose to galactose and glucose, and also converts a small amount of lactose to allolactose, the lac operon inducer.
Figure 20-3: Jacob and Monod’s merodiploid analysis of the lac operon. (a) A simplified view of the wild-type lac operon. (b) Jacob and Monod isolated two mutant strains of otherwise normal (haploid) bacterial cells with mutations resulting in constitutive expression from the lac operon. These strains had mutations in either lacI (the repressor gene) or lacO (the operator region) that rendered the gene or region nonfunctional. (c) Results from Jacob and Monod’s analysis of merodiploid (partial diploid) strains carrying both a mutant and a wild-type lac operon suggested that the product of the lacI gene acts in trans (i.e., is diffusible; top) and the lacO region functions in cis (i.e., does not produce a diffusible product; middle). A double mutant analysis confirmed this hypothesis (bottom).

Jacob and Monod isolated mutants of E. coli with defective regulation of the lac operon. They identified lacI and lacO, two DNA regions where mutations led to constitutive expression of the operon whether or not lactose was present (Figure 20-3b). To understand how lacI and lacO worked, the researchers performed merodiploid analysis, a procedure that essentially makes the bacterial cell diploid for the lac operon locus (see the How We Know section at the end of Chapter 5).

The first partial diploids they created combined wild-type strains with either the lacI or lacO mutants (Figure 20-3c, top and middle). The wild-type lacI allele was able to make up for (or “rescue”) the defect in the lacI mutant; these partial diploids had normal regulation of both sets of lactose metabolism genes. Jacob and Monod hypothesized that the lacI locus produced a diffusible product that could act on any DNA molecule, not just the DNA from which it was generated. In this case, the lacI gene product from the wild-type strain successfully regulated the lac operon DNA of the lacI mutant. However, a wild-type copy of the lacO regulatory region was not capable of rescuing the defect in a lacO mutant; these partial diploids still constitutively expressed the lac genes. Jacob and Monod hypothesized that lacO did not produce a diffusible substance: the lac operon DNA from the lacO mutant could not be correctly regulated, even in the presence of a wild-type copy of lacO.

Jacob had served in the military, and he likened the observations on the lac operon to the communication link between a bomber aircraft and a ground-based radio transmitter. If the transmitter on the ground were knocked out, a second transmitter could be used to direct the actions of the bomber. But if the receiver in the bomber were knocked out, neither a second transmitter nor a new bomber could direct the actions of the first bomber. Jacob and Monod hypothesized that lacI functioned like the transmitter: if knocked out, it could be replaced by a second transmitter. In contrast, lacO functioned like the receiver in the bomber: if a message could not be received, the action (transcription of the lac genes) could not be controlled.

With this hypothesis, the researchers tested a prediction (see Figure 20-3c, bottom). In a further merodiploid analysis, they combined an operon carrying a lacZ mutation with an operon carrying mutations in lacO and lacY. The mutations in genes encoding different enzymes (lacZ or lacY) effectively “marked” the operons and allowed the researchers to determine which operon was giving rise to a particular gene product. They predicted that the lacI gene product (no matter which allele produced it) would not be capable of repressing the lacZ gene in an operon containing a lacO mutation. In the absence of lactose, this diploid construction produced β-galactosidase (the lacZ gene product)—exactly the result they predicted! These findings confirmed that the lacI gene encodes a diffusible molecule (i.e., acts in trans) that represses lac gene expression (whether on the same or, experimentally, on a different DNA), whereas lacO controls only the expression of lac operon genes to which it is connected (i.e., acts in cis).
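The cis/trans reasoning behind Figure 20-3 can be captured in a small Boolean model. In this sketch, which is an illustrative simplification rather than Jacob and Monod’s own formalism, any functional lacI allele supplies repressor to every operon copy (trans), while each operon responds only to its own operator (cis). Leaky basal expression is ignored, and the example assumes both copies carry wild-type lacI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LacOperon:
    """One copy of the lac operon in a merodiploid; True means a wild-type allele."""
    operator_ok: bool = True  # lacO+ (repressor can bind); False models a constitutive operator mutant
    lacZ_ok: bool = True      # makes functional beta-galactosidase
    lacY_ok: bool = True      # makes functional galactoside permease

def expressed_products(operons, lacI_alleles, inducer_present=False):
    """Return the lac products made under a simple Boolean cis/trans model.

    lacI acts in trans: any functional lacI allele yields a diffusible repressor
    that can reach every operon copy in the cell.
    lacO acts in cis: it controls only the operon it is physically linked to.
    Basal (leaky) expression is ignored.
    """
    repressor_active = any(lacI_alleles) and not inducer_present
    products = set()
    for op in operons:
        # Repression requires both an active repressor and a functional linked operator.
        if repressor_active and op.operator_ok:
            continue
        if op.lacZ_ok:
            products.add("beta-galactosidase")
        if op.lacY_ok:
            products.add("permease")
    return products

# Figure 20-3c, bottom: a lacZ- operon plus a lacO/lacY double-mutant operon, no lactose.
cell = [LacOperon(lacZ_ok=False), LacOperon(operator_ok=False, lacY_ok=False)]
print(expressed_products(cell, lacI_alleles=[True, True]))
```

Only the operon with the defective operator escapes repression, so only its intact lacZ is expressed; the permease, which must come from the operator-intact copy, stays repressed until an inducer is added.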

KEY CONVENTION

In genetics, genes or gene products that operate “in cis” are those that must be physically linked to have an effect. Genes or gene products that operate “in trans” can function even when not physically associated with one another (i.e., a diffusible product is involved). Note that these definitions are distinct from the conventions governing cis and trans terminology in chemistry, where these terms refer to the orientation of covalently attached functional groups with respect to each other (see Chapter 4).

From the experiments performed by Jacob and Monod, we know how the lac operon functions. Operon control consists of two main elements: a protein repressor (the Lac repressor, encoded by the lacI gene) and a DNA sequence called the operator (lacO) to which the Lac repressor binds. In the absence of lactose, the lac genes are not transcribed, because the Lac repressor binds to the operator sequence. The lacI gene is located near the lac operon, but it is transcribed from its own promoter, independent of the lac genes. The operator is adjacent to the lac operon promoter, and repressor binding to the operator prevents RNA polymerase from initiating transcription of the DNA (Figure 20-4). When lactose is present, a small amount of allolactose, an isomer of lactose, is produced (see Figure 20-2). Allolactose is a small effector molecule that functions as an inducer of the lac operon, binding to the Lac repressor and causing the repressor to lose affinity for and dissociate from the operator. On dissociation of the repressor, the operon becomes active, because RNA polymerase is able to initiate transcription and synthesize the polycistronic mRNA encoding the lac genes. It is important to note that Jacob and Monod’s initial experiments examined E. coli grown solely in the presence of lactose, with no other sugars available for metabolism. As we will see shortly, bacteria preferentially metabolize some sugars over others and impose additional levels of regulation on the lac operon to shut down its transcription if a more highly preferred sugar source (such as glucose) is also available.

Figure 20-4: Negative regulation of the lac operon by the Lac repressor. In the absence of lactose (top), the Lac repressor binds the operator region and thus prevents RNA polymerase from leaving the promoter site and transcribing the operon. In the presence of lactose (bottom), its metabolite allolactose binds the Lac repressor, resulting in a conformational change that causes the repressor to dissociate from the operator. RNA polymerase can then initiate transcription. (Note that allolactose is enlarged for clarity; it is actually much smaller relative to the Lac repressor.)

Although it may seem simple today, the regulatory circuitry of the lac operon, the first to be discovered, was revealed only through powerful insight, prediction, intuitive reasoning, and creative thinking. Prior studies had focused on the fact that DNA encoded enzymes. But research on lactose metabolism revealed that some genes encode other kinds of proteins, such as DNA-binding proteins (e.g., the Lac repressor), and that some DNA sequences do not code for a gene product at all, but instead form genetic loci that affect cell function (e.g., lacO, the operator region).

The lac operon has been the subject of intense scrutiny by many laboratories since Jacob and Monod made their first observations. As research has progressed, new details of the regulatory machinery have come to light. We now know, for instance, that the operator region is more complex than suggested in Figure 20-1; in fact, there are three operator sequences.

The O1 operator, to which the Lac repressor binds most tightly, abuts the lac operon’s transcription start site (Figure 20-5a), but the operon also has two secondary binding sites for the repressor. One (O2) is about 400 bp downstream, within the gene encoding β-galactosidase (lacZ); the other (O3) is about 100 bp upstream, at the end of the lacI gene. As described in Chapter 19, most repressor proteins function as dimers, with each subunit binding to one half of an inverted repeat. The Lac repressor is unusual in that it functions as a tetramer of identical subunits, with two dimers tethered together at the end distant from the DNA-binding sites (Figure 20-5b). DNA recognition occurs in the major groove by means of a helix-turn-helix motif near the N-terminus of the repressor protein. An adjacent helix known as the hinge is important for positioning the helix-turn-helix and ensuring high-affinity DNA binding. As shown by crystal structures of the repressor, both alone and bound to inducers similar to allolactose, inducer binding causes disordering of the hinge and a resulting increase in flexibility of the DNA-binding region. DNA-binding affinity of inducer-bound Lac repressor is decreased, in a classic example of allosteric regulation (see Chapter 5).

Figure 20-5: Interaction of the Lac repressor and the operator region. (a) The lac operon contains three operator sequences to which Lac repressor can bind. O1 is adjacent to the lac operon promoter. The inverted repeat of the O1 site is shown (sequence repeats are shaded orange). (b) The Lac repressor tetramer can bind to O1 and O2 or to O1 and O3. The intervening DNA is looped out. The molecular model shows the Lac repressor, a tetramer formed from two homodimers. Each homodimer can bind one operator sequence.

When lactose levels are low, an E. coli cell contains about 20 tetramers of the Lac repressor. Each tethered dimer of the repressor tetramer separately binds to one of the three inverted-repeat operator sequences (see Figure 20-5b). To repress the operon, one dimer binds to the O1 operator and the other dimer binds simultaneously to one of the two secondary sites, O2 or O3. The symmetry of the O1 sequence corresponds to the twofold axis of symmetry of two paired Lac repressor subunits. The tetrameric Lac repressor binds to its operator sequences in vivo with very high affinity, with an estimated dissociation constant of about 10⁻¹⁰ M. The repressor discriminates between operator and nonoperator sequences by a factor of about 10⁶, so binding to just these few base pairs among the 4.6 × 10⁶ bp of the E. coli chromosome is highly specific.
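The dissociation constant and copy number quoted above allow a rough occupancy estimate. The sketch below is a simplified single-site calculation that assumes a cell volume of about 1 fL (an assumption, not a value from the text), treats all repressor as free, and ignores nonspecific DNA binding and looping; it is an upper-bound illustration, not a model of the real cell.

```python
AVOGADRO = 6.022e23        # molecules per mole
CELL_VOLUME_L = 1e-15      # ASSUMPTION: ~1 femtoliter E. coli cell volume
KD_OPERATOR_M = 1e-10      # dissociation constant quoted in the text (M)
REPRESSOR_TETRAMERS = 20   # approximate copies per cell quoted in the text

def molar_concentration(n_molecules, volume_l=CELL_VOLUME_L):
    """Convert a molecule count in a given volume to molar concentration."""
    return n_molecules / (AVOGADRO * volume_l)

def fraction_bound(free_conc_m, kd_m):
    """Single-site isotherm: fraction of sites occupied = [R] / ([R] + Kd)."""
    return free_conc_m / (free_conc_m + kd_m)

conc = molar_concentration(REPRESSOR_TETRAMERS)
occupancy = fraction_bound(conc, KD_OPERATOR_M)
print(f"[repressor] ~ {conc * 1e9:.0f} nM; operator occupancy ~ {occupancy:.3f}")
```

Under these assumptions the repressor concentration comes out near 33 nM, several hundred times the 10⁻¹⁰ M Kd, so the isotherm predicts nearly complete operator occupancy. In a real cell, the large excess of nonoperator DNA competes for repressor, which is why the 10⁶-fold discrimination factor matters.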

The simultaneous binding of the Lac repressor tetramer to O1 and to O2 or O3 most likely results in a looped DNA structure, providing an effective steric block to transcription initiation by RNA polymerase. Because each dimer of the repressor binds to a separate region of DNA, and DNA looping enables two regulatory sites to be bound at the same time, the sensitivity of the system is enhanced by the cooperative nature of the binding. In other words, the affinity of one dimer for DNA is affected by the conformation (DNA-bound or not) of the other dimer. The process of working out how the Lac repressor functions required the development of techniques for detecting when and how proteins bind to specific sites in DNA (Highlight 20-1). These methods are still widely used to analyze the properties of DNA-binding proteins in a variety of systems.

Despite formation of an elaborate complex, transcriptional repression of the lac operon by the Lac repressor is not absolute. Binding of the repressor reduces the rate of transcription initiation by a factor of 10³. If the O2 and O3 sites are eliminated by deletion or mutation, the binding of repressor to O1 alone reduces transcription by a factor of only about 10². Even in the repressed state, each cell has a few molecules of β-galactosidase and galactoside permease, presumably synthesized on the rare occasions when the repressor transiently dissociates from the operators. This basal level of transcription is essential to operon regulation, because these few enzyme molecules are needed to bring lactose into the cell and convert it to the inducer.

When cells are provided with lactose, the lac operon is induced. The few existing molecules of galactoside permease enable lactose from the medium to enter the cell, where it is converted by β-galactosidase to allolactose, a lactose isomer (see Figure 20-2). Allolactose binds to a specific site on the Lac repressor, causing a conformational change that results in dissociation of the repressor from the operator. Release of the operator, triggered as the repressor binds to the inducer allolactose, allows expression of the lac genes and a 10³-fold increase in the concentration of β-galactosidase.

Several β-galactosides structurally related to allolactose are inducers of the lac operon but are not substrates for β-galactosidase; others are substrates but not inducers. One very effective and nonmetabolizable inducer of the lac operon often used experimentally is isopropyl β-D-1-thiogalactopyranoside (IPTG) (Figure 20-6). An inducer that cannot be metabolized lets researchers study the regulation of the lac operon without concern about the inducer being depleted. Equally useful as a tool in molecular biology is the noninducer substrate X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside), which consists of galactose linked to a substituted indole. β-Galactosidase cleaves X-gal to produce galactose and 5-bromo-4-chloro-3-hydroxyindole, which is oxidized to an insoluble blue compound, 5,5′-dibromo-4,4′-dichloro-indigo. Bacterial colonies grown on an agar medium containing X-gal and an inducer of β-galactosidase (usually IPTG) turn blue if they contain a functional lacZ gene, a useful marker in molecular cloning (see the How We Know section at the end of Chapter 5).

Figure 20-6: Chemical structures of some small-molecule effectors of the lac operon. Like allolactose, IPTG (isopropyl β-D-1-thiogalactopyranoside) can bind the Lac repressor and cause its dissociation from the operator, inducing transcription of the lac operon. However, IPTG is not a substrate for β-galactosidase. The β-galactoside X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) does not induce expression of the lac operon, but it does serve as an experimentally useful substrate for β-galactosidase, producing a blue color when metabolized.

The lac Operon Also Undergoes Positive Regulation

The operator-repressor-inducer interactions affecting the lac operon provide an intuitively satisfying model for an on/off switch in the regulation of gene expression. However, operon regulation is rarely that simple. A bacterium’s environment is too complex for its genes to be controlled by one signal. Other factors besides lactose affect the expression of the lac genes, such as the availability of glucose. Glucose, metabolized directly by glycolysis, is E. coli’s preferred energy source. Other sugars can serve as the main or sole nutrient, but extra steps are required to prepare them for entry into glycolysis, necessitating the synthesis of additional enzymes. Clearly, expressing the genes for proteins that metabolize other sugars, such as lactose or arabinose, is wasteful when glucose is abundant. Only in the absence of glucose is it in the cell’s best interest to increase the expression of genes that allow the use of alternative energy sources.

What happens to the expression of the lac operon when both glucose and lactose are available? A regulatory mechanism known as catabolite repression restricts expression of the genes required for metabolizing lactose, arabinose, or other sugars in the presence of glucose, even when these secondary sugars are also present. At first glance this may seem like another example of negative regulation, but it is a form of positive regulation for the lac operon. As we will see, the lac operon is activated for gene expression when glucose is absent.

The effect of glucose on expression of the lac operon is mediated by cyclic AMP (cAMP), a small-molecule effector, and by the activator cAMP receptor protein, or CRP, a homodimer with binding sites for DNA and cAMP (Figure 20-7). DNA binding is mediated by a helix-turn-helix motif within the protein’s DNA-binding domain. When glucose is absent, CRP-cAMP binds to a site near the Lac promoter and stimulates transcription 50-fold. CRP-cAMP is therefore a positive regulatory factor responsive to glucose levels, whereas the Lac repressor is a negative regulatory factor responsive to lactose. The two act in concert.

Figure 20-7: Structure of the cAMP receptor protein (CRP) homodimer bound to DNA. In the absence of glucose, the CRP-cAMP complex binds near the Lac promoter and associates with RNA polymerase to induce transcription.

HIGHLIGHT 20-1 TECHNOLOGY: Classical Techniques in the Analysis of Gene Regulation

Many proteins affect gene expression, and some of them do so in part by interacting directly with DNA. Researchers often want to find out whether regulatory proteins bind DNA and, if so, what sequence or structure they recognize and how tight the interaction is. The electrophoretic mobility shift assay (EMSA) and DNA footprinting experiments (in various forms) are widely used to address these questions.

Both of these techniques test the ability of a protein to interact directly with DNA. In EMSA, fragments of DNA are incubated with the protein of interest and then analyzed on a nondenaturing polyacrylamide or agarose gel. The DNA used in the experiment is visualized either by staining with a dye or by covalently attaching a radioactive phosphate group at one end. Free DNA fragments migrate more quickly through the gel than DNA bound by protein. Thus, a shift in migration of DNA from fast to slow in the presence of protein indicates a direct binding interaction between the protein and DNA (Figure 1).
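Quantified EMSA lanes can be turned into an affinity estimate. The sketch below assumes the protein is in large excess over DNA (so free protein concentration equals total protein), that binding is a single step, and that the fraction of DNA in the shifted band has already been measured for each lane. The titration values are synthetic numbers generated from an assumed Kd of 10 nM, not data from Figure 1, and a plain grid search stands in for a proper nonlinear fit.

```python
import math

def fraction_bound(protein_m, kd_m):
    """Fraction of DNA shifted, assuming [free protein] ~ [total protein]."""
    return protein_m / (protein_m + kd_m)

def fit_kd(protein_concs, fractions_shifted):
    """Estimate Kd by a log-spaced grid search minimizing squared residuals."""
    best_kd, best_err = None, math.inf
    for exponent_tenths in range(-130, -29):  # Kd from 1e-13 M to 1e-3 M in 0.1-log steps
        kd = 10 ** (exponent_tenths / 10)
        err = sum((fraction_bound(p, kd) - f) ** 2
                  for p, f in zip(protein_concs, fractions_shifted))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd

# Hypothetical six-lane titration: synthetic data generated with Kd = 1e-8 M,
# standing in for quantified band intensities (NOT measured values).
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7]
shifted = [fraction_bound(p, 1e-8) for p in concs]
print(f"estimated Kd ~ {fit_kd(concs, shifted):.1e} M")
```

The fraction shifted for a lane would be computed as shifted-band intensity divided by the sum of shifted and free bands; the Kd is the protein concentration at which half the DNA is shifted.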

FIGURE 1 Results of an electrophoretic mobility shift assay demonstrate the shift in migration of Lac operator DNA (lane 1) with increasing concentrations of Lac repressor protein (lanes 2 through 7). The nondenaturing polyacrylamide gel preserves the Lac repressor in its folded state so that it can interact with the operator. The gel was stained with fluorophores that bind DNA (green) and protein (red). Yellow bands indicate protein-DNA complexes. (MW indicates the lane with molecular weight markers.)

Once a direct DNA-binding interaction has been established, DNA footprinting can be used to map the exact nucleotide bases in contact with the bound protein (see Figure 15-12). The DNA, often radiolabeled at one end so that it can be readily visualized, is allowed to form complexes with the protein and is then incubated with a nuclease. Cleavage occurs at sites exposed to solvent, but not at sites that are physically protected by the presence of bound protein. Conditions are carefully controlled so that each DNA molecule is cleaved, on average, only once, generating a set of fragments that represent all possible cleavage products, with the exception of DNA protected by the bound protein. The resulting fragments are separated by denaturing gel electrophoresis and detected by exposing the gel to film or to a phosphorimager screen (which detects radioactive emissions from the ³²P-labeled bands of DNA fragments). The gap in the cleavage sites where the protein associates with the DNA produces a “footprint” that indicates the boundaries of the protein-binding site. The footprint can be identified by comparing the sites of nuclease cleavage in the DNA before and after adding the protein (Figure 2a; the result for a specific example is shown in Figure 2b).
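Reading a footprint amounts to comparing band intensities position by position between the lanes without and with protein. The function below is a hypothetical helper, not a published tool; the 0.3 protection threshold and the 3-nucleotide minimum length are arbitrary assumptions chosen for illustration, and the intensities in the example are made up.

```python
def find_footprints(no_protein, with_protein, threshold=0.3, min_length=3):
    """Return (start, end) position ranges where cleavage is protected by bound protein.

    no_protein, with_protein: band intensities indexed by nucleotide position
    (e.g., quantified from the - and + lanes). A position counts as protected
    when its intensity with protein falls below `threshold` times the control.
    """
    protected = [ctrl > 0 and bound < threshold * ctrl
                 for ctrl, bound in zip(no_protein, with_protein)]
    footprints, start = [], None
    for i, is_protected in enumerate(protected + [False]):  # sentinel closes a final run
        if is_protected and start is None:
            start = i
        elif not is_protected and start is not None:
            if i - start >= min_length:  # ignore isolated weak bands
                footprints.append((start, i - 1))
            start = None
    return footprints

# Hypothetical lanes: positions 3-6 lose cleavage when protein is added.
minus_lane = [10] * 12
plus_lane = [10, 9, 11, 1, 0, 2, 1, 10, 9, 10, 1, 10]
print(find_footprints(minus_lane, plus_lane))
```

The single weak band at position 10 is discarded by the minimum-length rule, a crude stand-in for the judgment a researcher applies when distinguishing a true footprint from lane noise.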

FIGURE 2 (a) DNA footprinting analysis reveals a protein’s binding site on a DNA fragment. (b) In this example, the binding site of RNA polymerase at the Lac promoter is determined using DNase to digest the lac DNA everywhere except where bound polymerase protects it. The lanes show no polymerase added (−), polymerase added (+), and the control reaction with no DNase added (C). The upstream sites are indicated to the right of the gel.

A related approach, called chemical protection footprinting, uses chemical reagents such as dimethyl sulfate to covalently modify DNA at nucleotide bases not protected by bound protein. This results in DNA containing methyl groups at sites outside the protein-binding site; the position of the protein creates a “footprint” marked by an absence of methylation sites. To locate the footprint, the methylated bases are first used to trigger chemical fragmentation of the DNA, and the ends of the resulting fragments are then mapped by primer extension: a DNA polymerase copies the template DNA by extending an annealed primer oligonucleotide that binds to a region just outside the DNA segment to be analyzed. The products of the extension reaction terminate at sites of methylated bases. These products are identified by analysis on a denaturing polyacrylamide gel, alongside the products of control primer-extension reactions conducted in parallel on unmodified DNA in the presence of dNTPs plus a small amount of A, C, G, or T dideoxynucleotide, which are chain-terminating nucleotide analogs. Because each dideoxynucleotide is incorporated opposite its complementary base, the products in these control reactions correspond to the positions of T, G, C, or A, respectively, in the template sequence, enabling exact determination of the sites of the protein footprint.
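The control ladder works because each dideoxynucleotide is incorporated opposite its complementary template base, so the ddA lane stops opposite T, the ddC lane opposite G, and so on. This minimal sketch predicts the ladder for a template and reads the template back from it. The representation (template bases listed in the order the polymerase copies them, with stop lengths counted from the primer end) is a simplifying assumption; in real gels, offsets between the modified lane and the ladder must be calibrated.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def dideoxy_ladder(template_after_primer):
    """Extension lengths at which each ddNTP terminates synthesis.

    template_after_primer: template bases in the order the polymerase copies them
    (starting just past the annealed primer). A ddNTP is incorporated opposite
    its complement, so ddA stops opposite T, ddC opposite G, and so on.
    """
    ladder = {dd: [] for dd in "ACGT"}
    for length, template_base in enumerate(template_after_primer, start=1):
        ladder[COMPLEMENT[template_base]].append(length)
    return ladder

def read_template(ladder):
    """Reconstruct the template from the lane in which each product length appears."""
    by_length = {n: COMPLEMENT[dd] for dd, lengths in ladder.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

def annotate_stops(stop_lengths, ladder):
    """Hypothetical helper: label stops from a modified lane with the template base
    at the same length (assumes no offset between modified and ladder lanes)."""
    by_length = {n: COMPLEMENT[dd] for dd, lengths in ladder.items() for n in lengths}
    return [(n, by_length[n]) for n in sorted(stop_lengths)]

ladder = dideoxy_ladder("TACGGT")
print(ladder)
print(read_template(ladder), annotate_stops([3, 5], ladder))
```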

Chemical modification interference, a related approach, involves first reacting the DNA with a limiting amount of a chemical reagent to introduce nucleotide modifications randomly, at just one or a few sites per molecule. The resulting pool of modified DNA molecules, each containing a modification at a different site or sites, is then allowed to bind to protein, and free DNA is separated from DNA-protein complexes by nondenaturing gel electrophoresis, as in EMSA. Free and protein-bound DNA can then be excised from the gel, eluted from the gel matrix, and analyzed by the primer-extension method. The DNA in the protein-bound sample contains chemical modifications only at sites that do not interfere with protein binding. DNA molecules in the unbound sample, which migrated differently in the gel than the protein-bound DNA, contain chemical modifications primarily at sites that interfere with protein recognition.
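The comparison in chemical modification interference can be expressed as a per-position enrichment: modifications that block binding are overrepresented in the unbound fraction. This sketch assumes each lane’s signal has already been quantified per position and normalizes each lane to its total so the two fractions are comparable; the 3-fold enrichment cutoff is an arbitrary, hypothetical choice, and the example counts are made up.

```python
def interference_sites(bound_counts, unbound_counts, min_enrichment=3.0):
    """Flag positions whose modification interferes with protein binding.

    bound_counts / unbound_counts: per-position signal (e.g., primer-extension
    stop intensities) from the protein-bound and free DNA fractions.
    A position is flagged when its normalized signal is at least
    `min_enrichment`-fold higher in the unbound fraction than in the bound one.
    """
    total_bound, total_unbound = sum(bound_counts), sum(unbound_counts)
    flagged = []
    for pos, (b, u) in enumerate(zip(bound_counts, unbound_counts)):
        frac_bound = b / total_bound
        frac_unbound = u / total_unbound
        # A tiny floor avoids division-like blowups when a site is absent from the bound lane.
        if frac_unbound >= min_enrichment * max(frac_bound, 1e-12):
            flagged.append(pos)
    return flagged

# Hypothetical lanes: modifications at positions 2 and 3 keep DNA out of the complex.
bound = [10, 10, 1, 0, 10, 10]
unbound = [10, 10, 20, 20, 10, 10]
print(interference_sites(bound, unbound))
```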

CRP-cAMP has little effect on the lac operon when the Lac repressor is blocking transcription (Figure 20-8a, b), and dissociation of the repressor from the operator has little effect unless CRP-cAMP is present to facilitate transcription (Figure 20-8c). When CRP-cAMP is not bound, the wild-type Lac promoter is a relatively weak promoter. The open RNA polymerase–promoter complex does not form readily unless CRP-cAMP is present. CRP interacts directly with RNA polymerase (at the region shown in Figure 20-7) through the polymerase’s α subunit. Binding of CRP to the α subunit stimulates polymerase binding to the Lac promoter, triggering formation of the open polymerase-promoter complex.

Figure 20-8: Positive regulation of the lac operon by CRP. (a), (b) When lactose is absent, the repressor binds the operator, blocking RNA polymerase and preventing transcription of the lac genes. It does not matter whether glucose is present or absent (and thus whether or not CRP-cAMP binds the operon). (c) When lactose is available, the repressor dissociates from the operator. However, if glucose is also available, low cAMP levels prevent CRP-cAMP formation and DNA binding. RNA polymerase may weakly bind the promoter and occasionally initiate transcription, leading to a very low level of lac operon expression. (d) Only when glucose levels are low, causing cAMP levels to rise and CRP-cAMP to bind the operon, and when lactose is present, causing repressor to dissociate, does the polymerase robustly bind and transcription proceed. (Note that cAMP is enlarged for clarity; it is actually much smaller relative to CRP; see Figure 20-7).

The effect of glucose on CRP is mediated by the cAMP interaction. CRP binds to DNA most avidly when cAMP concentrations are high. When glucose is transported into the cell, the synthesis of cAMP is inhibited and efflux of cAMP from the cell is stimulated. As the cAMP concentration declines, CRP binding to DNA declines, thereby decreasing expression of the lac operon. Strong induction of the lac operon therefore requires both lactose (to inactivate the Lac repressor) and a lowered concentration of glucose (to trigger an increase in cAMP concentration and increased binding of cAMP to CRP) (Figure 20-8d).
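The two regulators together behave like a logic gate. The function below is a qualitative sketch of the four cases in Figure 20-8; the labels "repressed", "low", and "high" are descriptive categories chosen for illustration, not measured levels, and basal leakiness is folded into "repressed".

```python
def lac_expression(lactose: bool, glucose: bool) -> str:
    """Qualitative lac operon output under the two-regulator model of Figure 20-8.

    Lac repressor (negative): on the operator unless lactose (via allolactose) is present.
    CRP-cAMP (positive): bound near the promoter only when glucose is low and cAMP is high.
    """
    repressor_bound = not lactose
    crp_camp_bound = not glucose
    if repressor_bound:
        return "repressed"  # panels (a), (b): CRP-cAMP makes little difference; basal only
    if not crp_camp_bound:
        return "low"        # panel (c): weak promoter, only occasional initiation
    return "high"           # panel (d): repressor released and CRP-cAMP bound

for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {lac_expression(lactose, glucose)}")
```

Strong expression appears in only one of the four combinations, which is the AND-like behavior described in the text: lactose present and glucose absent.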

CRP and cAMP are involved in the coordinated regulation of many operons, primarily those that encode enzymes for the metabolism of secondary sugars such as lactose and arabinose. Recall from Chapter 19 that a network of operons with a common regulator is known as a regulon. Other bacterial regulons include the heat shock gene system that responds to changes in temperature (see Chapter 15) and the genes induced in E. coli as part of the SOS response to DNA damage (described later in this section).

CRP Functions with Activators or Repressors to Control Gene Transcription

Other secondary sugars also trigger expression of their metabolic enzymes when present in the environment, and again, CRP provides a mechanism for activation only in the absence of the preferred sugar, glucose. For example, arabinose metabolism is regulated by CRP and the protein AraC, which acts as either an activator or a repressor of the arabinose (ara) operon, depending on whether arabinose is present. When arabinose is absent, AraC forms a dimeric structure in which one AraC monomer binds to the araI1 site of the ara operon and the other binds a separate site much farther upstream called araO2 (Figure 20-9a). Similar to the effect of the Lac repressor, this mode of DNA binding causes the DNA to loop into a configuration that inhibits polymerase binding. When arabinose is present, it binds to AraC, causing AraC to adopt a different dimeric conformation that allows binding to two adjacent DNA half-sites, araI1 and araI2 (Figure 20-9b). This positions one monomer of AraC close to the promoter, where it can recruit RNA polymerase to activate transcription.
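The AraC switch can be summarized in the same qualitative way. This sketch encodes only the site choices and the CRP-cAMP requirement described in the text and in Figure 20-9; treating the arabinose-plus-glucose case as "low" is a simplification, and the state labels are illustrative.

```python
def ara_state(arabinose: bool, glucose: bool):
    """Qualitative sketch of the AraC switch (Figure 20-9).

    Returns (sites bound by the AraC dimer, whether CRP-cAMP is bound, transcription level).
    """
    crp_camp_bound = not glucose  # cAMP is high only when glucose is low
    if not arabinose:
        # One monomer at araO2, the other at araI1: the DNA loop blocks polymerase.
        return ("araO2", "araI1"), crp_camp_bound, "repressed"
    # Arabinose-bound conformation: the dimer occupies adjacent half-sites near the promoter.
    sites = ("araI1", "araI2")
    level = "activated" if crp_camp_bound else "low"
    return sites, crp_camp_bound, level

for arabinose in (False, True):
    for glucose in (False, True):
        print(arabinose, glucose, ara_state(arabinose, glucose))
```

The same protein gives opposite outputs depending on its conformation, which is why AraC can act as either repressor or activator.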

Figure 20-9: Regulation of the ara operon. (a) When arabinose is absent, AraC forms a dimer in which one monomer binds to araO2 and the other to araI1, preventing RNA polymerase binding and transcription of the operon. (b) Activation of the ara operon occurs when AraC binds arabinose (its small-molecule effector) and CRP-cAMP (formed in the absence of glucose). The AraC dimer changes conformation such that one monomer binds araI1 and the other binds araI2. The interaction with araI2 recruits RNA polymerase to the promoter and activates transcription of the ara operon. The molecular models show the AraC dimerization domain in the (a) absence and (b) presence of arabinose.

Cynthia Wolberger

In determining the crystal structures of the AraC arabinose-binding and dimerization domains in the presence and absence of L-arabinose, Cynthia Wolberger found that arabinose binding changes the structure of the AraC dimerization domain.

As long as glucose is absent, CRP-cAMP occupies a site on the DNA preceding the AraC-binding site and helps activate transcription of the ara operon.

In the case of the galactose (gal) operon, the Gal repressor inhibits transcription of the operon in the absence of galactose, and CRP-cAMP serves as the activator in the absence of glucose. The Gal repressor works differently from the Lac repressor in that it does not prevent RNA polymerase from binding to the Gal promoter. Instead, the Gal repressor probably prevents transition of the polymerase-promoter complex from the closed to the open form, thereby blocking formation of the elongation-competent form of RNA polymerase (Figure 20-10).
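
Because the Gal repressor acts after polymerase binding, it helps to track the initiation steps separately. In the hypothetical sketch below (the function `gal_operon` and its output keys are illustrative), galactose and glucose are on/off inputs, the repressor is treated as active whenever galactose is absent, and elevated transcription requires both an open complex and CRP-cAMP.

```python
def gal_operon(galactose: bool, glucose: bool) -> dict:
    """Qualitative sketch of gal operon initiation (illustrative model)."""
    repressor_active = not galactose  # Gal repressor acts when galactose is absent
    crp_camp = not glucose            # activator forms only when glucose is absent
    return {
        # The Gal repressor does not block promoter binding, so a closed
        # polymerase-promoter complex can always form in this model.
        "closed_complex": True,
        # The repressor blocks the closed-to-open transition.
        "open_complex": not repressor_active,
        # Elevated transcription needs the open complex plus CRP-cAMP.
        "high_transcription": (not repressor_active) and crp_camp,
    }


print(gal_operon(galactose=False, glucose=False))
print(gal_operon(galactose=True, glucose=False))
```

The point the model makes explicit is that, unlike the Lac repressor, the Gal repressor leaves the polymerase on the promoter; it only prevents the step to the open complex.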

Figure 20-10: Regulation of the gal operon. (a) Structure of the gal operon. (b) The Gal repressor does not prevent RNA polymerase from binding the promoter; rather, it prevents formation of the open promoter-polymerase complex that is required for transcription initiation. (c) Similar to regulation of the lac and ara operons, transcription of the gal operon is increased only when glucose is absent, and thus CRP-cAMP binding is required.

Transcription Attenuation Often Controls Amino Acid Biosynthesis

Small molecules other than sugars also help regulate the expression of genes involved in their own metabolism. E. coli can produce all 20 of the common amino acids required for protein synthesis, but biosynthesis of an amino acid is necessary only when the intracellular concentration of that amino acid is low. The genes encoding the enzymes for synthesizing an amino acid generally cluster in an operon that is repressed whenever existing supplies of that amino acid are adequate for cellular requirements. When more of the amino acid is needed, the operon is actively transcribed and the biosynthetic enzymes are expressed.

The E. coli tryptophan (trp) operon provides a classic example of the kind of regulation that enables fine-tuning of gene expression levels to suit the needs of the cell (Figure 20-11a, b). The trp operon includes five genes encoding the enzymes required to synthesize tryptophan. The short half-life (∼3 minutes) of the mRNA transcribed from the trp operon allows the cell to respond rapidly to changing needs for tryptophan. A homodimeric repressor protein, the Trp repressor, regulates the operon. Tryptophan acts as a small-molecule effector for the Trp repressor (Figure 20-11b). When tryptophan is abundant, it binds the repressor and induces a conformational change that permits the repressor to bind the Trp operator and inhibit expression of the trp operon. The Trp operator site overlaps the promoter such that binding of the repressor blocks the binding of RNA polymerase. In this way, the trp operon is negatively regulated: a corepressor (in this case tryptophan) binds the repressor protein, rendering the repressor competent for DNA binding. This is distinct from the negative regulation of the lac operon, in which the Lac repressor binds the operator in the absence of inducer (allolactose), dissociating from DNA only when the small-molecule effector is present.
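
The contrast between a corepressor and an inducer comes down to which form of the repressor binds the operator. A minimal sketch, ignoring CRP-cAMP and all other inputs (the function names and the "corepressor"/"inducer" labels are illustrative):

```python
def operator_occupied(effector_role: str, effector_present: bool) -> bool:
    """Return True if the repressor is bound to its operator."""
    if effector_role == "corepressor":
        # Trp repressor: binding tryptophan makes the repressor DNA-binding competent.
        return effector_present
    if effector_role == "inducer":
        # Lac repressor: binding allolactose releases the repressor from DNA.
        return not effector_present
    raise ValueError(f"unknown effector role: {effector_role!r}")


def operon_expressed(effector_role: str, effector_present: bool) -> bool:
    # In both operons the bound repressor blocks transcription (negative regulation).
    return not operator_occupied(effector_role, effector_present)


print("trp operon, tryptophan abundant:", operon_expressed("corepressor", True))
print("lac operon, allolactose present:", operon_expressed("inducer", True))
```

Both circuits are negative (a bound repressor shuts the operon off); they differ only in whether the effector switches the repressor onto or off the DNA.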

Figure 20-11: Regulation of the trp operon. (a) In the absence of tryptophan, the Trp repressor cannot bind the operator, and transcription of the trp operon is initiated. (b) When tryptophan is abundant, the protein products from the trp operon are no longer needed. Tryptophan serves as the effector molecule for the Trp repressor; their association causes the Trp repressor to bind the operator, blocking transcription. Notice the presence of the leader sequence; this is required for a second level of transcriptional control (see Figure 20-12). The molecular model shows the homodimeric Trp repressor bound to DNA.

Once again, this simple on/off circuit, mediated by a repressor protein and a small effector molecule, is only part of the regulatory story. Different cellular concentrations of tryptophan can alter the rate of synthesis of the biosynthetic enzymes over a 700-fold range, yet repressor action accounts for only about a 70-fold difference in gene expression between the repressed and activated states of the operon. Once repression is lifted and transcription begins, the rate of transcription is modulated by a second regulatory process, transcription attenuation, in which transcription is initiated normally but is abruptly halted before the operon genes are transcribed. Attenuation further fine-tunes gene expression and, combined with repressor action, produces the 700-fold range in expression of the tryptophan biosynthetic enzymes.
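
If the two mechanisms act independently, their fold effects multiply, so the share attributable to attenuation follows by simple division. This is a back-of-the-envelope check that assumes multiplicative, independent contributions; only the 700-fold and 70-fold figures come from the text, and the helper `combined_fold` is illustrative.

```python
from math import prod


def combined_fold(*fold_changes: float) -> float:
    """Overall fold change when independent regulatory steps multiply."""
    return prod(fold_changes)


total_range = 700.0      # overall range in enzyme synthesis (from the text)
repression_fold = 70.0   # contribution of the Trp repressor (from the text)

# Fold change left for attenuation to explain: 700 / 70
attenuation_fold = total_range / repression_fold
print(f"attenuation accounts for roughly {attenuation_fold:.0f}-fold")
print(combined_fold(repression_fold, attenuation_fold))
```

Under these assumptions, attenuation contributes roughly a 10-fold modulation layered on top of repression.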

The frequency with which transcription of the trp operon is attenuated is regulated by the availability of tryptophan in the cell and relies on the very close coupling of transcription and translation in bacteria. This mode of regulation is therefore restricted to cells that lack a nucleus. In eukaryotic cells, where transcription and translation are physically and temporally separated, these processes cannot be coupled for the kind of attenuation described here.

The mechanism of attenuation in the trp operon relies on a 162-nucleotide region at the 5′ end of the mRNA, called the leader sequence, which precedes the initiation codon of the first gene (Figure 20-12a). Within the leader sequence are four regulatory regions, sequences 1 through 4. Sequences 3 and 4 can base-pair to form a terminator, a G≡C-rich stem-and-loop (hairpin) structure, closely followed by a series of U residues. Formation of the terminator causes RNA polymerase to terminate transcription prematurely and dissociate from the DNA before the operon genes can be transcribed. The termination mechanism involves polymerase slowing or stalling when it encounters the stable hairpin (terminator), then dissociating from the DNA as a result of relatively weak base pairing between the adjacent U-rich sequence and the complementary A-rich sequence in the DNA template. However, sequence 3 can also base-pair with sequence 2. When sequences 2 and 3 associate, the terminator cannot form, and uninterrupted transcription continues into the trp genes. The hairpin formed by the pairing of sequences 2 and 3 does not cause termination.
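
The competition for sequence 3 can be written as a simple rule. The sketch below is a deliberately coarse, hypothetical model (function names are illustrative): it assumes sequence 3 pairs with sequence 2 whenever sequence 2 is not covered by a ribosome, and otherwise pairs with sequence 4 once that sequence is synthesized.

```python
def leader_hairpin(regions_covered: set) -> tuple:
    """Return which pair of leader sequences forms a hairpin (illustrative rule).

    regions_covered: leader sequences (1-4) masked by a translating ribosome
    when sequence 3 emerges from RNA polymerase.
    """
    if 2 not in regions_covered:
        return (2, 3)  # antiterminator: sequence 3 is sequestered by sequence 2
    return (3, 4)      # terminator: G-C-rich stem-and-loop followed by U residues


def transcription_continues(regions_covered: set) -> bool:
    # Only the 3:4 hairpin acts as a terminator; the 2:3 hairpin does not.
    return leader_hairpin(regions_covered) != (3, 4)


print(leader_hairpin({1}), transcription_continues({1}))
print(leader_hairpin({2}), transcription_continues({2}))
```

The rule captures the key logic: whatever masks sequence 2 at the moment sequence 3 is made decides whether the terminator forms.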

Figure 20-12: Graded control of the trp operon through transcription attenuation. (a) The leader sequence of the trp mRNA. The transcript generated from the trp promoter includes a leader sequence at the 5′ end (containing four regulatory regions labeled 1 through 4). A portion of this sequence (sequence 1) is translated into the leader peptide, which has no known function other than to regulate the trp operon. (b) In the presence of tryptophan, the ribosome translates quickly through the Trp codons of sequence 1 and into sequence 2, allowing sequences 3 and 4 to associate to form a hairpin that stalls the RNA polymerase and terminates transcription. (c) In the absence of tryptophan, the ribosome stalls in sequence 1, allowing sequences 2 and 3 to associate. With sequence 3 unavailable to associate with sequence 4, the terminator structure is not formed and transcription can proceed. The amount of free tryptophan available for protein synthesis thus determines whether the trp operon is transcribed.

How is hairpin choice determined? Regulatory sequence 1 and the availability of tryptophan are crucial for determining whether sequence 3 pairs with sequence 2 (letting transcription continue) or with sequence 4 (attenuating transcription). Formation of the terminator stem-and-loop structure depends on events that occur during translation of regulatory sequence 1. Sequence 1 encodes a leader peptide of 14 amino acids (Figure 20-12b), two of which are Trp residues. The leader peptide has no other known cellular function; its synthesis is simply an operon regulatory device. The peptide is translated as soon as its coding sequence has been transcribed, by a ribosome that loads onto the nascent mRNA and follows closely behind RNA polymerase as transcription proceeds.

When tryptophan concentrations are high, concentrations of Trp-tRNA^Trp are also high. This allows translation to proceed rapidly past the two Trp codons of sequence 1 and into sequence 2 before sequence 3 is transcribed by RNA polymerase. In this situation, sequence 2 is covered by the ribosome and unavailable for pairing with sequence 3 when sequence 3 is synthesized; the terminator structure (sequences 3 and 4) forms instead, and transcription halts (see Figure 20-12b). However, when tryptophan concentrations are low, the ribosome stalls at the two Trp codons in sequence 1, because less Trp-tRNA^Trp is available. Sequence 2 remains free while sequence 3 is transcribed, allowing these two sequences to base-pair. Sequence 3 is then unavailable for pairing with sequence 4, preventing formation of the terminator and letting transcription proceed (Figure 20-12c). In this way, the proportion of transcripts that are prematurely terminated declines as tryptophan concentration declines. Other bacteria also use multiple levels of regulation to control tryptophan biosynthetic genes; one example is the TRAP system found in Bacillus subtilis (see the How We Know section at the end of this chapter).
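
The graded response emerges from a race between the ribosome and RNA polymerase. One way to make that race quantitative, offered here as an illustrative model and not one taken from the source, is to assume that (1) the ribosome's dwell time at each of the two Trp codons is exponentially distributed with a rate proportional to the Trp-tRNA^Trp level, (2) time spent on other codons is negligible, and (3) sequence 3 emerges a fixed time `window` after the ribosome reaches the first Trp codon. The chance of clearing both Trp codons in time (and so covering sequence 2) is then the cumulative distribution of an Erlang distribution with shape 2. The parameters `k_per_unit` and `window` are hypothetical.

```python
import math


def fraction_terminated(trp_trna: float, k_per_unit: float = 1.0,
                        window: float = 1.0) -> float:
    """Probability that the ribosome clears both Trp codons before sequence 3
    is synthesized, so that the 3:4 terminator forms (illustrative model)."""
    if trp_trna < 0:
        raise ValueError("tRNA level must be non-negative")
    k = k_per_unit * trp_trna  # decoding rate at each Trp codon
    x = k * window
    # Sum of two independent exponential waits is Erlang(2, k):
    # P(T <= window) = 1 - exp(-k*window) * (1 + k*window)
    return 1.0 - math.exp(-x) * (1.0 + x)


for level in (0.0, 0.5, 2.0, 10.0):
    print(f"Trp-tRNA level {level:5.1f}: fraction terminated = "
          f"{fraction_terminated(level):.3f}")
```

Lowering the charged-tRNA level lowers the terminated fraction smoothly rather than in a single step, which is the graded behavior described in the text.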

In E. coli, other amino acid biosynthetic operons use a similar attenuation strategy to fine-tune enzyme production to meet the prevailing cellular requirements. For example, the 15-residue leader peptide produced by the phe operon contains seven Phe residues. The leu operon leader peptide has four contiguous Leu residues. The his operon leader peptide has seven contiguous His residues. In fact, in the his operon and several others, attenuation is sufficiently sensitive to be the only regulatory mechanism.
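
The regulatory signal in each leader peptide is a cluster of codons for the controlled amino acid. A small helper can summarize such a cluster from a one-letter peptide sequence; the helper name is hypothetical, and the example uses the E. coli trp leader peptide as commonly given (MKAIFVLKGWWRTS, 14 residues).

```python
from itertools import groupby


def residue_cluster(peptide: str, residue: str) -> tuple:
    """Return (total count, longest contiguous run) of one residue type."""
    total = peptide.count(residue)
    longest = max((len(list(run)) for aa, run in groupby(peptide) if aa == residue),
                  default=0)
    return total, longest


trp_leader = "MKAIFVLKGWWRTS"  # E. coli trp leader peptide, one-letter code
print(len(trp_leader), residue_cluster(trp_leader, "W"))
```

The peptide sequence is only a proxy: ribosome stalling happens at the corresponding codons in the mRNA, so a run of contiguous residues (as in the leu and his leaders) corresponds to a run of consecutive regulatory codons.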

The SOS Response Leads to Coordinated Transcription of Many Genes

As described in Chapter 19, the many different genes that are required for a particular cell function are sometimes regulated together by a single transcription factor and/or small-molecule effector. This global regulation of transcription is an economical way for the cell to coordinate the expression of multiple genes that are needed at the same time.

An interesting example of this kind of genetic control is the cell’s response to DNA damage. Extensive breakage or mutation of the bacterial chromosome triggers the expression of genes involved in DNA repair, which are located at different sites in the chromosome. This response is known as the SOS response and requires two key regulatory proteins: the RecA protein and the LexA repressor protein.

SOS genes encode proteins useful to cells with damaged DNA. These include Y-family polymerases, also known as translesion synthesis (TLS) polymerases, which have relaxed fidelity and can replicate DNA containing chemical lesions, such as UV-induced cross-links. The LexA repressor inhibits transcription of all the SOS genes by binding near their promoters, and induction of the SOS response requires the removal of LexA (Figure 20-13a). This is not a simple dissociation from DNA in response to the binding of a small molecule, as in the regulation of the lac operon. Instead, the LexA repressor inactivates itself by catalyzing self-cleavage at a specific Ala–Gly peptide bond, producing two roughly equal protein fragments. At physiological pH, this autocleavage reaction requires the RecA protein. RecA is not a protease in the classical sense, but its interaction with LexA promotes the repressor’s self-cleavage reaction. This function of RecA is sometimes called a coprotease activity.

Figure 20-13: The global SOS response to DNA damage. (a) In the default state of the E. coli cell, the LexA repressor prevents transcription of the SOS genes. In response to DNA damage, LexA is stimulated to undergo autocleavage, inactivating itself and allowing transcription of the SOS genes. (b) Autocleavage of the LexA repressor requires RecA protein. DNA damage creates sites of single-stranded DNA, which are quickly bound by RecA. DNA-bound RecA becomes a coprotease for LexA, and their association facilitates the destruction of LexA and induction of the SOS response.

The RecA protein provides a functional link between the biological signal (DNA damage) and activation of the SOS genes. Heavy DNA damage leads to numerous single-strand gaps in the DNA, and RecA binds tightly to single-stranded DNA. Only RecA that is bound to single-stranded DNA can facilitate cleavage of the LexA repressor (Figure 20-13b). Binding of RecA at the gaps eventually activates its coprotease activity, leading to cleavage of LexA and induction of the SOS response.

Active RecA, bound to single-stranded DNA, induces cleavage of LexA molecules. This system works in part because LexA constantly cycles on and off the DNA, so repressor molecules that leave their binding sites are exposed to RecA-facilitated cleavage. Eventually, essentially all the LexA is cleaved, and no intact LexA remains to repress the SOS genes.
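
The signal chain (single-stranded gaps → activated RecA → LexA cleavage → derepression) can be sketched as a first-order decay. This is a hypothetical model: it assumes a single cleavage rate constant `k_cleave` applies whenever single-stranded DNA is present, ignores synthesis of new LexA, and treats the SOS genes as derepressed once intact LexA falls below an arbitrary `threshold` fraction.

```python
import math


def intact_lexa(t: float, k_cleave: float, ssdna_present: bool) -> float:
    """Fraction of LexA still intact at time t (illustrative model)."""
    if not ssdna_present:
        # Without ssDNA, RecA is not activated and, at physiological pH,
        # LexA does not undergo autocleavage.
        return 1.0
    # RecA-ssDNA filaments promote LexA self-cleavage; LexA cycling off the DNA
    # keeps exposing molecules to cleavage, approximated here as first-order loss.
    return math.exp(-k_cleave * t)


def sos_genes_on(t: float, k_cleave: float, ssdna_present: bool,
                 threshold: float = 0.1) -> bool:
    """SOS genes are treated as derepressed once intact LexA drops below threshold."""
    return intact_lexa(t, k_cleave, ssdna_present) < threshold


for t in (0.0, 1.0, 3.0):
    print(t, round(intact_lexa(t, 1.0, True), 3), sos_genes_on(t, 1.0, True))
```

The ssDNA check is the model's stand-in for RecA acting as the sensor: without damage, no amount of time leads to induction.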

The SOS response is an example of how a single regulatory mechanism can coordinate the expression of related sets of genes. It also provides a remarkable illustration of evolutionary adaptation. During induction of the SOS response in a severely damaged bacterial cell, RecA also facilitates cleavage of the repressors that keep certain viruses (bacteriophages) in a dormant, lysogenic state within the cell (a state described in Section 20.3). These repressors, like LexA, undergo self-cleavage at a specific Ala–Gly peptide bond. Induction of the SOS response thus permits replication of the virus and lysis of the cell, releasing new viral particles. Bacteriophages have evolved to use the bacterial SOS system to their advantage, giving themselves the opportunity to make a hasty exit from a compromised bacterial host cell.

The bacterial SOS response is just one of the many ways in which cells control the expression of related genes. Another kind of mechanism involves the synthesis and detection of small molecules that can diffuse between cells in a process called quorum sensing (see this chapter’s Moment of Discovery and How We Know section). Understanding how some kinds of pathogenic bacteria use quorum sensing (and other regulatory mechanisms) to control the genes necessary for rapid growth in infected individuals could offer new avenues for therapeutic intervention.

SECTION 20.1 SUMMARY

  • Dissociation of a repressor from, or binding of an activator to, its target sequence to activate transcription can be triggered by a specific small molecule called an inducer. This was first elucidated in studies of the lac operon of E. coli. The Lac repressor dissociates from the Lac operator when the repressor binds its inducer, allolactose, activating expression of genes needed to metabolize lactose when this sugar is abundant.

  • Catabolite repression is a mechanism of positive gene regulation in bacteria in which the presence of a preferred carbon source, such as glucose, prevents the activation of operons encoding enzymes required for metabolizing secondary sugars, such as lactose and arabinose. When glucose is depleted, cAMP concentrations increase and, in turn, increase the amount of CRP-cAMP complex, which stimulates transcription of these operons.

  • When arabinose is present, it binds the AraC dimer, changing its conformation so that AraC occupies two adjacent DNA half-sites (araI1 and araI2) and, together with CRP-cAMP, activates the promoter of the ara operon. In the absence of arabinose, AraC instead occupies the alternative sites araO2 and araI1, looping the DNA and holding the promoter in an inactive state.

  • Many bacterial operons that produce the enzymes of amino acid synthesis use transcription attenuation, a regulatory process in which a transcription terminator can form in the leader region of the nascent mRNA. Formation of the terminator is modulated by a mechanism that couples transcription and translation and responds to small changes in amino acid concentration.

  • Many biosynthetic pathway operons, such as those encoding amino acid–synthesizing enzymes, are repressed by the end product of the pathway. In this way, amino acids inhibit their own production.

  • In the SOS response, multiple genes throughout the chromosome, repressed by a single repressor protein, LexA, are activated simultaneously when DNA damage triggers RecA protein–facilitated autocatalytic proteolysis of LexA.