31.2 Prokaryotic DNA-Binding Proteins Bind Specifically to Regulatory Sites in Operons

A historically important example reveals many common principles of gene regulation by DNA-binding proteins. Bacteria such as E. coli usually rely on glucose as their source of carbon and energy, even when other sugars are available. However, when glucose is scarce, E. coli can use lactose as their carbon source, even though this disaccharide does not lie on any major metabolic pathways. An essential enzyme in the metabolism of lactose is β-galactosidase, which hydrolyzes lactose into galactose and glucose. These products are then metabolized by pathways discussed in Chapter 16.

Figure 31.6: β-Galactosidase induction. The addition of lactose to an E. coli culture causes the production of β-galactosidase to increase from very low amounts to much larger amounts. The increase in the amount of enzyme parallels the increase in the number of cells in the growing culture. β-Galactosidase constitutes 6.6% of the total protein synthesized in the presence of lactose.

This reaction can be conveniently followed in the laboratory through the use of alternative galactoside substrates, such as X-Gal, that form colored products (Figure 31.5). An E. coli cell growing on a carbon source such as glucose or glycerol contains fewer than 10 molecules of β-galactosidase. In contrast, the same cell will contain several thousand molecules of the enzyme when grown on lactose (Figure 31.6). The presence of lactose in the culture medium induces a large increase in the amount of β-galactosidase by eliciting the synthesis of new enzyme molecules rather than by activating a preexisting but inactive precursor.

Figure 31.5: Monitoring the β-galactosidase reaction. The galactoside substrate X-Gal produces a colored product on cleavage by β-galactosidase. The appearance of this colored product provides a convenient means for monitoring the amount of the enzyme both in vitro and in vivo.

A crucial clue to the mechanism of gene regulation was the observation that two other proteins are synthesized in concert with β-galactosidase—namely, galactoside permease and thiogalactoside transacetylase. The permease is required for the transport of lactose across the bacterial cell membrane (Section 13.3). The transacetylase is not essential for lactose metabolism but appears to play a role in the detoxification of compounds that also may be transported by the permease. Thus, the expression levels of a set of enzymes that all contribute to the adaptation to a given change in the environment change together. Such a coordinated unit of gene expression is called an operon.

An operon consists of regulatory elements and protein-encoding genes

The parallel regulation of β-galactosidase, the permease, and the transacetylase suggested that the expression of genes encoding these enzymes is controlled by a common mechanism. François Jacob and Jacques Monod proposed the operon model to account for this parallel regulation as well as the results of other genetic experiments. The genetic elements of the model are a regulator gene that encodes a regulatory protein, a regulatory DNA sequence called an operator site, and a set of structural genes (Figure 31.7).

Figure 31.7: Operons. (A) The general structure of an operon as conceived by Jacob and Monod. (B) The structure of the lactose operon. In addition to the promoter, p, in the operon, a second promoter is present in front of the regulator gene, i, to drive the synthesis of the regulator.

The regulator gene encodes a repressor protein that binds to the operator site. The binding of the repressor to the operator prevents transcription of the structural genes. The operator and its associated structural genes constitute the operon. For the lactose (lac) operon, the i gene encodes the repressor, o is the operator site, and the z, y, and a genes are the structural genes for β-galactosidase, the permease, and the transacetylase, respectively. The operon also contains a promoter site (denoted by p), which directs the RNA polymerase to the correct transcription initiation site. The z, y, and a genes are transcribed to give a single mRNA molecule that encodes all three proteins. An mRNA molecule encoding more than one protein is known as a polygenic or polycistronic transcript.

The lac repressor protein in the absence of lactose binds to the operator and blocks transcription

In the absence of lactose, the lactose operon is repressed. How does the lac repressor mediate this repression? The lac repressor exists as a tetramer of 37-kDa subunits with two pairs of subunits coming together to form the DNA-binding unit previously discussed. In the absence of lactose, the repressor binds very tightly and rapidly to the operator. When bound to DNA, the lac repressor prevents RNA polymerase from transcribing the protein-coding genes: the operator site lies directly adjacent to and downstream of the promoter site, so the bound repressor blocks the progress of RNA polymerase.

Figure 31.8: Structure of the lac repressor. A lac repressor dimer is shown bound to DNA. Notice that the amino-terminal domain binds to DNA, whereas the carboxyl-terminal domain forms a separate structure. A part of the structure that mediates the formation of lac repressor tetramers is not shown.
[Drawn from 1EFA.pdb.]

How does the lac repressor locate the operator site in the E. coli chromosome? The lac repressor binds 4 × 10⁶ times as strongly to operator DNA as it does to random sites in the genome. This high degree of selectivity allows the repressor to find the operator efficiently even with a large excess (4.6 × 10⁶) of other sites within the E. coli genome. The dissociation constant for the repressor–operator complex is approximately 0.1 pM (10⁻¹³ M). The rate constant for association (≈ 10¹⁰ M⁻¹ s⁻¹) is strikingly high, indicating that the repressor finds the operator primarily by diffusing along a DNA molecule (a one-dimensional search) rather than encountering it from the aqueous medium (a three-dimensional search). This diffusion has been confirmed by studies that monitored the behavior of fluorescently labeled single molecules of lac repressor inside living E. coli cells.
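The meaning of these numbers can be explored with a short calculation. The sketch below takes the dissociation constant, the specificity ratio, and the number of competing sites from the text. It assumes, purely for illustration, a cell volume of 1 fL and 10 repressor tetramers per cell; neither value is given in this section. It treats the operator as a single site with occupancy [R]/([R] + Kd) and lets the random sites, which are in vast excess, soak up most of the repressor. The function names are hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molar_concentration(molecules, volume_liters):
    """Convert a copy number in a given volume to mol/L."""
    return molecules / (AVOGADRO * volume_liters)


def site_occupancy(free_ligand_molar, kd_molar):
    """Fraction of time a single site is occupied: [R] / ([R] + Kd)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)


# Values quoted in the text
KD_OPERATOR = 1e-13   # M, repressor-operator dissociation constant
SPECIFICITY = 4e6     # operator binding relative to random DNA
RANDOM_SITES = 4.6e6  # competing sites in the genome

# Assumptions for illustration only (not from the text)
CELL_VOLUME_L = 1e-15
REPRESSORS_PER_CELL = 10

# Random DNA binds 4e6-fold more weakly than the operator.
kd_random = KD_OPERATOR * SPECIFICITY

total_repressor = molar_concentration(REPRESSORS_PER_CELL, CELL_VOLUME_L)
random_site_conc = molar_concentration(RANDOM_SITES, CELL_VOLUME_L)

# With random sites in vast excess, the free repressor pool is the total
# divided by (1 + [sites] / Kd_random).
free_repressor = total_repressor / (1 + random_site_conc / kd_random)

occupancy = site_occupancy(free_repressor, KD_OPERATOR)
print(f"free repressor: {free_repressor:.2e} M")
print(f"operator occupancy: {occupancy:.2f}")
```

Under these assumed values the operator is occupied roughly 90% of the time even though nearly all of the repressor sits on random DNA. Lowering `SPECIFICITY` in the sketch shows how quickly occupancy falls when selectivity is reduced.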

Inspection of the complete E. coli genome sequence reveals two sites within 500 bp of the primary operator site that approximate the sequence of the operator. When one dimeric DNA-binding unit binds to the operator site, the other DNA-binding unit of the lac repressor tetramer can bind to one of these sites with similar sequences. The DNA between the two bound sites forms a loop. No other sites that closely match the sequence of the lac operator site are present in the rest of the E. coli genome sequence. Thus, the DNA-binding specificity of the lac repressor is sufficient to specify a nearly unique site within the E. coli genome.
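A rough estimate shows why a binding site of modest length can be nearly unique. The sketch below assumes an idealized genome of random sequence with equal base frequencies, which is not a claim made in the text, and counts both strands of a genome of about 4.6 × 10⁶ base pairs. The function names are hypothetical.

```python
GENOME_BP = 4.6e6  # approximate size of the E. coli genome in base pairs


def expected_chance_matches(specified_bases, genome_bp=GENOME_BP):
    """Expected number of chance occurrences of one fully specified sequence.

    Assumes random sequence with equal base frequencies and counts both strands.
    """
    positions = 2 * genome_bp
    return positions / 4 ** specified_bases


def min_length_for_uniqueness(genome_bp=GENOME_BP):
    """Smallest site length with fewer than one expected chance match."""
    n = 1
    while expected_chance_matches(n, genome_bp) >= 1:
        n += 1
    return n


for n in (8, 10, 12, 14, 16):
    print(n, f"{expected_chance_matches(n):.3g}")
print("minimum length:", min_length_for_uniqueness())
```

Under this idealization, about a dozen fully specified base pairs are enough to make a chance match unlikely. Real genomes are not random, so the result is only an order-of-magnitude guide.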

The three-dimensional structure of the lac repressor has been determined in various forms. Each monomer consists of a small amino-terminal domain that binds DNA and a larger domain that mediates the formation of the dimeric DNA-binding unit and the tetramer (Figure 31.8). A pair of the amino-terminal domains come together to form the functional DNA-binding unit. Each monomer has a helix-turn-helix unit that interacts with the major groove of the bound DNA.

Ligand binding can induce structural changes in regulatory proteins

In the situation just described, glucose is present and lactose is absent, and the lac operon is repressed. How does the presence of lactose trigger the relief of this repression and, hence, the expression of the lac operon? Interestingly, lactose itself does not have this effect; rather, allolactose, a combination of galactose and glucose with a β-1,6 rather than a β-1,4 linkage, does. Allolactose is thus referred to as the inducer of the lac operon. Allolactose is a side product of the β-galactosidase reaction and is produced at low levels by the few molecules of β-galactosidase that are present before induction. Some other β-galactosides such as isopropylthiogalactoside (IPTG) are potent inducers of β-galactosidase expression, although they are not substrates of the enzyme. IPTG is useful in the laboratory as a tool for inducing gene expression in engineered bacterial strains.

The inducer triggers gene expression by preventing the lac repressor from binding the operator. The inducer binds to the lac repressor and thereby greatly reduces the repressor's affinity for operator DNA. An inducer molecule binds in the center of the large domain within each monomer. This binding leads to conformational changes that modify the relation between the two small DNA-binding domains (Figure 31.9). These domains can no longer easily contact DNA simultaneously, leading to a dramatic reduction in DNA-binding affinity.

Figure 31.9: Effects of IPTG on lac repressor structure. The structure of the lac repressor bound to the inducer isopropylthiogalactoside (IPTG), shown in orange, is superimposed on the structure of the lac repressor bound to DNA, shown in purple. Notice that the binding of IPTG induces structural changes that alter the relation between the two DNA-binding domains so that they cannot interact effectively with DNA. The DNA-binding domains of the lac repressor bound to IPTG are not shown, because these regions are not well ordered in the crystals studied.

Let us recapitulate the processes that regulate gene expression in the lactose operon (Figure 31.10). In the absence of inducer, the lac repressor is bound to DNA in a manner that blocks RNA polymerase from transcribing the z, y, and a genes. Thus, very little β-galactosidase, permease, or transacetylase is produced. The addition of lactose to the environment leads to the formation of allolactose. This inducer binds to the lac repressor, leading to conformational changes and the release of DNA by the lac repressor. With the operator site unoccupied, RNA polymerase can then transcribe the other lac genes and the bacterium will produce the proteins necessary for the efficient use of lactose.
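The chain of events in this recapitulation can be written as a small logic sketch. It is a deliberately simplified on/off picture, not a quantitative model: it assumes that any lactose yields enough allolactose to release the repressor, and it treats IPTG as an equivalent inducer, as described earlier. The function and key names are hypothetical.

```python
def lac_operon_state(lactose: bool, iptg: bool = False) -> dict:
    """Trace the on/off logic of lac operon induction.

    Simplification: every step is treated as all-or-nothing.
    """
    # Basal beta-galactosidase converts some lactose into allolactose.
    allolactose = lactose
    # Either allolactose or the synthetic inducer IPTG binds the repressor.
    inducer_bound_to_repressor = allolactose or iptg
    # Inducer binding releases the repressor from the operator.
    repressor_on_operator = not inducer_bound_to_repressor
    # RNA polymerase transcribes z, y, and a only when the operator is free.
    z_y_a_transcribed = not repressor_on_operator
    return {
        "allolactose": allolactose,
        "repressor_on_operator": repressor_on_operator,
        "z_y_a_transcribed": z_y_a_transcribed,
    }


for lactose, iptg in [(False, False), (True, False), (False, True)]:
    print(f"lactose={lactose}, IPTG={iptg}:", lac_operon_state(lactose, iptg))
```

The third case reflects the laboratory use of IPTG: the genes are switched on without any lactose, and therefore without any allolactose, being present.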

Figure 31.10: Induction of the lac operon. (A) In the absence of lactose, the lac repressor binds DNA and represses transcription from the lac operon. (B) Allolactose or another inducer binds to the lac repressor, leading to its dissociation from DNA and to the production of lac mRNA.

The structure of the large domain of the lac repressor is similar to those of a large class of proteins that are present in E. coli and other bacteria. This family of homologous proteins binds ligands such as sugars and amino acids at their centers. Remarkably, domains of this family are utilized by eukaryotes in taste proteins and in neurotransmitter receptors, as will be discussed in Chapter 33.

The operon is a common regulatory unit in prokaryotes

Figure 31.11: Binding-site distributions. The E. coli genome contains only a single region that closely matches the sequence of the lac operator (shown in blue). In contrast, 20 sites match the sequence of the pur operator (shown in red). Thus, the pur repressor regulates the expression of many more genes than does the lac repressor.

Many other gene-regulatory networks function in ways analogous to those of the lac operon. For example, genes taking part in purine and, to a lesser degree, pyrimidine biosynthesis are repressed by the pur repressor. This dimeric protein is 31% identical in sequence with the lac repressor and has a similar three-dimensional structure. However, the behavior of the pur repressor is opposite that of the lac repressor: whereas the lac repressor is released from DNA by binding to a small molecule, the pur repressor binds DNA specifically, blocking transcription, only when bound to a small molecule. Such a small molecule is called a corepressor. For the pur repressor, the corepressor can be either guanine or hypoxanthine. The dimeric pur repressor binds to inverted-repeat DNA sites of the form 5′-ANGCAANCGNTTNCNT-3′, in which the bases shown in boldface type are particularly important. Examination of the E. coli genome sequence reveals the presence of more than 20 such sites, regulating 19 operons and including more than 25 genes (Figure 31.11).
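A search of this kind can be sketched in a few lines. The example below turns the consensus quoted above into a regular expression, with N standing for any base, and also checks that the consensus is compatible with its own reverse complement, as expected for an inverted repeat. The demo sequence is made up for illustration and is not taken from the E. coli genome; the function names are hypothetical.

```python
import re

PUR_SITE = "ANGCAANCGNTTNCNT"  # consensus from the text; N = any base

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(motif):
    """True if the motif and its reverse complement never conflict at a position."""
    return all(
        a == b or "N" in (a, b)
        for a, b in zip(motif, reverse_complement(motif))
    )


def motif_to_regex(motif):
    """Compile a motif into a regex; the lookahead keeps overlapping matches."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile(f"(?=({body}))")


def find_sites(sequence, motif=PUR_SITE):
    """Return (start index, matched site) for every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up demo sequence containing one site that fits the consensus.
demo = "TTT" + "ACGCAAACGTTTTCGT" + "GGG"
print(is_inverted_repeat(PUR_SITE))
print(find_sites(demo))
```

A genome-wide scan would also run `find_sites` on the reverse complement of the sequence. A consensus match alone does not establish that a site is functional, so hits from such a scan are candidates rather than confirmed operators.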

Because the DNA-binding sites for these regulatory proteins are short, it is likely that they evolved independently and are not related by divergence from an ancestral regulatory site. Once a ligand-regulated DNA-binding protein is present in a cell, binding sites for the protein may arise by mutation adjacent to additional genes. Binding sites for the pur repressor have evolved in the regulatory regions of a wide range of genes taking part in nucleotide biosynthesis. All such genes can then be regulated in a concerted manner.

The organization of prokaryotic genes into operons is useful for the analysis of completed genome sequences. Sometimes a gene of unknown function is discovered to be part of an operon containing well-characterized genes. Such associations can provide powerful clues to the biochemical and physiological functions of the uncharacterized gene.

Transcription can be stimulated by proteins that contact RNA polymerase

All the DNA-binding proteins discussed thus far function by inhibiting transcription until some environmental condition, such as the presence of lactose, is met. There are also DNA-binding proteins that stimulate transcription. One particularly well studied example is a protein in E. coli that stimulates the expression of catabolic enzymes.

E. coli grown on glucose, a preferred energy source, have very low levels of catabolic enzymes for metabolizing other sugars. Clearly, the synthesis of these enzymes when glucose is abundant would be wasteful. Glucose has an inhibitory effect on the genes encoding these enzymes, an effect called catabolite repression. This effect arises because glucose lowers the concentration of cyclic AMP (cAMP) in E. coli. When its concentration is high, cAMP stimulates the concerted transcription of many catabolic enzymes by acting through a protein called the catabolite activator protein (CAP), which is also known as the cAMP receptor protein (CRP).

Figure 31.12: Binding site for catabolite activator protein (CAP). This protein binds as a dimer to an inverted repeat that is at the position −61 relative to the start site of transcription. The CAP-binding site on DNA is adjacent to the position at which RNA polymerase binds.

When bound to cAMP, CAP stimulates the transcription of lactose- and arabinose-catabolizing genes. CAP is a sequence-specific DNA-binding protein. Within the lac operon, CAP binds to an inverted repeat that is centered near position −61 relative to the start site for transcription (Figure 31.12). This site is approximately 70 base pairs from the operator site. As expected from the symmetry of the binding site, CAP functions as a dimer of identical subunits.

The CAP–cAMP complex stimulates the initiation of transcription by approximately a factor of 50. Energetically favorable contacts between CAP and RNA polymerase increase the likelihood that transcription will be initiated at sites to which the CAP–cAMP complex is bound (Figure 31.13). Thus, in regard to the lac operon, gene expression is maximal when the binding of allolactose relieves the inhibition by the lac repressor and the CAP–cAMP complex stimulates the binding of RNA polymerase.
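The interplay of the two controls can be summarized in a small table of relative initiation rates. The sketch below uses the roughly 50-fold stimulation quoted above and sets the induced, unstimulated rate to 1 as an arbitrary reference. Treating the repressed state as exactly zero is a simplification, since the text says only that very little enzyme is made. The function and constant names are hypothetical.

```python
CAP_STIMULATION = 50.0  # approximate fold stimulation quoted in the text
REPRESSED_LEVEL = 0.0   # simplification: the text says "very little", not zero


def cap_camp_active(glucose_abundant: bool) -> bool:
    """Glucose lowers cAMP, so the CAP-cAMP complex forms only when glucose is scarce."""
    return not glucose_abundant


def relative_lac_initiation(inducer_present: bool, glucose_abundant: bool) -> float:
    """Relative rate of transcription initiation at the lac promoter."""
    if not inducer_present:
        # The repressor stays on the operator and blocks RNA polymerase.
        return REPRESSED_LEVEL
    level = 1.0  # reference rate: operator free, no CAP-cAMP bound
    if cap_camp_active(glucose_abundant):
        level *= CAP_STIMULATION
    return level


for inducer in (False, True):
    for glucose in (True, False):
        rate = relative_lac_initiation(inducer, glucose)
        print(f"inducer={inducer!s:5} glucose={glucose!s:5} relative rate={rate}")
```

Only the combination of inducer present and glucose scarce gives the maximal rate, which matches the statement that expression peaks when repression is relieved and the CAP–cAMP complex is bound.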

Figure 31.13: Structure of a dimer of CAP bound to DNA. The residues shown in yellow in each CAP monomer have been implicated in direct interactions with RNA polymerase.
[Drawn from 1RUN.pdb.]

The E. coli genome contains many CAP-binding sites in positions appropriate for interactions with RNA polymerase. Thus, an increase in the cAMP level inside an E. coli bacterium results in the formation of CAP–cAMP complexes that bind to many promoters and stimulate the transcription of genes encoding a variety of catabolic enzymes.