21.2 COMBINATORIAL CONTROL OF GENE EXPRESSION

The expression of eukaryotic genes is modulated by combinations of transcription factors, and when some of these factors are common to the regulation of multiple genes, the regulation is called combinatorial control. We learned in Chapter 20 that different bacterial genes driving sugar metabolism use a common transcription activator, cAMP receptor protein (CRP). CRP is employed in regulation of the operons involved in the metabolism of lactose and galactose, as well as other sugars. This is an example of combinatorial control.

Eukaryotes make much more extensive use of combinatorial control than do bacteria. First, as we have seen, eukaryotes generally require many regulatory proteins at any given promoter, which greatly increases the number of possible combinations. Indeed, analysis of genome sequences reveals the use of greater numbers of transcription factors as genome size and complexity increase. For example, yeast are thought to use about 300 transcription factors, Caenorhabditis elegans and D. melanogaster more than 1,000, and humans more than 3,000. Although the number of transcription factors increases with the number of genes, there are still many fewer factors than there are genes to be regulated. Different genes must therefore share transcription factors, using them in different combinations and arrangements to achieve activation. Given the increasing complexity of promoter sequences in more complex genomes and the greater number of transcription factors, combinatorial control allows higher eukaryotes to achieve exquisite specificity in gene regulation.
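A rough count shows why a limited set of factors can still give many genes distinct regulatory signatures. The sketch below assumes, purely for illustration, that each promoter is controlled by an unordered set of exactly k distinct factors drawn from the organism's pool; real promoters also differ in the number, arrangement, and affinity of sites, so these figures are counts of possible factor sets, not predictions.

```python
from math import comb

# Hypothetical assumption: each promoter uses an unordered set of exactly
# k distinct transcription factors chosen from a pool of n factors.
def promoter_combinations(n_factors: int, factors_per_promoter: int) -> int:
    """Number of distinct factor sets available under the assumption above."""
    return comb(n_factors, factors_per_promoter)

# Approximate factor counts quoted in the text.
factor_pools = {"yeast": 300, "fly or worm": 1_000, "human": 3_000}

for organism, n in factor_pools.items():
    for k in (1, 2, 3):
        print(f"{organism:12s} k={k}: {promoter_combinations(n, k):,} possible sets")
```

Even with only 300 factors, sets of three give 4,455,100 possibilities, far more than the roughly 6,000 genes of yeast.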

We begin with the relatively simple combinatorial control system that regulates the yeast GAL genes, which drive the metabolism of galactose. The regulation of galactose metabolism is one of the best-understood eukaryotic regulatory systems (Highlight 21-2). We then describe some increasingly complex mechanisms of combinatorial gene regulation.

Combinatorial Control of the Yeast GAL Genes Involves Positive and Negative Regulation

The enzymes required for importing and metabolizing galactose in yeast are encoded by GAL genes scattered over several chromosomes. Yeast cells have no operons like those in bacteria, and each of the GAL genes is transcribed separately. However, all the GAL genes have similar promoters and are regulated coordinately by a common set of proteins. The promoters for the GAL genes consist of the TATA box and an upstream activator sequence, which for each GAL gene is composed of one or more sequences denoted UASGAL. Each UASGAL site is recognized by a DNA-binding transactivator, the Gal4 protein (Gal4p). For example, the UAS of the gene GAL1 is 118 bp long and contains four Gal4p-binding sites of 17 bp each (Figure 21-8).

Figure 21-8: The GAL1 promoter. The promoters of the GAL genes of yeast each contain an upstream activator sequence (UAS), composed of one or more UASGAL sites. Each 17 bp UASGAL sequence is a binding site for the transcription activator Gal4p. The UAS of the GAL1 gene has four UASGAL sites.

Like the bacterial lac operon, the yeast GAL genes require more than just one protein (Gal4p) for activation. Control of gene expression by galactose depends on three proteins: the transcription activator Gal4p, the inhibitor Gal80p, and the ligand sensor Gal3p (Figure 21-9). Gal4p binds the 17mer UASGAL sites and, left to its own devices, would activate gene expression at GAL promoters. However, at low galactose concentrations, Gal80p binds to Gal4p and blocks its transcription-activating region. When galactose is present, it binds Gal3p; Gal3p also binds ATP, and the Gal3p-galactose-ATP complex then interacts with Gal80p. This interaction causes a conformational change that relieves the inhibition of Gal4p and allows it to function as a transactivator at GAL promoters.

Figure 21-9: Regulation of GAL genes by the proteins Gal3p, Gal4p, and Gal80p. (a) Gal4p binds UASGAL, but Gal80p binds Gal4p and prevents its activation of Pol II and the general transcription factors. (b) Galactose is a small-molecule effector for Gal3p, causing it to bind Gal80p and alter Gal80p conformation, which frees Gal4p to activate transcription.

Glucose is the preferred carbon source for yeast, as it is for bacteria. When glucose is present, most of the GAL genes are repressed—whether galactose is available or not. The GAL gene regulatory system described above is effectively overridden by a global catabolite repression system. Global repression is achieved by the protein Mig1, which binds near the GAL promoter. Repression of the GAL genes also requires Tup1, a corepressor that binds Mig1 (Figure 21-10). Mig1 is regulated in a way that is not possible in bacteria—namely, through intracellular localization, which is regulated by phosphorylation. In the absence of glucose, Mig1 is phosphorylated and cannot enter the nucleus. Relegated to the cytoplasmic compartment, it is unable to bind DNA and repress the GAL genes. But when glucose is present, phosphorylation of Mig1 is blocked and the protein enters the nucleus, where it can bind DNA and associate with Tup1. Tup1 represses GAL gene expression by blocking transcription initiation, and possibly also by stimulating histone deacetylation at neighboring nucleosomes.

Figure 21-10: Combinatorial control in global repression of yeast GAL genes. Expression levels of a GAL gene are shown under three different growth conditions, with (+) or without (−) glucose or galactose. (a) In the absence of glucose and galactose, Gal4p occupies UASGAL, but the GAL gene is repressed by Gal80p. (b) In the presence of galactose and absence of glucose, Gal4p activates transcription of the GAL gene because Gal80p repression is relieved by binding of Gal3p. (c) In the presence of both glucose and galactose, glucose is the preferred carbon source; there is no transcription of the GAL gene because the Mig1-Tup1 complex represses its expression below basal levels.
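The layered logic of Figures 21-9 and 21-10 can be summarized as a small decision function. This is a Boolean simplification, not a quantitative model: real GAL expression is graded, and Mig1 control involves more steps than shown. The glucose-present, galactose-absent case does not appear in Figure 21-10; the sketch follows the text's statement that glucose represses the GAL genes whether or not galactose is available. The names `GalState` and `gal_gene_state` are hypothetical.

```python
from enum import Enum

class GalState(Enum):
    GAL80_INHIBITED = "Gal4p bound at UAS_GAL but blocked by Gal80p"
    ACTIVATED = "Gal4p activates transcription"
    MIG1_REPRESSED = "Mig1-Tup1 represses below basal level"

def gal_gene_state(glucose: bool, galactose: bool) -> GalState:
    # Glucose blocks Mig1 phosphorylation, so Mig1 enters the nucleus.
    mig1_in_nucleus = glucose
    # Galactose binds Gal3p (with ATP); that complex relieves Gal80p inhibition.
    gal80_blocks_gal4 = not galactose

    if mig1_in_nucleus:
        # Global catabolite repression overrides the galactose switch.
        return GalState.MIG1_REPRESSED
    if gal80_blocks_gal4:
        return GalState.GAL80_INHIBITED
    return GalState.ACTIVATED

for glucose in (False, True):
    for galactose in (False, True):
        state = gal_gene_state(glucose, galactose)
        print(f"glucose={glucose}, galactose={galactose}: {state.name}")
```

Checking glucose first encodes the hierarchy described in the text: the Gal3p-Gal80p-Gal4p switch matters only when Mig1 is excluded from the nucleus.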

HIGHLIGHT 21-2 TECHNOLOGY: Discovering and Analyzing DNA-Binding Proteins

Regulatory DNA sequences, such as the binding site for Gal4p in yeast, can be identified by sequence comparisons of genes that code for proteins of the same metabolic pathway. The Gal4p-binding site was one of the first eukaryotic activator-binding sites to be recognized. Genetic studies identified several genes in the pathway of galactose metabolism in yeast. In the presence of galactose, expression of the GAL genes increases as much as 1,000-fold. Clones containing the regulated GAL genes were sequenced, and comparison of the regions upstream from the TATA boxes revealed a common sequence, designated UASGAL (Figure 1). The UASGAL sequence is a 17mer, CGG(N)11CCG, with a twofold axis of symmetry, indicating that the protein that binds it probably functions as a dimer. In vivo, mutation of UASGAL sequences upstream from the GAL genes eliminated the usual activation in response to galactose. In a reporter gene assay in which the UASGAL sequence was cloned into the upstream region of lacZ, β-galactosidase (the lacZ gene product) expression was induced by addition of galactose. Furthermore, expression levels of β-galactosidase depended on the number and sequence of UASGAL sites, confirming their importance in transcription activation (Figure 2).

FIGURE 1 A comparison of the upstream sequences of the yeast GAL genes showed that they have common sequences, the UASGAL sites, each 17 bp long (dark blue).
FIGURE 2 The function of UASGAL sequences was confirmed in reporter gene assays in which promoter activity was determined by the activity of β-galactosidase (produced by the bacterial lacZ gene). As shown in these five assays (assay 1 is the wild type), deletion or mutation of UASGAL elements, but not of other regions close to the promoter, resulted in decreased promoter activity (β-galactosidase level).
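The sequence comparison described above amounts to scanning upstream regions for the consensus CGG(N)11CCG. The sketch below does this with a regular expression and also checks the twofold symmetry: the reverse complement of CGG is CCG, so the consensus reads the same on both strands. The demo sequence and the function names are hypothetical, not a real GAL promoter or a published tool.

```python
import re

# UAS_GAL consensus from the text: CGG, any 11 bases, CCG (17 bp in all).
# The lookahead lets overlapping matches be reported.
UAS_GAL = re.compile(r"(?=(CGG[ACGT]{11}CCG))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_uas_gal(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, 17-mer) for each consensus match on this strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in UAS_GAL.finditer(seq)]

# The consensus is its own reverse complement: the twofold symmetry that
# suggested a dimeric binding protein.
assert reverse_complement("CGG") == "CCG"

# Hypothetical test sequence (not a real GAL promoter).
demo = "TTACGGAAGACTCTCCTCCGTGCGT"
print(find_uas_gal(demo))                      # [(3, 'CGGAAGACTCTCCTCCG')]
print(find_uas_gal(reverse_complement(demo)))  # same site, found on the other strand
```

Because the consensus is symmetric, a site found on one strand is also found on the other, which is why a single pattern suffices for the scan.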

We now know that the GAL genes are activated by the protein Gal4p, which recognizes UASGAL. Early experiments demonstrated that Gal4p binds UASGAL and functions as a transcription activator. Genetic studies revealed that a single gene, when mutated, results in loss of activation of all GAL genes. These results suggested that this single gene, GAL4, was a master regulator, much like the bacterial CRP protein. GAL4 was isolated by transforming a yeast genomic library into GAL4-mutant cells and selecting for colonies in which the GAL genes were again activated in the presence of galactose. GAL4 was then cloned into an E. coli expression vector, and Gal4p was purified (see Chapter 7 for these cloning methods).

The technique of deletion analysis revealed the modular architecture of Gal4p, a structure now known to be common among many bacterial and eukaryotic transcription activators. In deletion analysis, nucleases or restriction enzymes are used to selectively delete pieces of DNA from a specified gene. The truncated protein product of this gene can be purified and tested for activity in vitro, or tested for function in vivo using a reporter assay. Studies such as these were performed with deletion constructs of GAL4. DNA binding of the truncated proteins was measured in vitro with electrophoretic mobility shift assays, and the ability of the truncated proteins to activate transcription was tested in vivo with a reporter gene assay. In the reporter assay, deletion constructs of GAL4 were transferred into GAL4-mutant yeast cells containing a plasmid with the bacterial lacZ reporter gene, driven by a typical GAL promoter with a UASGAL sequence (Figure 3a). The ability of each Gal4p-deletion construct to activate transcription of the lacZ gene was determined by measuring the activity of β-galactosidase (Figure 3b).

FIGURE 3 (a) The reporter gene construct used for deletion analysis of Gal4p. Only constructs with functional Gal4p will bind UASGAL and drive expression of the reporter gene (lacZ). (b) Deletion analysis of Gal4p. Two activities were measured: in vitro DNA binding (indicated by + or − in the first column on the right) and in vivo transcriptional activation of the reporter gene construct (second column). (c) In this model of the Gal4 protein, derived from the deletion analysis, Gal4p has separable DNA-binding and transcription-activation domains joined by a flexible linker. (d) The DNA-binding domain, expressed alone, will bind DNA but will not activate transcription.

The in vitro DNA-binding activity of Gal4p was destroyed by a small deletion at the protein’s N-terminus, but was not affected by small or large C-terminal deletions. Only the N-terminal 74 amino acid residues were needed for DNA-binding activity. Transcriptional activation required the DNA-binding region, as one would expect. Deletion of 60 residues from the C-terminus of Gal4p had little effect on gene activation. But deletion of 126 C-terminal residues reduced activation substantially, and a 191-residue C-terminal deletion completely eliminated activation. A large segment between these N- and C-terminal regions could be deleted without interfering with either activity.

The findings suggested that the two activities inherent in Gal4p require 265 or fewer residues: 74 at the N-terminus and 191 at the C-terminus. This result was surprising, given that the entire Gal4 protein is 881 amino acids long. To confirm the result, the researchers spliced together DNA for the 74-residue N-terminal DNA-binding domain and various lengths of the C-terminal transcription-activation region. They found that a 217-residue protein, missing 664 amino acids between the two regions, restored full activity in both DNA-binding and transcription-activation assays.
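The residue bookkeeping behind these conclusions can be checked directly. The sketch below uses only the numbers given in the text; the last step assumes the fusion kept residues 1 to 74 intact, so the length of its C-terminal portion is inferred rather than reported.

```python
# Numbers taken from the deletion analysis described in the text.
GAL4_LENGTH = 881             # full-length Gal4p, residues 1..881
DBD_LENGTH = 74               # N-terminal residues 1..74 suffice for DNA binding
C_DELETION_ABOLISHING = 191   # deleting the last 191 residues abolished activation
INTERNAL_DELETION = 664       # residues removed between the two regions in the fusion

# The activation region must lie within the last 191 residues.
activation_region_start = GAL4_LENGTH - C_DELETION_ABOLISHING + 1
# Upper bound on the residues both activities require.
required_upper_bound = DBD_LENGTH + C_DELETION_ABOLISHING

# Fusion protein: full length minus the internal deletion.
fusion_length = GAL4_LENGTH - INTERNAL_DELETION
# Assumption: the fusion kept residues 1..74, so the rest is C-terminal sequence.
fusion_c_terminal = fusion_length - DBD_LENGTH

# Consistency check: the retained C-terminal piece fits inside the last 191 residues.
assert fusion_c_terminal <= C_DELETION_ABOLISHING

print(f"activation region lies within residues {activation_region_start}-{GAL4_LENGTH}")
print(f"both activities need at most {required_upper_bound} residues")
print(f"fusion: {fusion_length} residues = {DBD_LENGTH} N-terminal + {fusion_c_terminal} C-terminal")
```

The 217-residue fusion therefore needs only 143 C-terminal residues, comfortably inside the 191-residue window that the deletion series identified.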

Clearly, the ability of Gal4p to activate transcription is the result of two distinct and separable domains. Similar results were obtained with other transcription activators from several different eukaryotes. Furthermore, examination of some transcription activators showed that the region between the two functional domains is highly sensitive to proteases, suggesting that the two domains are linked by sections of polypeptide that are open and flexible. These experiments gave rise to a model for some transcription activators, with two functional domains joined by a flexible linker (Figure 3c, d). The flexible region may help loosen the geometric constraints imposed by the DNA loop that forms between the transcription activator at an upstream binding site and the proteins it binds at the distant promoter. That the DNA-binding and transcription-activation domains of regulatory proteins can act independently has been demonstrated by “domain-swapping” experiments (see Figure 19-23).
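The two-domain model can be expressed as a toy data structure in which activation requires both a DNA-binding domain that matches the reporter's site and an activation domain. The site label `OPERATOR_X` and the swapped construct are hypothetical placeholders standing in for a domain-swapping experiment, not a specific published construct.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Activator:
    """Toy modular activator: a DNA-binding domain (DBD) that recognizes one
    site type, plus an optional transcription-activation domain (AD)."""
    dbd_site: str
    activation_domain: Optional[str] = None

def activates_reporter(protein: Activator, reporter_site: str) -> bool:
    # The DBD must recognize the site upstream of the reporter...
    binds = protein.dbd_site == reporter_site
    # ...and an AD must be present to stimulate transcription.
    return binds and protein.activation_domain is not None

gal4 = Activator("UAS_GAL", "Gal4_AD")
dbd_only = Activator("UAS_GAL")                # like Figure 3d: binds, no activation
swapped = Activator("OPERATOR_X", "Gal4_AD")   # hypothetical DBD fused to the Gal4p AD

for name, protein in [("Gal4p", gal4), ("DBD only", dbd_only), ("swap", swapped)]:
    print(name,
          activates_reporter(protein, "UAS_GAL"),
          activates_reporter(protein, "OPERATOR_X"))
```

In this model the activation domain is indifferent to which DBD carries it, which is the independence that domain-swapping experiments test.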

Combinatorial Control of Transcription Causes Mating-Type Switches in Yeast

Saccharomyces cerevisiae (baker’s yeast) can grow as either diploid or haploid cells, both of which reproduce by mitosis (see the Model Organisms Appendix). The diploid cells contain two copies of each of the 16 yeast chromosomes, and haploid cells contain one copy of each. When stressed by starvation, diploid cells can undergo meiosis to produce four haploid spores, two each of the mating types a and α. Haploid cells of the a mating type (a cells) can mate only with α haploids (α cells), and vice versa; thus, haploid cells display a simple sexual differentiation that is readily distinguishable when tested for mating ability.

Mating type is determined by the allele present at a single genetic locus, MAT. The identity of the allele at the MAT locus can switch as often as every cell division cycle. The mating-type switch occurs through site-specific recombination (see Chapter 14) and results in expression of either the MATa allele or the MATα allele. The MATa allele encodes the a1 protein, which functions only in diploid cells; a-specific genes are activated in a cells without it. The MATα allele encodes the α1 and α2 proteins: α1 stimulates transcription of α-specific genes, and α2 represses a-specific genes (Figure 21-11). After mating, the resulting diploid cells contain two MAT loci, one with the MATa allele and the other with the MATα allele; the presence of both the a and α gene products directs the diploid-specific transcriptional program, and haploid-specific gene expression is turned off.

Figure 21-11: Combinatorial control of the yeast mating-type switch. In all S. cerevisiae cells, haploid and diploid, Mcm1 is expressed and is used in combinatorial control. (a) The haploid a cell expresses protein a1, but this protein is used only in diploid cells. Mcm1 alone turns on a-specific genes (aSG). Other haploid-specific genes (hSG) are also expressed. (b) The haploid α cell expresses the α1 activator and α2 repressor; α2 associates with Mcm1 to turn off a-specific genes, and α1 binds Mcm1 to turn on α-specific genes (αSG). Other haploid-specific genes are expressed. (c) Diploid cells express both a1 and α2. α2, in conjunction with Mcm1, represses a-specific genes, and the a1-α2 heterodimer represses haploid-specific genes, including the gene encoding α1. Because α1 is not expressed, α-specific genes are also not expressed.

The transcriptional activation and repression of genes in each mating type is an example of combinatorial control, because control is achieved by combinations of regulators, at least one of which is common to the different cell types. In addition to the presence or absence of the a1, α1, and α2 proteins, specific activation and repression also involve Mcm1, expressed by both haploid cell types, as well as by diploid cells. In a cells, Mcm1 binds the promoters of a-specific genes and activates transcription. The genes specific to α cells are turned off in a cells, because the α1 activator is not present (see Figure 21-11a). In α cells, Mcm1 and α1 interact to activate α-specific gene transcription, while α2 (in association with Mcm1) represses transcription of a-specific genes (see Figure 21-11b).

Some genes are expressed in both haploid cell types but not in diploids. On mating to produce a diploid cell, these haploid-specific genes are turned off. Their repression depends on the interaction of the a1 and α2 proteins, which together form a repressor and are expressed together only in diploid cells (see Figure 21-11c).
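The gene sets in Figure 21-11 can be read as a small truth table. The Boolean sketch below treats each regulator as simply present or absent and assumes, consistent with Figure 21-11c, where α1 is not expressed in diploids, that the a1-α2 repressor also shuts off the α1 gene. The function name and allele labels are illustrative.

```python
def mating_type_program(mat_alleles: set[str]) -> dict[str, bool]:
    """Boolean sketch of Figure 21-11.

    mat_alleles: MAT alleles present, a subset of {"MATa", "MATalpha"}.
    Returns whether each gene set is transcribed.
    """
    has_a1 = "MATa" in mat_alleles          # MATa encodes a1
    has_alpha2 = "MATalpha" in mat_alleles  # MATalpha encodes alpha1 and alpha2
    # a1 and alpha2 together form a repressor; both exist only in diploids.
    a1_alpha2_repressor = has_a1 and has_alpha2
    # Assumption: the alpha1 gene is itself shut off by a1-alpha2 in diploids.
    has_alpha1 = "MATalpha" in mat_alleles and not a1_alpha2_repressor

    return {
        # Mcm1 alone activates a-specific genes unless alpha2-Mcm1 represses them.
        "a-specific": not has_alpha2,
        # alpha1 acting with Mcm1 activates alpha-specific genes.
        "alpha-specific": has_alpha1,
        # a1-alpha2 represses genes shared by both haploid types.
        "haploid-specific": not a1_alpha2_repressor,
    }

for label, alleles in [("a cell", {"MATa"}), ("alpha cell", {"MATalpha"}),
                       ("diploid", {"MATa", "MATalpha"})]:
    print(label, mating_type_program(alleles))
```

Mcm1 does not appear as a variable because it is present in every cell type; the outcome depends only on which MAT-encoded partners are available to combine with it.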

Combinatorial Mixtures of Heterodimers Regulate Transcription

Like their bacterial counterparts, most eukaryotic transcription factors bind to DNA as homodimers. However, several types of eukaryotic transcription factors can form heterodimers of two different members of a family of similarly structured proteins, creating a larger number of functional transcription factors from a smaller number of individual proteins. For example, three possible dimers can form from just two similarly structured proteins: two homodimers and one heterodimer; a hypothetical family of four different but structurally related proteins could form up to 10 different dimeric species (Figure 21-12).

Figure 21-12: Combinatorial control by heterodimer formation. (a) Two regulatory proteins that form homodimers and a heterodimer could form 3 different structures, which could bind 3 different regulatory sites. (b) Four proteins have the potential to form 10 different structures and bind 10 regulatory sites. The possible combinations increase dramatically as the number of potential dimerization partners increases.
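The counts in Figure 21-12 follow from pairing each monomer with itself or with any other member: n homodimers plus n(n − 1)/2 heterodimers, or n(n + 1)/2 in all. The sketch below enumerates the pairs and checks the formula. It assumes every member can pair with every other, which is why the text says "up to"; real families have restricted pairing rules.

```python
from itertools import combinations_with_replacement

def possible_dimers(monomers):
    """All homo- and heterodimers from mutually compatible monomers."""
    return list(combinations_with_replacement(monomers, 2))

def dimer_count(n: int) -> int:
    # n homodimers + n(n - 1)/2 heterodimers = n(n + 1)/2
    return n * (n + 1) // 2

print(possible_dimers(["A", "B"]))   # [('A', 'A'), ('A', 'B'), ('B', 'B')]
print(len(possible_dimers("ABCD")))  # 10
```

The count grows roughly with the square of family size, so each added family member contributes more new dimers than the previous one.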

The mammalian AP-1 transcription activators are an example of proteins that behave in this fashion. AP-1 activators can be either homodimers or heterodimers, formed from subunits that belong to the family of proteins that includes Fos, Jun, and ATF. Gene regulation by AP-1 homodimers and heterodimers occurs in response to a variety of external stimuli, including growth factors, cytokines, and factors involved in stress and infection. Thus, AP-1 transcription factors control such important processes as cell proliferation, differentiation, and programmed cell death. Indeed, some members of the Fos and Jun protein family are encoded by proto-oncogenes, normal cellular genes that can promote tumor formation when mutated or overexpressed. In other words, alterations in one or more of the subunits that make up AP-1 can be fatal for the cell, or even the entire organism.

The protein-dimerization and DNA-binding regions of AP-1 family members are of the basic leucine zipper type. The crystal structure of the dimerization and DNA-binding portions of a Fos-Jun heterodimer bound to DNA is shown in Figure 21-13a. AP-1 dimers activate genes containing an AP-1–binding site. AP-1 variants bind to AP-1–binding sites with different affinities and activate gene transcription to different extents, depending on the composition of that AP-1. Figure 21-13b shows the result of an electrophoretic mobility shift assay that examined DNA-binding affinity of Jun-Jun or Fos-Fos homodimers, as well as an AP-1 Fos-Jun heterodimer. A short DNA fragment containing an AP-1–binding site was end-labeled with 32P, then mixed with Fos, Jun, or the Fos-Jun heterodimer. The experiment also examined the binding affinity of a subfragment of Fos (FosC) that contains the DNA-binding and dimerization elements. The resulting gel shows that Fos, FosC, and Jun do not bind appreciably to the AP-1–binding site on their own. However, the Fos-Jun and FosC-Jun heterodimers bind the AP-1 site much more tightly, such that they could be detected in this experiment.

Figure 21-13: AP-1 transcription factors. (a) Structural model of the AP-1 heterodimer of Fos (purple) and Jun (green), bound to DNA. (b) Gel from an electrophoretic mobility shift assay using a 32P-end-labeled DNA fragment containing the AP-1–binding site sequence. The DNA was mixed with Fos, FosC (a fragment of Fos), or Jun (all of which would form a homodimer), or with a mixture of Fos and Jun or a mixture of FosC and Jun (both of which would form the two types of homodimer and the heterodimer). Reactions were analyzed by polyacrylamide gel electrophoresis, then autoradiography. Binding of protein dimers to the DNA causes the complex to migrate more slowly through the gel, resulting in distinct bands. The radioactive signal at the bottom of the gel is unbound DNA.

This differential DNA binding, depending on the composition of the AP-1 transcriptional control complex, is another example of combinatorial control. Although many AP-1 variants contain transcription-activation domains, some lack them and instead function as transcription inhibitors. Thus, the effect of AP-1 can be varied by changing its composition, depending on the needs of the cell.

Differentiation Requires Extensive Use of Combinatorial Control

A more complicated example of combinatorial control can be seen in body plan development in the fruit fly, D. melanogaster. Before fertilization, the developing oocyte is connected to a cluster of cells called nurse cells. The nurse cells deposit mRNAs encoding various transcription factors into the egg at specific locations, establishing concentration gradients of mRNA for the different transcription factors within the egg. During early embryonic development the nuclei divide quickly, producing 3,000 to 6,000 nuclei before plasma membranes form to delineate individual cells. When plasma membranes do form, the newly formed cells trap the specific mRNAs present at that particular position in the embryo. Each new cell thus produces a unique complement of transcription factors that act in a combinatorial fashion to express different proteins in the early embryo.

An example of combinatorial control by these unevenly distributed transcription factors is regulation of the eve gene, which produces a protein called even-skipped. Even-skipped is expressed only in specific cells of the embryo, generating a pattern of seven stripes that can be visualized using a fluorescent antibody to even-skipped (Figure 21-14a). The eve gene is essential to development; the even-skipped product is a transcription factor that regulates further differentiation in the cells where it is expressed.

Figure 21-14: Combinatorial control of eve gene expression in fruit fly development. (a) A Drosophila embryo stained with fluorescent antibodies that recognize the protein even-skipped (product of eve), showing its striped pattern of expression. (b) The graphs represent the relative levels and positions along the length of the embryo of even-skipped (top) and four transcription factors that regulate its expression (bottom). Specific combinations of transcription factors activate the eve gene.

Expression of eve is controlled by the concentrations of four proteins translated from the original mRNAs deposited in the developing oocyte by the nurse cells. Two of these four proteins, Bicoid and Hunchback, are activators; the other two, Giant and Krüppel, are repressors. Different gradients of the mRNAs for these activators and repressors, established by the nurse cells, result in unique ratios of the four regulatory proteins in nearly every cell of the embryo. Expression of even-skipped occurs only in cells that have the proper ratio of the four proteins to activate eve (Figure 21-14b). But if eve were activated by only one particular ratio of protein concentrations, eve would be expressed in only one place in the embryo. How can the eve gene be expressed in seven different stripes? The striped pattern of eve expression is made possible by combinatorial control.

The eve gene has five different enhancers, each with a complex array of binding sites for transcription activators and repressors (Figure 21-15a). Only one enhancer needs to be active for eve to be expressed in a given cell. But if eve is to be expressed normally, all five enhancers need to be active (albeit in different cells). Each enhancer is activated by a different combination of transcription factors. Some activator and repressor sites overlap and are controlled by competition, whereas some repressor sites are distinct from activator sites and repress the gene at a distance (Figure 21-15b). Seven stripes of even-skipped expression, each about four cells wide, are formed because the local concentrations of the activators and repressors are just right for activation of one of the five enhancers in particular cells along the length of the embryo. Five enhancers can produce seven stripes because two of them each drive expression in two stripes (stripes 3 and 7, and stripes 4 and 6). The same four transcription factors are used by the five different enhancers in different ways. Expression of the eve gene is an example of exquisite, complex combinatorial control.

Figure 21-15: Five independently acting enhancers of the eve gene producing seven stripes of eve expression in the early embryo. (a) The eve gene and its upstream and downstream enhancers, any one of which can activate eve expression if bound by the correct combination of transcription factors. Numbers 1 through 7 indicate the stripe(s) activated by each enhancer (see Figure 21-14). (b) The binding sites in the stripe 2 enhancer for the Bicoid and Hunchback activators and the Krüppel and Giant repressors. (c) Changes in concentration of the four transcription factors along the length of the embryo in the region that expresses eve stripe 2.
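The idea that an enhancer reads a position from local factor concentrations can be illustrated with made-up gradients. In this sketch the stripe 2 enhancer is "on" wherever both activators exceed a threshold and both repressors fall below one. Every curve shape and threshold here is hypothetical, chosen only to give a single stripe; none are measured profiles.

```python
import math

# Hypothetical gradients along the anterior-posterior axis; x runs from
# 0 (anterior) to 99 (posterior) in percent egg length.
def bicoid(x: int) -> float:
    return math.exp(-x / 40)                  # activator, high anteriorly

def hunchback(x: int) -> float:
    return 1.0 if x < 50 else 0.0             # activator, anterior half

def giant(x: int) -> float:
    return 1 / (1 + math.exp((x - 28) / 2))   # repressor, anterior domain only

def kruppel(x: int) -> float:
    return math.exp(-((x - 50) / 8) ** 2)     # repressor, central domain

BICOID_MIN, HUNCHBACK_MIN = 0.2, 0.5  # hypothetical activation thresholds
REPRESSOR_MAX = 0.3                   # hypothetical repression threshold

def stripe2_enhancer_on(x: int) -> bool:
    """Both activators above threshold and both repressors below it."""
    activators_present = bicoid(x) > BICOID_MIN and hunchback(x) > HUNCHBACK_MIN
    repressors_absent = giant(x) < REPRESSOR_MAX and kruppel(x) < REPRESSOR_MAX
    return activators_present and repressors_absent

active = [x for x in range(100) if stripe2_enhancer_on(x)]
print(f"stripe 2 enhancer on from x = {active[0]} to x = {active[-1]}")
```

With these invented curves the enhancer is on only from x = 30 to x = 41: Giant sets the anterior edge and Krüppel the posterior edge, while both activators are present throughout. Shifting either repressor curve moves the stripe, which is the sense in which stripe position is read from factor concentrations.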

[Photograph: Michael Levine]

The enhancer that activates eve expression in stripe 2 has been extensively studied in Michael Levine’s laboratory. This enhancer is 500 bp long and contains binding sites for both repressors and activators (see Figure 21-15b). Both activators, Bicoid and Hunchback, must bind to their sites for gene expression to occur. Some binding sites for these activators overlap repressor-binding sites, and other activators bind DNA but are inactivated by repressors that bind within about 100 bp of the activators’ binding sites. Increasing the distance between activator- and repressor-binding sites of this type prevents repressor function. Although the mechanism of repression is unclear, it might occur through covalent modification of the activator. The region of the embryo that expresses eve in stripe 2 is largely deficient in both repressors, yet contains both activators (Figure 21-15c). Hence, the particular cells that express eve in stripe 2 do so because this is the only location in the embryo where the condition for activator binding in the absence of repressors is met. In all other cells, the stripe 2 enhancer is silent. Combinatorial control also governs formation of the other stripes expressing eve. The other eve enhancers contain different combinations and arrangements of the repressor- and activator-binding sites, such that each enhancer is active in only a narrow region of the embryo.
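The distance dependence of repression described above can be sketched as a simple rule: an activator site is neutralized if any bound repressor lies within about 100 bp of it. The site positions below are hypothetical, and the rule that at least one Bicoid and one Hunchback site must escape repression is a simplifying assumption; real enhancer output also depends on site number, affinity, and cooperativity.

```python
QUENCH_RANGE_BP = 100  # approximate short-range repression distance from the text

def unquenched(activator_sites: list[int], repressor_sites: list[int],
               quench_range: int = QUENCH_RANGE_BP) -> list[int]:
    """Activator sites with no bound repressor within quench_range bp."""
    return [a for a in activator_sites
            if all(abs(a - r) > quench_range for r in repressor_sites)]

def enhancer_active(bicoid_sites: list[int], hunchback_sites: list[int],
                    repressor_sites: list[int]) -> bool:
    # Assumption: at least one Bicoid and one Hunchback site must escape repression.
    return (len(unquenched(bicoid_sites, repressor_sites)) > 0
            and len(unquenched(hunchback_sites, repressor_sites)) > 0)

# Hypothetical site positions (bp from one end of a 500 bp enhancer).
bicoid_sites, hunchback_sites = [120, 300], [200]
print(enhancer_active(bicoid_sites, hunchback_sites, []))     # True: no repressor bound
print(enhancer_active(bicoid_sites, hunchback_sites, [250]))  # False: Hunchback site quenched
print(enhancer_active(bicoid_sites, hunchback_sites, [450]))  # True: repressor too far away
```

Moving the same repressor site from 250 bp to 450 bp restores activity, mirroring the observation that increasing the spacing between activator and repressor sites prevents repression.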

These examples of combinatorial control of transcription illustrate a central mechanism by which eukaryotic cells govern gene expression. Through the use of a relatively small number of regulatory proteins in each case, many different genes can be regulated either in concert or differentially, depending on the immediate needs of the cell. In this way, cells can respond quickly and appropriately to changing environmental conditions or to developmental requirements, within the context of a tissue or an entire organism.

SECTION 21.2 SUMMARY

  • Eukaryotic transcription activators such as Gal4p have DNA-binding and transcription-activation domains.

  • Eukaryotes make greater use of combinatorial control of gene expression than do bacteria. In combinatorial control, the same transcription factor is used in the regulation of more than one gene.

  • Combinatorial control can be achieved in a variety of ways. Some transcription factors are dimers whose subunits can pair in different combinations, and each homodimer or heterodimer can differ in DNA-binding affinity and strength as an activator. Alternatively, a gene may have several enhancers, each of which responds to a different combination of transcription factors.

  • Mating-type switching in yeast is a classic example of combinatorial control. Unique sets of genes are expressed specifically in the a and α haploid states, due to activation by specific regulatory factors. On mating to produce a diploid cell, the haploid-specific genes are repressed by the interactions of a1 and α2 repressor proteins, which are expressed together only in diploid cells.

  • Body plan organization in D. melanogaster uses gradients of mRNAs for different transcription factors in the developing embryo. Different concentrations of transcription activators and repressors control where the gene eve is activated, producing seven stripes that influence cell differentiation.