2.6 The Amino Acid Sequence of a Protein Determines Its Three-Dimensional Structure

How is the elaborate three-dimensional structure of proteins attained? The classic work of Christian Anfinsen in the 1950s on the enzyme ribonuclease revealed the relation between the amino acid sequence of a protein and its conformation. Ribonuclease is a single polypeptide chain consisting of 124 amino acid residues cross-linked by four disulfide bonds (Figure 2.51). Anfinsen’s plan was to destroy the three-dimensional structure of the enzyme and to then determine what conditions were required to restore the structure.

Figure 2.51: Amino acid sequence of bovine ribonuclease. The four disulfide bonds are shown in color.
[After C. H. W. Hirs, S. Moore, and W. H. Stein, J. Biol. Chem. 235:633–647, 1960.]

Agents such as urea or guanidinium chloride effectively disrupt a protein’s noncovalent bonds. Although the mechanism of action of these agents is not fully understood, computer simulations suggest that they replace water as the molecule solvating the protein and are then able to disrupt the van der Waals interactions stabilizing the protein structure. The disulfide bonds can be cleaved reversibly by reducing them with a reagent such as β-mercaptoethanol (Figure 2.52). In the presence of a large excess of β-mercaptoethanol, the disulfides (cystines) are fully converted into sulfhydryls (cysteines).

Figure 2.52: Role of β-mercaptoethanol in reducing disulfide bonds. Note that, as the disulfides are reduced, the β-mercaptoethanol is oxidized and forms dimers.

Most polypeptide chains devoid of cross-links assume a random-coil conformation in 8 M urea or 6 M guanidinium chloride. When ribonuclease was treated with β-mercaptoethanol in 8 M urea, the product was a fully reduced, randomly coiled polypeptide chain devoid of enzymatic activity. When a protein is converted into a randomly coiled peptide without its normal activity, it is said to be denatured (Figure 2.53).

Figure 2.53: Reduction and denaturation of ribonuclease.
Figure 2.54: Reestablishing correct disulfide pairing. Native ribonuclease can be re-formed from scrambled ribonuclease in the presence of a trace of β-mercaptoethanol.

Anfinsen then made the critical observation that the denatured ribonuclease, freed of urea and β-mercaptoethanol by dialysis (Section 3.1), slowly regained enzymatic activity. He perceived the significance of this chance finding: the sulfhydryl groups of the denatured enzyme became oxidized by air, and the enzyme spontaneously refolded into a catalytically active form. Detailed studies then showed that nearly all the original enzymatic activity was regained if the sulfhydryl groups were oxidized under suitable conditions. All the measured physical and chemical properties of the refolded enzyme were virtually identical with those of the native enzyme. These experiments showed that the information needed to specify the catalytically active structure of ribonuclease is contained in its amino acid sequence. Subsequent studies have established the generality of this central principle of biochemistry: sequence specifies conformation. The dependence of conformation on sequence is especially significant because of the intimate connection between conformation and function.

A quite different result was obtained when reduced ribonuclease was reoxidized while it was still in 8 M urea and the preparation was then dialyzed to remove the urea. Ribonuclease reoxidized in this way had only 1% of the enzymatic activity of the native protein. Why were the outcomes so different when reduced ribonuclease was reoxidized in the presence and absence of urea? The reason is that incorrect disulfide pairings formed in urea. There are 105 different ways of pairing eight cysteine residues to form four disulfides; only one of these combinations is enzymatically active. The 104 wrong pairings have been picturesquely termed “scrambled” ribonuclease. Anfinsen found that scrambled ribonuclease spontaneously converted into fully active, native ribonuclease when trace amounts of β-mercaptoethanol were added to an aqueous solution of the protein (Figure 2.54). The added β-mercaptoethanol catalyzed the rearrangement of disulfide pairings until the native structure was regained in about 10 hours. This process was driven by the decrease in free energy as the scrambled conformations were converted into the stable, native conformation of the enzyme. The native disulfide pairings of ribonuclease thus contribute to the stabilization of the thermodynamically preferred structure.
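The figure of 105 pairings follows from simple counting: the first free cysteine can pair with any of the other seven, the next free one with any of the remaining five, and so on, giving 7 × 5 × 3 × 1 = 105. The minimal Python sketch below checks this both by the formula and by brute-force enumeration. The function names are illustrative, and the cysteines are labeled 0 to 7 rather than by their actual sequence positions.

```python
from math import prod


def count_disulfide_pairings(n_cysteines):
    """Ways to pair n cysteines into n/2 disulfides: (n - 1)!! = (n-1)(n-3)...1."""
    if n_cysteines % 2:
        raise ValueError("need an even number of cysteines")
    return prod(range(n_cysteines - 1, 0, -2))


def enumerate_pairings(cysteines):
    """Yield every complete pairing as a list of (i, j) disulfides."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for k, partner in enumerate(rest):
        # Pair the first free cysteine with this partner, then pair the rest.
        remaining = rest[:k] + rest[k + 1:]
        for others in enumerate_pairings(remaining):
            yield [(first, partner)] + others


total = count_disulfide_pairings(8)
brute_force = sum(1 for _ in enumerate_pairings(list(range(8))))
print(total, brute_force, total - 1)  # total - 1 = number of "scrambled" pairings
```

Only one of these pairings corresponds to the native enzyme, so a random choice of pairings would be expected to yield active ribonuclease less than 1% of the time.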

Similar refolding experiments have been performed on many other proteins. In many cases, the native structure can be generated under suitable conditions. For other proteins, however, refolding does not proceed efficiently. In these cases, the unfolded protein molecules usually become tangled up with one another to form aggregates. Inside cells, proteins called chaperones block such undesirable interactions. Additionally, it is now evident that some proteins do not assume a defined structure until they interact with molecular partners, as we will see shortly.

Amino acids have different propensities for forming α helices, β sheets, and turns

How does the amino acid sequence of a protein specify its three-dimensional structure? How does an unfolded polypeptide chain acquire the form of the native protein? These fundamental questions in biochemistry can be approached by first asking a simpler one: What determines whether a particular sequence in a protein forms an α helix, a β strand, or a turn? One source of insight is to examine the frequency of occurrence of particular amino acid residues in these secondary structures (Table 2.3). Residues such as alanine, glutamate, and leucine tend to be present in α helices, whereas valine and isoleucine tend to be present in β strands. Glycine, asparagine, and proline are more commonly observed in turns.

Table 2.3: Relative frequencies of amino acid residues in secondary structures

Amino acid   α helix   β sheet   Reverse turn
Glu          1.59      0.52      1.01
Ala          1.41      0.72      0.82
Leu          1.34      1.22      0.57
Met          1.30      1.14      0.52
Gln          1.27      0.98      0.84
Lys          1.23      0.69      1.07
Arg          1.21      0.84      0.90
His          1.05      0.80      0.81

Val          0.90      1.87      0.41
Ile          1.09      1.67      0.47
Tyr          0.74      1.45      0.76
Cys          0.66      1.40      0.54
Trp          1.02      1.35      0.65
Phe          1.16      1.33      0.59
Thr          0.76      1.17      0.96

Gly          0.43      0.58      1.77
Asn          0.76      0.48      1.34
Pro          0.34      0.31      1.32
Ser          0.57      0.96      1.22
Asp          0.99      0.39      1.24

Note: The amino acids are grouped according to their preference for α helices (top group), β sheets (middle group), or turns (bottom group).

Source: T. E. Creighton, Proteins: Structures and Molecular Properties, 2d ed. (W. H. Freeman and Company, 1992), p. 256.

Figure 2.55: Alternative conformations of a peptide sequence. Many sequences can adopt alternative conformations in different proteins. Here the sequence VDLLKN shown in red assumes an α helix in one protein context (left) and a β strand in another (right).
[Drawn from (left) WRP.pdb and (right) 2HLA.pdb.]

Studies of proteins and synthetic peptides have revealed some reasons for these preferences. Branching at the β-carbon atom, as in valine, threonine, and isoleucine, tends to destabilize α helices because of steric clashes. These residues are readily accommodated in β strands, where their side chains project out of the plane containing the main chain. Serine and asparagine tend to disrupt α helices because their side chains contain hydrogen-bond donors or acceptors in close proximity to the main chain, where they compete for main-chain NH and CO groups. Proline tends to disrupt both α helices and β strands because it lacks an NH group and because its ring structure restricts its ϕ value to near −60 degrees. Glycine readily fits into all structures, but its conformational flexibility renders it well suited to reverse turns.

Can we predict the secondary structure of a protein by using this knowledge of the conformational preferences of amino acid residues? Accurately predicting the secondary structure adopted by even a short stretch of residues has proved to be difficult. What stands in the way of more-accurate prediction? Note that the conformational preferences of amino acid residues are not tipped all the way to one structure (Table 2.3). For example, glutamate, one of the strongest helix formers, prefers α helix to β strand by only a factor of three. The preference ratios of most other residues are smaller. Indeed, some penta- and hexapeptide sequences have been found to adopt one structure in one protein and an entirely different structure in another (Figure 2.55). Hence, some amino acid sequences do not uniquely determine secondary structure. Tertiary interactions (interactions between residues that are far apart in the sequence) may be decisive in specifying the secondary structure of some segments. Context is often crucial: the conformation of a protein has evolved to work in a particular environment. Nevertheless, substantial improvements in secondary structure prediction have been achieved by using families of related sequences, each of which adopts the same structure.
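The factor-of-three figure for glutamate, and the weak discrimination offered by a short segment, can be checked directly against Table 2.3. The Python sketch below stores the table's values under one-letter residue codes, computes each residue's helix-to-sheet ratio, and averages the three propensities over the VDLLKN segment of Figure 2.55. Averaging over a window is in the spirit of early propensity-based predictors such as the Chou–Fasman method, but here it is an illustrative assumption, not a method described in this section.

```python
# Relative frequencies from Table 2.3: (alpha helix, beta sheet, reverse turn)
PROPENSITY = {
    "E": (1.59, 0.52, 1.01), "A": (1.41, 0.72, 0.82), "L": (1.34, 1.22, 0.57),
    "M": (1.30, 1.14, 0.52), "Q": (1.27, 0.98, 0.84), "K": (1.23, 0.69, 1.07),
    "R": (1.21, 0.84, 0.90), "H": (1.05, 0.80, 0.81), "V": (0.90, 1.87, 0.41),
    "I": (1.09, 1.67, 0.47), "Y": (0.74, 1.45, 0.76), "C": (0.66, 1.40, 0.54),
    "W": (1.02, 1.35, 0.65), "F": (1.16, 1.33, 0.59), "T": (0.76, 1.17, 0.96),
    "G": (0.43, 0.58, 1.77), "N": (0.76, 0.48, 1.34), "P": (0.34, 0.31, 1.32),
    "S": (0.57, 0.96, 1.22), "D": (0.99, 0.39, 1.24),
}


def helix_over_sheet(residue):
    """How strongly a residue prefers alpha helix over beta sheet."""
    helix, sheet, _turn = PROPENSITY[residue]
    return helix / sheet


def average_propensities(segment):
    """Mean (helix, sheet, turn) propensity over a stretch of residues."""
    n = len(segment)
    return tuple(sum(PROPENSITY[r][k] for r in segment) / n for k in range(3))


best = max(PROPENSITY, key=helix_over_sheet)
print(best, round(helix_over_sheet(best), 2))
print([round(x, 2) for x in average_propensities("VDLLKN")])
```

Glutamate comes out on top with a ratio of about 3.06. For VDLLKN the three averages all land close to 1 (about 1.09, 0.98, and 0.87), so propensities alone give little basis for choosing between helix and strand, consistent with this segment adopting either conformation depending on its context.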

Protein folding is a highly cooperative process

Figure 2.56: Transition from folded to unfolded state. Most proteins show a sharp transition from the folded to the unfolded form on treatment with increasing concentrations of denaturants.

Proteins can be denatured by any treatment that disrupts the weak bonds stabilizing tertiary structure, such as heating or exposure to chemical denaturants like urea or guanidinium chloride. For many proteins, a comparison of the degree of unfolding as the concentration of denaturant increases reveals a sharp transition from the folded, or native, form to the unfolded, or denatured, form, suggesting that only these two conformational states are present to any significant extent (Figure 2.56). A similar sharp transition is observed if denaturants are removed from unfolded proteins, allowing the proteins to fold.
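The sharp transition of Figure 2.56 is what a simple two-state equilibrium, F ⇌ U, predicts. The Python sketch below assumes that the unfolding free energy decreases linearly with denaturant concentration (the linear extrapolation model). The starting value of 42 kJ mol⁻¹ is the typical figure quoted later in this section, while the slope (m value) of 10 kJ mol⁻¹ M⁻¹ is a hypothetical choice for illustration.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1
T = 298.0     # room temperature, K


def fraction_unfolded(denaturant_molar, dg_water=42.0, m_value=10.0):
    """Two-state F <-> U model with a linear denaturant dependence (assumed).

    dg_water: unfolding free energy with no denaturant, kJ/mol.
    m_value: hypothetical slope of that free energy, kJ mol^-1 M^-1.
    """
    dg_unfold = dg_water - m_value * denaturant_molar  # G(U) - G(F), kJ/mol
    k_unfold = math.exp(-dg_unfold / (R * T))          # K = [U]/[F]
    return k_unfold / (1.0 + k_unfold)


for conc in (0.0, 2.0, 3.2, 4.2, 5.2, 6.0, 8.0):
    print(f"{conc:4.1f} M  fraction unfolded = {fraction_unfolded(conc):.3f}")
```

With these assumed parameters the midpoint falls at 42/10 = 4.2 M, where half the molecules are unfolded, and the fraction unfolded climbs from below 2% at 3.2 M to above 98% at 5.2 M.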

The sharp transition seen in Figure 2.56 suggests that protein folding and unfolding are “all or none” processes that result from a cooperative transition. For example, suppose that a protein is placed in conditions under which some part of the protein structure is thermodynamically unstable. As this part of the folded structure is disrupted, the interactions between it and the remainder of the protein will be lost. The loss of these interactions, in turn, will destabilize the remainder of the structure. Thus, conditions that lead to the disruption of any part of a protein structure are likely to unravel the protein completely. The structural properties of proteins provide a clear rationale for the cooperative transition.

The consequences of cooperative folding can be illustrated by considering the contents of a protein solution under conditions corresponding to the middle of the transition between the folded and the unfolded forms. Under these conditions, the protein is “half folded.” Yet the solution will appear to contain no partly folded molecules; instead, it will look like a 50/50 mixture of fully folded and fully unfolded molecules (Figure 2.57). Although the protein may appear to behave as if it exists in only two states, this simple two-state existence is an impossibility at a molecular level. Even simple reactions go through reaction intermediates, and so a complex molecule such as a protein cannot simply switch from a completely unfolded state to the native state in one step. Unstable, transient intermediate structures must exist between the native and denatured state. Determining the nature of these intermediate structures is an area of intense biochemical research.

Figure 2.57: Components of a partly denatured protein solution. In a half-unfolded protein solution, half the molecules are fully folded and half are fully unfolded.

Proteins fold by progressive stabilization of intermediates rather than by random search

Figure 2.58: Typing-monkey analogy. A monkey randomly poking a typewriter could write a line from Shakespeare’s Hamlet, provided that correct keystrokes were retained. In the two computer simulations shown, the cumulative number of keystrokes is given at the left of each line.

How does a protein make the transition from an unfolded structure to a unique conformation in the native form? One possibility a priori would be that all possible conformations are sampled to find the energetically most favorable one. How long would such a random search take? Consider a small protein with 100 residues. Cyrus Levinthal calculated that, if each residue can assume three different conformations, the total number of structures would be 3¹⁰⁰, which is equal to 5 × 10⁴⁷. If it takes 10⁻¹³ s to convert one structure into another, the total search time would be 5 × 10⁴⁷ × 10⁻¹³ s, which is equal to 5 × 10³⁴ s, or 1.6 × 10²⁷ years. In reality, small proteins can fold in less than a second. Clearly, it would take much too long for even a small protein to fold properly by randomly trying out all possible conformations. The enormous difference between calculated and actual folding times is called Levinthal’s paradox. This paradox clearly reveals that proteins do not fold by trying every possible conformation; instead, they must follow at least a partly defined folding pathway consisting of intermediates between the fully denatured protein and its native structure.
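Levinthal’s arithmetic is easy to reproduce. The short Python sketch below uses the same assumptions as the text (100 residues, three conformations per residue, 10⁻¹³ s per conformational change) and Python’s exact integers for 3¹⁰⁰; the 365.25-day year used for the final conversion is our own convention.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.16e7 s


def levinthal_search_time(n_residues=100, states_per_residue=3, step_time_s=1e-13):
    """Time to visit every conformation once, one conformation per step."""
    n_conformations = states_per_residue ** n_residues  # exact integer, 3**100
    seconds = n_conformations * step_time_s
    return n_conformations, seconds, seconds / SECONDS_PER_YEAR


n, seconds, years = levinthal_search_time()
print(f"{n:.1e} conformations, {seconds:.1e} s, {years:.1e} years")
```

Changing any one assumption (for example, two states per residue instead of three) still leaves a search time many orders of magnitude longer than observed folding times, which is the heart of the paradox.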

The way out of this paradox is to recognize the power of cumulative selection. Richard Dawkins, in The Blind Watchmaker, asked how long it would take a monkey poking randomly at a typewriter to reproduce Hamlet’s remark to Polonius, “Methinks it is like a weasel” (Figure 2.58). An astronomically large number of keystrokes, on the order of 10⁴⁰, would be required. However, suppose that we preserved each correct character and allowed the monkey to retype only the wrong ones. In this case, only a few thousand keystrokes, on average, would be needed. The crucial difference between these cases is that the first employs a completely random search, whereas, in the second, partly correct intermediates are retained.
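The difference between random and cumulative search can be simulated in a few lines. This Python sketch assumes a 27-key typewriter (26 capital letters plus a space bar) and a simplified procedure in which, on each round, the monkey retypes only the characters that are still wrong. It illustrates cumulative selection; it is not a reconstruction of the program behind Figure 2.58.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase + " "  # assumed 27-key typewriter
TARGET = "METHINKS IT IS LIKE A WEASEL"  # 28 characters


def cumulative_selection(target=TARGET, alphabet=ALPHABET, seed=1):
    """Retype only the wrong characters each round; keep every correct one."""
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in target]  # first full line
    keystrokes = len(target)
    while True:
        wrong = [i for i, (c, t) in enumerate(zip(current, target)) if c != t]
        if not wrong:
            return "".join(current), keystrokes
        for i in wrong:
            current[i] = rng.choice(alphabet)  # one keystroke per wrong position
            keystrokes += 1


phrase, keystrokes = cumulative_selection()
random_search_log10 = len(TARGET) * math.log10(len(ALPHABET))  # log10(27**28)
print(phrase, keystrokes)
print(f"random search space: about 10^{random_search_log10:.1f} strings")
```

Under these assumptions each position needs on average 27 attempts, so the expected total is 28 × 27 = 756 keystrokes, whereas the number of distinct 28-character strings, 27²⁸ ≈ 1.2 × 10⁴⁰, matches the order of magnitude quoted for the purely random search.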

Figure 2.60: Folding funnel. The folding funnel depicts the thermodynamics of protein folding. The top of the funnel represents all possible denatured conformations—that is, maximal conformational entropy. Depressions on the sides of the funnel represent semistable intermediates that can facilitate or hinder the formation of the native structure, depending on their depth. Secondary structures, such as helices, form and collapse onto one another to initiate folding.
[After D. L. Nelson and M. M. Cox, Lehninger Principles of Biochemistry, 5th ed. (W. H. Freeman and Company, 2008), p. 143.]

The essence of protein folding is the tendency to retain partly correct intermediates. However, the protein-folding problem is much more difficult than the one presented to our simian Shakespeare. First, the criterion of correctness is not a residue-by-residue scrutiny of conformation by an omniscient observer but rather the total free energy of the transient species. Second, proteins are only marginally stable. The free-energy difference between the folded and the unfolded states of a typical 100-residue protein is 42 kJ mol⁻¹ (10 kcal mol⁻¹), and thus each residue contributes on average only 0.42 kJ mol⁻¹ (0.1 kcal mol⁻¹) of energy to maintain the folded state. This amount is less than the amount of thermal energy, which is 2.5 kJ mol⁻¹ (0.6 kcal mol⁻¹) at room temperature. This meager stabilization energy means that correct intermediates, especially those formed early in folding, can be lost. In terms of the analogy, the monkey would be somewhat free to undo its correct keystrokes. Nonetheless, the interactions that lead to cooperative folding can stabilize intermediates as structure builds up. Thus, local regions that have significant structural preference, though not necessarily stable on their own, will tend to adopt their favored structures and, as they form, can interact with one another, leading to increasing stabilization. This conceptual framework is often referred to as the nucleation-condensation model.
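The per-residue estimate is a short calculation. This sketch redoes it, taking room temperature as 298 K and R = 8.314 J mol⁻¹ K⁻¹, and using RT as the measure of thermal energy, which is what the 2.5 kJ mol⁻¹ figure in the text corresponds to.

```python
R = 8.314e-3       # gas constant, kJ mol^-1 K^-1
T = 298.0          # room temperature, K
KJ_PER_KCAL = 4.184

dg_fold = 42.0     # kJ/mol, folded vs. unfolded, typical 100-residue protein
n_residues = 100

per_residue = dg_fold / n_residues  # average stabilization per residue, kJ/mol
thermal = R * T                     # RT, kJ/mol

print(f"per residue: {per_residue:.2f} kJ/mol ({per_residue / KJ_PER_KCAL:.2f} kcal/mol)")
print(f"RT at {T:.0f} K: {thermal:.2f} kJ/mol ({thermal / KJ_PER_KCAL:.2f} kcal/mol)")
print(f"RT is about {thermal / per_residue:.0f} times the per-residue contribution")
```

Thermal energy exceeds the average per-residue contribution roughly sixfold, which is why individual correct local structures can form and dissolve repeatedly before enough of them accumulate to stabilize one another.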

A simulation of the folding of a protein, based on the nucleation-condensation model, is shown in Figure 2.59. This model suggests that certain pathways may be preferred. Although Figure 2.59 suggests a discrete pathway, each of the intermediates shown represents an ensemble of similar structures, and thus a protein follows a general rather than a precise pathway in its transition from the unfolded to the native state. The energy surface for the overall process of protein folding can be visualized as a funnel (Figure 2.60). The wide rim of the funnel represents the wide range of structures accessible to the ensemble of denatured protein molecules. As the free energy of the population of protein molecules decreases, the proteins move down into narrower parts of the funnel and fewer conformations are accessible. At the bottom of the funnel is the folded state with its well-defined conformation. Many paths can lead to this same energy minimum.

Figure 2.59: Proposed folding pathway of chymotrypsin inhibitor. Local regions with sufficient structural preference tend to adopt their favored structures initially (1). These structures come together to form a nucleus with a nativelike, but still mobile, structure (4). This structure then fully condenses to form the native, more rigid structure (5).
[From A. R. Fersht and V. Daggett. Cell 108:573–582, 2002; with permission from Elsevier.]

Prediction of three-dimensional structure from sequence remains a great challenge

The prediction of three-dimensional structure from sequence has proved to be extremely difficult. The local sequence appears to determine only between 60 and 70% of the secondary structure; long-range interactions are required to stabilize the full secondary structure and the tertiary structure.

Investigators are exploring two fundamentally different approaches to predicting three-dimensional structure from amino acid sequence. The first is ab initio (Latin, “from the beginning”) prediction, which attempts to predict the folding of an amino acid sequence without prior knowledge about similar sequences in known protein structures. Computer-based calculations are employed that attempt to minimize the free energy of a structure with a given amino acid sequence or to simulate the folding process. The utility of these methods is limited by the vast number of possible conformations, the marginal stability of proteins, and the subtle energetics of weak interactions in aqueous solution. The second approach takes advantage of our growing knowledge of the three-dimensional structures of many proteins. In these knowledge-based methods, an amino acid sequence of unknown structure is examined for compatibility with known protein structures or fragments therefrom. If a significant match is detected, the known structure can be used as an initial model. Knowledge-based methods have been a source of many insights into the three-dimensional conformation of proteins of known sequence but unknown structure.
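The knowledge-based approach can be caricatured as scoring a query sequence against a library of sequences whose structures are known and keeping the best match. The Python sketch below is only that caricature: it assumes the sequences are already aligned to equal length, scores them by percent identity, and applies a 30% cutoff. The template names, sequences, and cutoff are all hypothetical, and practical methods rely on proper alignment algorithms and more sensitive scoring.

```python
def percent_identity(a, b):
    """Identity over two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pick_template(query, templates, threshold=30.0):
    """Return (name, identity) of the best template at or above threshold, else None."""
    scored = ((name, percent_identity(query, seq)) for name, seq in templates.items())
    name, identity = max(scored, key=lambda item: item[1])
    return (name, identity) if identity >= threshold else None


# Hypothetical library of sequences with "known" structures.
templates = {
    "template_A": "MKTAYIAKQR",
    "template_B": "MESAYLGKQR",
    "template_C": "GGGGGGGGGG",
}
query = "MKTAYLAKQK"
print(pick_template(query, templates))
```

If a template passes the cutoff, its structure would serve as the initial model; if none does, the sketch returns None, mirroring the case where ab initio methods are the only option.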

Some proteins are inherently unstructured and can exist in multiple conformations

The discussion of protein folding thus far is based on the paradigm that a given protein amino acid sequence will fold into a particular three-dimensional structure. This paradigm holds well for many proteins. However, it has been known for some time that some proteins can adopt two different structures, only one of which results in protein aggregation and pathological conditions. Such alternate structures originating from a unique amino acid sequence were thought to be rare, the exception to the paradigm. Recent work has called into question the universality of this one-sequence, one-structure idea: for certain proteins, a single amino acid sequence can give rise to more than one structure even under normal cellular conditions.

Our first example is a class of proteins referred to as intrinsically unstructured proteins (IUPs). As the name suggests, these proteins, completely or in part, do not have a discrete three-dimensional structure under physiological conditions. Indeed, an estimated 50% of eukaryotic proteins have at least one unstructured region longer than 30 amino acids. Unstructured regions are rich in charged and polar amino acids with few hydrophobic residues. These proteins assume a defined structure on interaction with other proteins. This molecular versatility means that one protein can assume different structures and interact with different partners, yielding different biochemical functions. IUPs appear to be especially important in signaling and regulatory pathways.

Another class of proteins that do not adhere to the paradigm is metamorphic proteins. These proteins appear to exist in an ensemble of structures of approximately equal energy that are in equilibrium. Small molecules or other proteins may bind to different members of the ensemble, resulting in various complexes, each having a different biochemical function. An especially clear example of a metamorphic protein is the chemokine lymphotactin. Chemokines are small signaling proteins in the immune system that bind to receptor proteins on the surface of immune-system cells, instigating an immunological response. Lymphotactin exists in two very different structures that are in equilibrium (Figure 2.61). One structure is characteristic of chemokines, consisting of a three-stranded β sheet and a carboxyl-terminal helix. This structure binds to its receptor and activates it. The alternative structure is a dimer of identical subunits built entirely of β sheet. When in this structure, lymphotactin binds to glycosaminoglycan, a complex carbohydrate (Chapter 11). The biochemical activities of each structure are mutually exclusive: the chemokine structure cannot bind the glycosaminoglycan, and the β-sheet structure cannot activate the receptor. Yet, remarkably, both activities are required for full biological activity of the chemokine.

Figure 2.61: Lymphotactin exists in two conformations, which are in equilibrium.
[R. L. Tuinstra, F. C. Peterson, S. Kutlesa, E. S. Elgin, M. A. Kron, and B. F. Volkman. Proc. Natl. Acad. Sci. U.S.A. 105:5057–5062, 2008, Fig. 2A.]

Note that IUPs and metamorphic proteins effectively expand the protein-encoding capacity of the genome. In some cases, a gene can encode a single protein that has more than one structure and function. These examples also illustrate the dynamic nature of the study of biochemistry and its inherent excitement: even well-established ideas are often subject to modifications.

Protein misfolding and aggregation are associated with some neurological diseases

Understanding protein folding and misfolding is of more than academic interest. A host of diseases, including Alzheimer disease, Parkinson disease, Huntington disease, and transmissible spongiform encephalopathies (prion disease), are associated with improperly folded proteins. All of these diseases result in the deposition of protein aggregates, called amyloid fibrils or plaques. These diseases are consequently referred to as amyloidoses. A common feature of amyloidoses is that normally soluble proteins are converted into insoluble fibrils rich in β sheets. The correctly folded protein is only marginally more stable than the incorrect form. However, the incorrect form aggregates, drawing more correctly folded molecules into the incorrect conformation. We will focus on the transmissible spongiform encephalopathies.

One of the great surprises in modern medicine was that certain infectious neurological diseases were found to be transmitted by agents that were similar in size to viruses but consisted only of protein. These diseases include bovine spongiform encephalopathy (commonly referred to as mad cow disease) and the analogous diseases in other organisms, including Creutzfeldt–Jakob disease (CJD) in human beings, scrapie in sheep, and chronic wasting disease in deer and elk. The agents causing these diseases are termed prions. Prions are composed largely, if not exclusively, of a cellular protein called PrP, which is normally present in the brain; its function is still the focus of active research. The infectious prions are aggregated forms of the PrP protein termed PrPSC.

How does the structure of the protein in the aggregated form differ from that of the protein in its normal state in the brain? The normal cellular protein PrP contains extensive regions of α helix and relatively little β strand. The structure of the form of PrP present in infected brains, termed PrPSC, has not yet been determined because of challenges posed by its insoluble and heterogeneous nature. However, a variety of evidence indicates that some parts of the protein that had been in α-helical or turn conformations have been converted into β-strand conformations (Figure 2.62). The β strands of largely planar monomers stack on one another with their side chains tightly interwoven. A side view shows the extensive network of hydrogen bonds between the monomers. These fibrous protein aggregates are often referred to as amyloid forms.

Figure 2.62: A model of the human prion protein amyloid. A detailed model of a human prion amyloid fibril deduced from spin labeling and electron paramagnetic resonance (EPR) spectroscopy studies shows that protein aggregation is due to the formation of large parallel β sheets. The black arrow indicates the long axis of the fibril.
[N. J. Cobb, F. D. Sönnichsen, H. Mchaourab, and W. K. Surewicz. Proc. Natl. Acad. Sci. U.S.A. 104: 18946–18951, 2007, Fig. 4E.]

With the realization that the infectious agent in prion diseases is an aggregated form of a protein that is already present in the brain, a model for disease transmission emerges (Figure 2.63). Protein aggregates built of abnormal forms of PrPSC act as sites of nucleation to which other PrP molecules attach. Prion diseases can thus be transferred from one individual organism to another through the transfer of an aggregated nucleus, as likely happened in the mad cow disease outbreak in the United Kingdom that emerged in the late 1980s. Cattle fed on animal feed containing material from diseased cows developed the disease in turn.
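The protein-only model of Figure 2.63 can be caricatured with a toy calculation. In the Python sketch below, which is purely illustrative and not a published kinetic model, each round the aggregate converts normal PrP in proportion to its own size, and spontaneous nucleation is assumed to be zero. The pool size, rate, and number of rounds are hypothetical.

```python
def seeded_conversion(normal=1000.0, seed=0.0, rate=0.5, steps=20):
    """Toy protein-only model: the aggregate recruits normal PrP each round.

    normal: starting pool of normal PrP (arbitrary units, hypothetical).
    seed: size of the aggregated nucleus introduced at the start.
    rate: hypothetical fraction of the aggregate's size converted per round.
    Spontaneous nucleation is assumed to be zero.
    """
    aggregate = seed
    history = [aggregate]
    for _ in range(steps):
        # Conversion scales with aggregate size but cannot exceed the pool.
        converted = min(normal, rate * aggregate)
        normal -= converted
        aggregate += converted
        history.append(aggregate)
    return normal, aggregate, history


print(seeded_conversion(seed=0.0)[:2])  # no nucleus: nothing is converted
print(seeded_conversion(seed=1.0)[:2])  # one seeded unit: the pool is consumed
```

Without a seed nothing is converted; with a single seeded unit the aggregate grows until the normal pool is exhausted, which captures, in a very crude way, why introducing an aggregated nucleus can transmit the disease.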

Figure 2.63: The protein-only model for prion-disease transmission. A nucleus consisting of proteins in an abnormal conformation grows by the addition of proteins from the normal pool.

Amyloid fibers are also seen in the brains of patients with certain noninfectious neurodegenerative disorders such as Alzheimer and Parkinson diseases. For example, the brains of patients with Alzheimer disease contain protein aggregates called amyloid plaques that consist primarily of a single polypeptide termed Aβ. This polypeptide is derived from a cellular protein called amyloid precursor protein (APP) through the action of specific proteases. Polypeptide Aβ is prone to form insoluble aggregates. Despite the difficulties posed by the protein’s insolubility, a detailed structural model for Aβ has been derived through the use of NMR techniques that can be applied to solids rather than to materials in solution. As expected, the structure is rich in β strands, which come together to form extended parallel β-sheet structures (Figure 2.62).

How do such aggregates lead to the death of the cells that harbor them? The answer is still controversial. One hypothesis is that the large aggregates themselves are not toxic but, instead, smaller aggregates of the same proteins may be the culprits, perhaps damaging cell membranes.

Protein modification and cleavage confer new capabilities

Proteins are able to perform numerous functions that rely solely on the versatility of their 20 amino acids. In addition, many proteins are covalently modified, through the attachment of groups other than amino acids, to augment their functions (Figure 2.64). For example, acetyl groups are attached to the amino termini of many proteins, a modification that makes these proteins more resistant to degradation. As discussed earlier, the addition of hydroxyl groups to many proline residues stabilizes fibers of newly synthesized collagen. The biological significance of this modification is evident in the disease scurvy: a deficiency of vitamin C results in insufficient hydroxylation of collagen, and the abnormal collagen fibers that result are unable to maintain normal tissue strength (Section 27.6). Another specialized amino acid is γ-carboxyglutamate. In vitamin K deficiency, insufficient carboxylation of glutamate in prothrombin, a clotting protein, can lead to hemorrhage (Section 10.4). Many proteins, especially those that are present on the surfaces of cells or are secreted, acquire carbohydrate units on specific asparagine, serine, or threonine residues (Chapter 11). The addition of sugars makes the proteins more hydrophilic and able to participate in interactions with other proteins. Conversely, the addition of a fatty acid to an α-amino group or a cysteine sulfhydryl group produces a more hydrophobic protein.

Figure 2.64: Finishing touches. Some common and important covalent modifications of amino acid side chains are shown.

Many hormones, such as epinephrine (adrenaline), alter the activities of enzymes by stimulating the phosphorylation of the hydroxyl amino acids serine and threonine; phosphoserine and phosphothreonine are the most ubiquitous modified amino acids in proteins. Growth factors such as insulin act by triggering the phosphorylation of the hydroxyl group of tyrosine residues to form phosphotyrosine. The phosphoryl groups on these three modified amino acids are readily removed; thus the modified amino acids are able to act as reversible switches in regulating cellular processes. The roles of phosphorylation in signal transduction will be discussed extensively in Chapter 14.

The preceding modifications consist of the addition of special groups to amino acids. Other special groups are generated by chemical rearrangements of side chains and, sometimes, the peptide backbone. For example, the jellyfish Aequorea victoria produces green fluorescent protein (GFP), which emits green light when stimulated with blue light. The source of the fluorescence is a group formed by the spontaneous rearrangement and oxidation of the sequence Ser-Tyr-Gly within the center of the protein (Figure 2.65A). Since the discovery of GFP, a number of mutants have been engineered which absorb and emit light across the entire visible spectrum (Figure 2.65B). These proteins are of great utility to researchers as markers within cells (Figure 2.65C).

Figure 2.65: Chemical rearrangement in GFP. (A) The structure of green fluorescent protein (GFP). The rearrangement and oxidation of the sequence Ser-Tyr-Gly is the source of fluorescence. (B) Mutants of GFP emit light across the visible spectrum. (C) A melanoma cell line engineered to express one of these GFP mutants, red fluorescent protein (RFP), was then injected into a mouse whose blood vessels express GFP. In this fluorescence micrograph, the formation of new blood vessels (green) in the tumor (red) is readily apparent.
[(A) Drawn from 1GFL.pdb; (B) R.Y. Tsien. Integr. Biol. 2:77–93, 2010, Fig. 12; (C) M. Yang, et al. Proc. Natl. Acad. Sci. U.S.A. 100:14259–14262, 2003, Fig. 2B]

Finally, many proteins are cleaved and trimmed after synthesis. For example, digestive enzymes are synthesized as inactive precursors that can be stored safely in the pancreas. After release into the intestine, these precursors become activated by peptide-bond cleavage (Section 10.4). In blood clotting, peptide-bond cleavage converts soluble fibrinogen into insoluble fibrin. A number of polypeptide hormones, such as adrenocorticotropic hormone, arise from the splitting of a single large precursor protein. Likewise, many viral proteins are produced by the cleavage of large polyprotein precursors. We shall encounter many more examples of modification and cleavage as essential features of protein formation and function. Indeed, these finishing touches account for much of the versatility, precision, and elegance of protein action and regulation.