9.5 The Proteome

Chapter 8 began with a discussion of the number of genes in the human genome and how that number (about 21,000) was much lower than the actual number of proteins in a human cell (more than 100,000). Now that you are familiar with how information encoded in DNA is transcribed into RNA and how RNA is translated into protein, it is a good time to revisit this matter and look more closely at the sources of protein diversification. First, let’s review a few old terms and add a new one that will be useful in this discussion. You already know that the genome is the entire set of genetic material in an organism. You will learn in Chapter 14 that the transcriptome is the complete set of coding and noncoding transcripts in an organism, organ, tissue, or cell. Another term is the proteome, which was briefly introduced in Chapter 8 but is defined here as the complete set of proteins in an organism, organ, tissue, or cell. In the remainder of this chapter, you will see how the proteome is enriched by two cellular processes: the alternative splicing of pre-mRNA and the posttranslational modification of proteins.

Alternative splicing generates protein isoforms

As you recall from Chapter 8, alternative splicing of pre-mRNA allows one gene to encode more than one protein. Proteins are made up of functional domains that are often encoded by different exons. Thus, the alternative splicing of a pre-mRNA can lead to the synthesis of multiple proteins (called isoforms) with different combinations of functional domains. This concept is illustrated by FGFR2, a human gene that encodes the receptor that binds fibroblast growth factors and then transduces a signal inside the cell (Figure 9-19). The FGFR2 protein is made up of several domains, including an extracellular ligand-binding domain. Alternative splicing results in two isoforms that differ in their extracellular domains. Because of this difference, each isoform binds different growth factors. For many genes that are alternatively spliced, different isoforms are made in different tissues.
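The logic of mutually exclusive exon choice can be sketched in a few lines of Python. This is a toy model only: the exon names (E1, E2a, E2b, E3) and their short RNA sequences are invented for illustration and are not the real FGFR2 exons.

```python
# Toy model of alternative splicing. Exon names and sequences are
# hypothetical; they do not correspond to real FGFR2 exons.
EXONS = {
    "E1": "AUGGCU",   # shared 5' exon (begins with the AUG start codon)
    "E2a": "GGAUUC",  # alternative exon A
    "E2b": "CCAGAA",  # alternative exon B
    "E3": "UGGUAA",   # shared 3' exon (ends with the UAA stop codon)
}

def splice(exon_order):
    """Join the named exons, in order, into one mature mRNA string."""
    return "".join(EXONS[name] for name in exon_order)

def mutually_exclusive_isoforms(shared_5, alternatives, shared_3):
    """Build one mRNA per alternative exon, keeping the shared exons."""
    return {alt: splice(shared_5 + [alt] + shared_3) for alt in alternatives}

isoforms = mutually_exclusive_isoforms(["E1"], ["E2a", "E2b"], ["E3"])
for name, mrna in isoforms.items():
    print(name, mrna)
```

Each mRNA shares the same first and last exons but carries a different middle exon, which is how one gene can yield isoforms that differ in a single domain.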

Figure 9-19: Alternative splicing produces related but distinct protein isoforms
Figure 9-19: Messenger RNAs produced by alternative splicing of the pre-mRNA of the human FGFR2 gene encode two protein isoforms that bind to different ligands (the growth factors).


Posttranslational events

When released from the ribosome, most newly synthesized proteins are unable to function. As you will see in this section and in subsequent chapters of this book, DNA sequence is only part of the story of how organisms function. In this case, all newly synthesized proteins need to fold up correctly and the amino acids of some proteins need to be chemically modified. Because some protein folding and modification take place after protein synthesis, they are called posttranslational events.

Protein folding inside the cell The most important posttranslational event is the folding of the nascent (newly synthesized) protein into its correct three-dimensional shape. A protein that is folded correctly is said to be in its native conformation (in contrast with an unfolded or misfolded protein that is nonnative). As we saw at the beginning of this chapter, proteins exist in a remarkable diversity of structures. The distinct structures of proteins are essential for their enzymatic activity, for their ability to bind to DNA, or for their structural roles in the cell. Although it has been known since the 1950s that the amino acid sequence of a protein determines its three-dimensional structure, it is also known that the aqueous environment inside the cell does not favor the correct folding of most proteins. Given that proteins do in fact fold correctly in the cell, a long-standing question has been, How is this correct folding accomplished?


The answer seems to be that nascent proteins are folded correctly with the help of chaperones, a class of proteins found in all organisms from bacteria to plants to humans. One family of chaperones, called the GroE chaperonins, forms large multisubunit complexes called chaperonin folding machines. Although the precise mechanism is not yet understood, newly synthesized, unfolded proteins are believed to enter a chamber in the folding machine that provides an electrically neutral microenvironment within which the nascent protein can successfully fold into its native conformation.

Posttranslational modification of amino acid side chains As already stated, proteins are polymers of amino acids made from any of the 20 different types. However, biochemical analysis of many proteins reveals that a variety of molecules can be covalently attached to amino acid side chains. More than 300 modifications of amino acid side chains are possible after translation. Two of the more commonly encountered posttranslational modifications—phosphorylation and ubiquitination—are considered next.

Figure 9-20: Phosphorylation and dephosphorylation of proteins
Figure 9-20: Proteins can be activated through the enzymatic attachment of phosphate groups to their amino acid side groups and inactivated by the removal of those phosphate groups.

Phosphorylation Enzymes called kinases attach phosphate groups to the hydroxyl groups of the amino acids serine, threonine, and tyrosine, whereas enzymes called phosphatases remove these phosphate groups. Because phosphate groups are negatively charged, their addition to a protein usually changes protein conformation. The addition and removal of phosphate groups serve as a reversible switch to control a variety of cellular events, including enzyme activity, protein–protein interactions, and protein–DNA interactions (Figure 9-20).
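Because only serine, threonine, and tyrosine carry the hydroxyl group that kinases modify, candidate phosphorylation sites can be listed by scanning a protein sequence for these three residues. The sketch below assumes one-letter amino acid codes and uses a made-up eight-residue peptide. It reports candidates only: real kinases also depend on the surrounding sequence, which this example ignores.

```python
# Residues whose side-chain hydroxyl group can accept a phosphate.
PHOSPHO_ACCEPTORS = {"S": "serine", "T": "threonine", "Y": "tyrosine"}

def candidate_phosphosites(protein):
    """Return (position, residue) pairs for every S, T, or Y.

    Positions are 1-based, counted from the amino terminus.
    """
    return [
        (position, residue)
        for position, residue in enumerate(protein, start=1)
        if residue in PHOSPHO_ACCEPTORS
    ]

# Hypothetical peptide used only for illustration.
sites = candidate_phosphosites("MKSAYLTG")
for position, residue in sites:
    print(position, PHOSPHO_ACCEPTORS[residue])
```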

One measure of the importance of protein phosphorylation is the number of genes encoding kinase activity in the genome. Even a simple organism such as yeast has hundreds of kinase genes, whereas the mustard plant Arabidopsis thaliana has more than 1000. Another measure of the significance of protein phosphorylation is that most of the numerous protein–protein interactions that take place in a typical cell are regulated by phosphorylation.

Recent analyses of the protein–protein interactions of the proteome indicate that most proteins function by interacting with other proteins. The interactome is the name given to the complete set of protein–protein interactions in an organism, organ, tissue, or cell. One way to display the network of protein–protein interactions that constitute an interactome is shown in Figure 9-21. To generate this figure, researchers determined the 3186 protein interactions among 1705 human proteins. However, these interactions constitute only a tiny fraction of the protein–protein interactions that are taking place in all human cells under all growth conditions.
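An interactome is naturally represented as a graph in which proteins are nodes and interactions are edges. The following minimal sketch uses four hypothetical proteins (P1 to P4) and an invented list of pairwise interactions, not data from the study described above, to show how the number of proteins, the number of interactions, and the most highly connected protein can be computed.

```python
from collections import defaultdict

def build_interactome(pairs):
    """Map each protein to the set of proteins it interacts with."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)  # interactions are symmetric
    return network

# Hypothetical interaction list used only for illustration.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1")]
network = build_interactome(pairs)

n_proteins = len(network)
# Each interaction is stored twice (once per partner), so halve the total.
n_interactions = sum(len(partners) for partners in network.values()) // 2
# The protein with the most partners.
hub = max(network, key=lambda protein: len(network[protein]))

print(n_proteins, n_interactions, hub)
```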

Figure 9-21: Some of the protein interactions in the human interactome
Figure 9-21: Proteins (represented by circles) interact with other proteins (connected by lines) to form simple or large protein complexes. This interactome shows 3186 interactions among 1705 human proteins.
[Data from Ulrich Stelzl et al., Max Delbrück Center for Molecular Medicine (MDC) Berlin-Buch. Copyright MDC]

What is the biological significance of these interactions? In this chapter and preceding ones, you have seen that protein–protein interactions are central to the function of large biological machines such as the replisome, the spliceosome, and the ribosome. Another set of significant interactions is the associations between human proteins and the proteins of human pathogens. For example, the interactome of 40 Epstein–Barr virus (EBV) proteins and 112 human proteins consists of 173 interactions (Figure 9-22). Understanding of this web of interactions may lead to new therapies for mononucleosis, a disease caused by EBV infection.

Figure 9-22: Interactions between EBV and human proteins
Figure 9-22: The web of 173 interactions among 40 proteins from Epstein–Barr virus (EBV) and 112 human proteins. Virus proteins are shown as yellow circles and human proteins as blue squares. Interactions are shown as red lines.
[Data from Calderwood et al., Proceedings of the National Academy of Sciences 104, 2007, 7606-7611. Copyright 2007 by National Academy of Sciences.]

Ubiquitination Surprisingly, one of the most common posttranslational modifications is not a subtle one like the addition of a phosphate group. Instead, this modification targets the protein for degradation by a biological machine and protease called the 26S proteasome (Figure 9-23). The modification, called ubiquitination, is the addition of chains of multiple copies of a protein called ubiquitin to the ε-amino group of lysine residues. Ubiquitin contains 76 amino acids and is found only in eukaryotes, where it is highly conserved in plants and animals. Two broad classes of proteins are targeted for destruction by ubiquitination: short-lived proteins such as cell-cycle regulators, and proteins that have become damaged or mutated.

Figure 9-23: Ubiquitination targets a protein for degradation
Figure 9-23: The major steps in ubiquitin-mediated protein degradation are shown. Ubiquitin is first conjugated to a target protein, which is then degraded by the proteasome. Ubiquitin and oligopeptides are then recycled.


Protein targeting In eukaryotes, all proteins are synthesized on ribosomes in the cytoplasm. However, some of these proteins end up in the nucleus, others in the mitochondria, and still others anchored in the membrane or secreted from the cell. How do these proteins “know” where they are supposed to go? The answer to this seemingly complex problem is actually quite simple: a newly synthesized protein contains a short sequence that targets the protein to the correct place or cellular compartment. For example, a newly synthesized membrane protein or a protein destined for an organelle has a short leader peptide, called a signal sequence, at its amino-terminal end. For membrane proteins, this stretch of 15 to 25 amino acids directs the protein to channels in the endoplasmic reticulum membrane where the signal sequence is cleaved by a peptidase (Figure 9-24). From the endoplasmic reticulum, the protein is directed to its ultimate destination. A similar phenomenon exists for certain bacterial proteins that are secreted.
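The recognition and removal of an amino-terminal signal sequence can be caricatured in code. The sketch below is a deliberate simplification under stated assumptions: the set of residues treated as hydrophobic, the fixed leader length of 20 (within the 15 to 25 range given above), and the 0.5 threshold are all illustrative choices, and the sequences are invented. It is not a real signal-peptide prediction method.

```python
# Assumption: a simple set of residues treated as hydrophobic.
HYDROPHOBIC = set("AVLIMFWC")

def hydrophobic_fraction(segment):
    """Fraction of residues in the segment that are hydrophobic."""
    return sum(residue in HYDROPHOBIC for residue in segment) / len(segment)

def cleave_signal(protein, length=20, threshold=0.5):
    """Remove the first `length` residues if they look like a signal sequence.

    `length` and `threshold` are illustrative values, not measured ones.
    Proteins without a hydrophobic leader are returned unchanged.
    """
    leader = protein[:length]
    if len(leader) == length and hydrophobic_fraction(leader) >= threshold:
        return protein[length:]
    return protein

# Hypothetical sequences used only for illustration.
leader = "M" + "L" * 12 + "ASGKTRE"   # 20 residues, mostly hydrophobic
mature = "DEKRQNSTDEKR"
print(cleave_signal(leader + mature))              # leader is removed
print(cleave_signal("DEKRQNSTDEKRDEKRQNSTDEKR"))   # no leader, unchanged
```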

Proteins destined for the nucleus include the RNA and DNA polymerases and transcription factors discussed in Chapters 7 and 8. Amino acid sequences embedded in the interiors of such nucleus-bound proteins are necessary for transport from the cytoplasm into the nucleus. These nuclear localization sequences (NLSs) are recognized by cytoplasmic receptor proteins that transport newly synthesized proteins through nuclear pores, sites in the membrane through which large molecules are able to pass into and out of the nucleus. A protein not normally found in the nucleus will be directed to the nucleus if an NLS is attached to it.
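The observation that attaching an NLS is enough to redirect a protein can be illustrated with a toy lookup. The example uses the well-known NLS of the SV40 large T antigen (PKKKRKV) as its only recognized motif. The cargo sequence is invented, and treating localization as a simple substring match is an assumption made for illustration; real import depends on receptor binding and protein structure.

```python
# Classical NLS from the SV40 large T antigen, in one-letter code.
SV40_NLS = "PKKKRKV"

def destination(protein, nls=SV40_NLS):
    """Toy rule: a protein containing the NLS goes to the nucleus."""
    return "nucleus" if nls in protein else "cytoplasm"

# Hypothetical cytoplasmic protein used only for illustration.
cytoplasmic = "MASGTELDQ"
# Insert the NLS into the interior of the protein; it is not cleaved off.
fused = cytoplasmic[:4] + SV40_NLS + cytoplasmic[4:]

print(destination(cytoplasmic))
print(destination(fused))
```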

Figure 9-24: Signal sequences target proteins for secretion
Figure 9-24: Proteins destined to be secreted from the cell have an amino-terminal sequence that is rich in hydrophobic residues. This signal sequence binds to proteins in the endoplasmic reticulum (ER) membrane that draw the remainder of the protein through the lipid bilayer. The signal sequence is cleaved from the protein in this process by an enzyme called signal peptidase (not shown). Once inside the endoplasmic reticulum, the protein is directed to the cell membrane, from which it will be secreted.

Why are signal sequences cleaved during targeting, whereas an NLS, located in a protein’s interior, remains after the protein moves into the nucleus? One explanation might be that, in the nuclear disintegration that accompanies mitosis (see Chapter 2), proteins localized to the nucleus may find themselves in the cytoplasm. Because such a protein contains an NLS, it can relocate to the nucleus of a daughter cell that results from mitosis.

KEY CONCEPT

Most eukaryotic proteins are inactive unless modified after translation. Some posttranslational events, such as phosphorylation or ubiquitination, modify amino acid side groups, thus promoting protein activation or degradation, respectively. Other posttranslational mechanisms recognize amino acid signatures in a protein sequence and target those proteins to places where their activity is required inside or outside the cell.
