20.4 Proteomics Analyzes the Complete Set of Proteins Found in a Cell

DNA sequence data offer tremendous insight into the biology of an organism, but they are not the whole story. Many genes encode proteins, and proteins carry out the vast majority of the biochemical reactions that shape the phenotype of an organism. Although proteins are encoded by DNA, many proteins undergo modifications after translation and, in more-complex eukaryotes, there are many more proteins than genes. Thus, in recent years, molecular biologists have turned their attention to analysis of the protein content of cells. The ultimate goal is to determine the proteome, the complete set of proteins found in a given cell. The study of the proteome is termed proteomics.

Plans are underway to identify and characterize all proteins in the human body, an effort that has been called the Human Proteome Project. The project would catalog which proteins are present in which cell types, where each protein is located within the cell, and which other proteins each interacts with. Many researchers feel that this information will be of immense benefit in identifying drug targets, understanding the biological basis of disease, and understanding the molecular basis of many biological processes.

Determination of Cellular Proteins

The basic procedure for characterizing the proteome is first to separate the proteins found in a cell and then to identify and quantify the individual proteins. One method for separating proteins is two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), in which the proteins are separated in one dimension by charge, separated in a second dimension by mass, and then stained (Figure 20.19a). This procedure separates the different proteins into spots, with the size of each spot proportional to the amount of protein present. A typical 2D-PAGE gel may contain several hundred to several thousand spots (Figure 20.19b).

Figure 20.19: Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) can be used to separate cellular proteins.
[After G. Gibson and S. Muse. 2004. A Primer of Genome Science, 2e. Sinauer Associates, Inc. p. 274, Fig. 5.4.]

Because 2D-PAGE does not detect some proteins in low abundance and is difficult to automate, researchers have turned to liquid chromatography for separating proteins. In liquid chromatography, a mixture of molecules is dissolved in a liquid and passed through a column packed with solid particles. Different affinities for the liquid and solid phases cause some components of the mixture to travel through the column more slowly than others, resulting in separation of the components of the mixture.

The traditional method for identifying a protein is to remove its amino acids one at a time and determine the identity of each amino acid removed. This method is far too slow and labor intensive for analyzing the thousands of proteins present in a typical cell. Today, researchers use mass spectrometry, which is a method for precisely determining the molecular mass of a molecule. In mass spectrometry, a molecule is ionized and its migration rate in an electrical field is determined. Because, for a given charge, small molecules migrate more rapidly than larger molecules, the migration rate can be used to determine the mass of the molecule accurately.
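As one illustration of why migration rate reports on mass, consider a time-of-flight instrument. This is an assumption made for the sketch; the text does not specify the type of spectrometer. The symbols are also assumptions: an ion of mass m and charge ze is accelerated through a potential difference V and then drifts a distance L to the detector, arriving after time t.

```latex
\begin{align}
  zeV &= \tfrac{1}{2} m v^{2}
  &&\text{(energy gained in the field becomes kinetic energy)} \\
  v &= \sqrt{\frac{2zeV}{m}}
  &&\text{(lighter ions travel faster)} \\
  t &= \frac{L}{v} = L \sqrt{\frac{m}{2zeV}}
  &&\text{(flight time grows with } \sqrt{m/z}\text{)}
\end{align}
```

Under these assumptions, the measured flight time depends on the ratio of mass to charge rather than on mass alone, which is why peptide profiles are reported as mass-to-charge ratios.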

To analyze proteins with mass spectrometry, a protein is first digested with the enzyme trypsin, which cleaves the protein into smaller peptide fragments, each containing several amino acids (Figure 20.20a). Mass spectrometry is then used to separate the peptides on the basis of their mass-to-charge ratio (Figure 20.20b). This separation produces a profile of peaks, in which each peak corresponds to the mass-to-charge ratio of one peptide (Figure 20.20c). A computer program then searches through a database of proteins to find a match between the profile generated and the profile expected with a known protein (Figure 20.20d), allowing the protein in the sample to be identified. Using bioinformatics, the computer creates “virtual digests” and predicts the profiles of all proteins found in a genome, given the DNA sequences of the protein-encoding genes.
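The "virtual digest" and database search described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in practice. It assumes a simplified trypsin rule (cleavage after lysine, K, or arginine, R, unless the next residue is proline, P), treats each peak as a neutral peptide mass rather than a mass-to-charge ratio, and uses standard monoisotopic residue masses. The sequences, the names `protA` and `protB`, and the 0.01-dalton tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def trypsin_digest(protein):
    """Split a sequence after K or R, except when the next residue is P.

    This is a simplified cleavage rule assumed for illustration.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Sum the residue masses and add one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify_protein(observed_masses, database, tol=0.01):
    """Return (name, hits) for the protein whose virtual digest
    explains the largest number of observed peptide masses."""
    best_name, best_hits = None, 0
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in trypsin_digest(sequence)]
        hits = sum(
            any(abs(m - q) <= tol for q in predicted)
            for m in observed_masses
        )
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name, best_hits


# Hypothetical two-protein database and a simulated peak list.
database = {"protA": "AGKCDRPEFKGHR", "protB": "MSTLLKVVDEYRWWK"}
observed = [peptide_mass(p) for p in trypsin_digest(database["protA"])]
print(trypsin_digest(database["protA"]))     # ['AGK', 'CDRPEFK', 'GHR']
print(identify_protein(observed, database))  # ('protA', 3)
```

Counting matching peaks is the simplest possible score. It is used here only to show how a predicted profile is compared with a measured one.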

Figure 20.20: Mass spectrometry is used to identify proteins.

Mass spectrometric methods can also be used to measure the amount of each protein identified. With recent advances, researchers now carry out “shotgun” proteomics, which eliminates most of the initial protein-separation stage. In this procedure, a complex mixture of proteins (such as those from a tissue sample) is digested and analyzed with mass spectrometry. The computer program then sorts out the proteins present in the original sample from the peptide profiles.
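The sorting step in shotgun proteomics can be illustrated with a small sketch. It assumes that the peptides in the mixture have already been identified by sequence, that the same simplified trypsin rule applies (cleavage after K or R unless followed by P), and that a protein is reported only when at least one observed peptide is unique to it. That last rule is a common simplification chosen here for illustration, not a method stated in the text. The protein names and sequences are hypothetical.

```python
from collections import defaultdict


def trypsin_digest(protein):
    """Split after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def infer_proteins(observed_peptides, database):
    """Map observed peptides back to the proteins that could have
    produced them, keeping only peptides unique to one protein."""
    # Index every predicted peptide by the proteins that contain it.
    index = defaultdict(set)
    for name, sequence in database.items():
        for peptide in trypsin_digest(sequence):
            index[peptide].add(name)

    # A peptide shared by several proteins cannot tell them apart,
    # so only unique peptides count as evidence.
    support = defaultdict(set)
    for peptide in observed_peptides:
        owners = index.get(peptide, set())
        if len(owners) == 1:
            (name,) = owners
            support[name].add(peptide)
    return dict(support)


# Hypothetical database and peptide list from a digested mixture.
database = {
    "protA": "AGKCDRPEFKGHR",
    "protB": "AGKWWKVVDEYR",
    "protC": "MSTLLKNNR",
}
observed = ["AGK", "CDRPEFK", "WWK"]
print(infer_proteins(observed, database))
# {'protA': {'CDRPEFK'}, 'protB': {'WWK'}}
```

The peptide AGK occurs in both protA and protB, so it contributes no evidence on its own. This ambiguity is one reason that sorting proteins out of a complex digest requires a computer program.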

Mary Lipton and her colleagues used this approach to study the proteome of Deinococcus radiodurans, an exceptional bacterium that is able to withstand high doses of ionizing radiation that are lethal to most other organisms. The genome of D. radiodurans had already been sequenced. Lipton and her colleagues extracted proteins from the bacteria, digested them with trypsin, separated the fragments with liquid chromatography, and then determined the proteins from the peptide fragments with mass spectrometry. They were able to identify 1910 proteins, which is more than 60% of the proteins predicted on the basis of the genome sequence.

Deciphering the proteome of even a single cell is a challenging task. Every cell contains a complete sequence of genes, but different cells express vastly different proteins. Each gene may produce a number of different proteins through alternative processing (see Chapter 14) and post-translational protein processing (see Chapter 15). A typical human cell contains as many as 100,000 different proteins that vary greatly in abundance, and no technique such as PCR can be used to easily amplify proteins.

Affinity Capture

Proteomics concerns not just the identification of all proteins in a cell, but also an understanding of how these proteins interact and how their expression varies with the passage of time. Researchers have developed a number of techniques for identifying proteins that interact within the cell. In affinity capture, an antibody (see Figure 22.20) to a specific protein is used to capture one protein from a complex mixture of proteins. The protein captured will “pull down” with it any proteins with which the captured protein physically interacts. The pulled-down mixture of proteins can then be analyzed by mass spectrometry to identify the proteins. Various modifications of affinity capture and other techniques can be used to determine the complete set of protein interactions in a cell, termed the interactome.
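Results from many affinity-capture experiments can be combined into a simple model of an interactome. The sketch below assumes that each experiment is summarized as a captured protein (the "bait") and the list of proteins pulled down with it, and it records one undirected link per bait and pulled-down protein. The function name and the protein names P1 through P4 are hypothetical. Proteins pulled down together may interact only indirectly, so links of this kind are candidates rather than confirmed contacts.

```python
from collections import defaultdict


def build_interactome(pulldowns):
    """Combine pull-down results into an undirected interaction map.

    pulldowns maps each captured (bait) protein to the proteins
    that were pulled down with it.
    """
    network = defaultdict(set)
    for bait, partners in pulldowns.items():
        for partner in partners:
            if partner == bait:
                continue  # ignore the bait appearing in its own list
            network[bait].add(partner)
            network[partner].add(bait)
    # Sort partner lists so the output is stable and easy to read.
    return {protein: sorted(found) for protein, found in network.items()}


# Hypothetical results from two affinity-capture experiments.
pulldowns = {"P1": ["P2", "P3"], "P4": ["P3"]}
interactome = build_interactome(pulldowns)
for protein in sorted(interactome):
    print(protein, "interacts with", interactome[protein])
```

In this example P3 is pulled down by both P1 and P4, so it links the two experiments into a single network.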

Protein Microarrays

Protein-protein interactions can also be analyzed with protein microarrays (Figure 20.21), which are similar to the microarrays used for examining gene expression. With this technique, a large number of different proteins are applied to a glass slide as a series of spots, with each spot containing a different protein. In one application, each spot is an antibody for a different protein, labeled with a tag that fluoresces when bound. An extract of tissue is applied to the protein microarray. A spot of fluorescence appears when a protein in the extract binds to an antibody, indicating the presence of that particular protein in the tissue.

Figure 20.21: Protein microarrays can be used to examine interactions among proteins. (a) A microarray containing 4400 proteins found in yeast. (b) The array was probed with an enzyme that phosphorylates proteins to determine which proteins serve as substrates for the enzyme. Dark spots represent proteins that were phosphorylated by the enzyme. Proteins that phosphorylate themselves (autophosphorylate) are included in each block of the microarray (shown in blue boxes) to serve as reference points.
[From D. Hall, J. Ptacek, and M. Snyder, 2006. Mechanisms of Ageing and Development 128 (2007) 161-167. © 2006, with permission from Elsevier.]

Structural Proteomics

The high-resolution structure of a protein provides a great deal of useful information. It is often a source of insight into the function of an unknown protein; it may also suggest the location of active sites and provide information about other molecules that interact with the protein. Knowledge of a protein’s structure often suggests targets for potential drugs that might interact with the protein. Because structure often provides information about function, a goal of proteomics is to determine the structure of every protein found in a cell.

Two procedures are currently used to solve the structures of complex proteins: (1) X-ray crystallography, in which crystals of the protein are bombarded with X-rays and the diffraction patterns of the X-rays are used to determine the structure (see Chapter 10) and (2) nuclear magnetic resonance (NMR), which provides information on the position of specific atoms within a molecule by using the magnetic properties of nuclei.

Both X-ray crystallography and NMR require human intervention at many stages and are too slow for determining the structures of the thousands of proteins that may exist within a cell. Because the structures of hundreds of thousands of proteins are required for studies of the proteome, researchers ultimately hope to be able to predict the structure of a protein from its amino acid sequence. Such prediction is not possible at the present time, but the hope is that, if enough high-resolution structures are solved, it may be possible in the future to model the structure from the amino acid sequence alone. As scientists work on automated methods that will speed the structural determination of proteins, bioinformaticians are developing better computer programs for predicting protein structure from sequence.

CONCEPTS

The proteome is the complete set of proteins found in a cell. Techniques of protein separation and mass spectrometry are used to identify the proteins present within a cell. Affinity capture and microarrays are used to determine sets of interacting proteins. Structural proteomics attempts to determine the structure of all proteins.

CONCEPT CHECK 9

Why is knowledge of a protein’s structure important?