Chapter Introduction

Exploring Proteins and Proteomes

Milk, a source of nourishment for all mammals, is composed, in part, of a variety of proteins. The protein components of milk are revealed by the technique of MALDI–TOF mass spectrometry, which separates molecules on the basis of their mass-to-charge ratio.
[(Left) Okea/istockphoto.com. (Right) Courtesy of Dr. Brian Chait.]

OUTLINE

  1. The Purification of Proteins Is an Essential First Step in Understanding Their Function

  2. Immunology Provides Important Techniques with Which to Investigate Proteins

  3. Mass Spectrometry Is a Powerful Technique for the Identification of Peptides and Proteins

  4. Peptides Can Be Synthesized by Automated Solid-Phase Methods

  5. Three-Dimensional Protein Structure Can Be Determined by X-ray Crystallography and NMR Spectroscopy

Proteins play crucial roles in nearly all biological processes: in catalysis, signal transmission, and structural support. This remarkable range of functions arises from the existence of thousands of proteins, each folded into a distinctive three-dimensional structure that enables it to interact with one or more of a highly diverse array of molecules. A major goal of biochemistry is to determine how amino acid sequences specify the conformations, and hence functions, of proteins. Other goals are to learn how individual proteins bind specific substrates and other molecules, mediate catalysis, and transduce energy and information.

It is often preferable to study a protein of interest after it has been separated from other components within the cell so that the structure and function of this protein can be probed without any confounding effects from contaminants. Hence, the first step in these studies is the purification of the protein of interest. Proteins can be separated from one another on the basis of solubility, size, charge, and binding ability. After a protein has been purified, its amino acid sequence can be determined. Many protein sequences, often deduced from genome sequences, are available in vast sequence databases. If the sequence of a purified protein has been archived in a publicly searchable database, the job of the investigator becomes much easier. The investigator need determine only a small stretch of amino acid sequence of the protein to find its match in the database. Alternatively, such a protein might be identified by matching its mass to those deduced for proteins in the database. Mass spectrometry provides a powerful method for determining the mass and sequence of a protein.
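The two database-lookup strategies described above (matching a short experimentally determined stretch of sequence, or matching a measured mass against masses deduced from sequences) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how production search software works: the database entries (seqA, seqB, seqC) and their sequences are hypothetical, the peptides are assumed to be unmodified, the 0.02-Da tolerance is an arbitrary choice, and the measured ion is assumed to carry z added protons, as in the singly charged ions typical of MALDI. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (amino acid minus one H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once to account for the free N- and C-termini
PROTON = 1.007276  # mass of each proton carried by a positive ion

# Hypothetical toy "database"; real searches use curated sequence databases.
DATABASE = {
    "seqA": "GASPVTK",
    "seqB": "MDEHFWY",
    "seqC": "LLRNQCK",
}

def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def neutral_mass_from_mz(mz: float, z: int = 1) -> float:
    """Convert the m/z of an [M + zH]z+ ion back to the neutral mass M."""
    return z * mz - z * PROTON

def match_by_tag(tag: str, database: dict) -> list:
    """Entries whose sequence contains a short experimentally determined stretch."""
    return [name for name, seq in database.items() if tag in seq]

def match_by_mass(mass: float, database: dict, tol: float = 0.02) -> list:
    """Entries whose predicted mass lies within tol daltons of the measured mass."""
    return [name for name, seq in database.items()
            if abs(peptide_mass(seq) - mass) <= tol]

if __name__ == "__main__":
    print(match_by_tag("HFW", DATABASE))       # ['seqB']
    observed_mz = 659.372  # hypothetical singly charged ion
    print(match_by_mass(neutral_mass_from_mz(observed_mz), DATABASE))  # ['seqA']
```

Because leucine and isoleucine have identical masses, a mass match alone cannot tell them apart; this is one reason a short sequence tag and a mass measurement are complementary.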

After a protein has been purified and its identity confirmed, the challenge remains to determine its function within a physiologically relevant context. Antibodies are choice probes for locating proteins in vivo and measuring their quantities. Monoclonal antibodies, able to recognize specific proteins, can be obtained in large amounts and used to detect and quantify the protein both in isolation and in cells. Peptides and proteins can be chemically synthesized, providing tools for research and, in some cases, highly pure material for use as drugs. Finally, x-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are the principal techniques for elucidating three-dimensional structure, the key determinant of function.

The exploration of proteins by this array of physical and chemical techniques has greatly enriched our understanding of the molecular basis of life. These techniques make it possible to tackle some of the most challenging questions of biology in molecular terms.

The proteome is the functional representation of the genome

As will be discussed in Chapter 5, the complete DNA base sequences, or genomes, of many organisms are now available. For example, the roundworm Caenorhabditis elegans has a genome of 97 million bases and about 19,000 protein-encoding genes, whereas that of the fruit fly Drosophila melanogaster contains 180 million bases and about 14,000 genes. The completely sequenced human genome contains 3 billion bases and about 23,000 genes. However, these genomes are simply inventories of the genes that could be expressed within a cell under specific conditions. Only a subset of the proteins encoded by these genes will actually be present in a given biological context. The proteome of an organism (the term combines protein and genome, denoting the proteins expressed by the genome) signifies a more complex level of information content, encompassing the types, functions, and interactions of proteins within its biological environment.

The proteome is not a fixed characteristic of the cell. Because it represents the functional expression of information, it varies with cell type, developmental stage, and environmental conditions, such as the presence of hormones. The proteome is much larger than the genome because almost all gene products are proteins that can be chemically modified in a variety of ways. Furthermore, these proteins do not exist in isolation; they often interact with one another to form complexes with specific functional properties. Whereas the genome is “hard wired,” the proteome is highly dynamic. An understanding of the proteome is acquired by investigating, characterizing, and cataloging proteins. In some, but not all, cases, this process begins by separating a particular protein from all other biomolecules in the cell.
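Simple counting shows why chemical modification alone can make the proteome much larger than the genome. The sketch below is a deliberate simplification: it assumes each site on a protein is independently either modified or unmodified by a single kind of modification, whereas real sites can carry several different modifications and modifications often depend on one another. The site labels are hypothetical.

```python
from itertools import product

def modification_states(sites):
    """Enumerate every combination of modified (True) / unmodified (False) sites."""
    return [dict(zip(sites, flags))
            for flags in product((False, True), repeat=len(sites))]

# Hypothetical modifiable sites on one protein chain, named by residue and position.
sites = ["Ser12", "Thr45", "Lys80"]
states = modification_states(sites)
print(len(states))   # 8, i.e., 2**3 distinct forms from a single gene product
print(2 ** 10)       # 1024 possible forms if the protein had 10 such sites
```

Under these assumptions, n independent sites yield 2^n forms of one gene product, before counting alternative splicing or assembly into complexes.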