Protein Primary Structure Can Be Determined by Chemical Methods and from Gene Sequences

The classic method for determining the amino acid sequence of a protein is Edman degradation. In this procedure, the free amino group of the N-terminal amino acid of a polypeptide is labeled, and the labeled amino acid is then cleaved from the polypeptide and identified by high-pressure liquid chromatography. The polypeptide is left one residue shorter, with a new amino acid at the N-terminus. The cycle is repeated on the ever-shortening polypeptide until all the residues have been identified.
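The cyclic logic of the procedure can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of the chemistry: the polypeptide is assumed to be a string of one-letter amino acid codes, the function name `edman_cycles` is hypothetical, and the labeling, cleavage, and chromatographic identification steps are collapsed into simply reading off the first character.

```python
from typing import Iterator, Tuple


def edman_cycles(polypeptide: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (cycle number, released residue, remaining polypeptide).

    Assumption: the sequence is given in one-letter code, N-terminus first.
    Each pass stands in for one label/cleave/identify cycle.
    """
    remaining = polypeptide
    cycle = 0
    while remaining:
        cycle += 1
        # The N-terminal residue is removed and identified;
        # the chain is left one residue shorter.
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


# Reading the released residues in order reconstructs the sequence.
sequence_read = "".join(residue for _, residue, _ in edman_cycles("MKTAYIAK"))
```

Each iteration exposes a new N-terminal residue, which is why the order in which residues are released is the sequence itself.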

Before about 1985, biologists commonly used Edman degradation for determining protein sequences. Now, however, complete protein sequences are usually determined by analysis of genome and messenger RNA sequences. The complete genomes of many organisms have already been sequenced, and the database of genome sequences from humans and numerous model organisms is expanding rapidly. As discussed in Chapter 6, the sequences of proteins can be deduced from DNA sequences that are predicted to encode proteins.
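As a rough sketch of how a protein sequence is deduced from a coding sequence, the Python below translates a nucleotide string with the standard genetic code. It makes several simplifying assumptions that are not part of the text: the coding strand is given 5' to 3', the reading frame begins at the first ATG, there are no introns, and the function name `translate_orf` is hypothetical. Real gene prediction is considerably more involved.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(sequence: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Accepts DNA or mRNA (U is treated as T). Characters other than
    A, C, G, T/U are not handled and raise KeyError.
    """
    dna = sequence.upper().replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


deduced = translate_orf("CCATGGCTAAATGA")  # codons: ATG GCT AAA TGA
```

Here the deduced sequence is Met-Ala-Lys; the two nucleotides preceding the ATG are skipped and translation stops at TGA.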

A powerful approach for determining the primary structure of an isolated protein combines mass spectrometry (MS) with the use of sequence databases. First, the peptide mass fingerprint of the protein is obtained by MS. A peptide mass fingerprint is the list of the molecular weights of the peptides that are generated from the protein by digestion with a specific protease, such as trypsin. The molecular weights of the parent protein and its proteolytic fragments are then used to search genome databases for a similar-sized protein with an identical or similar peptide mass fingerprint. Peptides can also be sequenced directly by tandem mass spectrometry (MS/MS), as described above.
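The fingerprint search can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the text: trypsin is modeled by the common rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P), with no missed cleavages; peptide masses are computed as monoisotopic masses of unmodified residues plus one water; and the two-entry "database", the 0.01 Da tolerance, and all function names are hypothetical. The mass of the parent protein, which the search described above also uses, is left out for brevity.

```python
import re

WATER = 18.01056  # monoisotopic mass of H2O, added once per peptide

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def trypsin_digest(protein: str) -> list:
    """Split after K or R, except when the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint(protein: str) -> list:
    """Sorted list of tryptic peptide masses: the peptide mass fingerprint."""
    return sorted(peptide_mass(p) for p in trypsin_digest(protein))


def count_matches(observed, theoretical, tolerance=0.01) -> int:
    """Number of observed masses that lie within tolerance of a predicted mass."""
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed
    )


def best_match(observed, database, tolerance=0.01) -> str:
    """Name of the database entry whose predicted fingerprint matches best."""
    return max(
        database,
        key=lambda name: count_matches(observed, fingerprint(database[name]), tolerance),
    )


# Hypothetical database of sequences deduced from a genome.
database = {"protein_A": "MKTAYRPLKGG", "protein_B": "MSTNPKPQRKWW"}

# Stand-in for masses measured by MS on the isolated protein.
observed_masses = fingerprint("MKTAYRPLKGG")
hit = best_match(observed_masses, database)
```

With these inputs, `trypsin_digest("MKTAYRPLKGG")` gives the peptides MK, TAYRPLK, and GG; the R-P bond is left uncut. Ranking by the number of matching peptide masses is the simplest possible score, chosen here only to make the idea concrete.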