Comparing Protein Sequences and Structures Provides Insight into Protein Function and Evolution

Analyses of many diverse proteins have conclusively established a relation between the amino acid sequence, three-dimensional structure, and function of proteins. One of the earliest examples involved a comparison of two oxygen-carrying proteins: myoglobin in muscle and hemoglobin in red blood cells. Myoglobin—a monomer (consisting of one polypeptide chain/protein molecule)—and hemoglobin—a tetramer (consisting of two α and two β polypeptides, or subunits, per protein)—both contain a heme group noncovalently attached to each polypeptide chain (Figure 3-14a). The heme group binds oxygen. A mutation in the gene encoding the β chain of hemoglobin that results in the substitution of a valine for a glutamic acid disturbs this protein’s folding and function and causes sickle-cell disease (also called sickle-cell anemia). The properly aligned sequences of the 141-residue myoglobin and the 153-residue β subunit of hemoglobin have 40 residues in equivalent positions in the sequences that are identical and another 21 that have side chains that are chemically very similar. This high degree of identity and similarity (43 percent of the myoglobin residues) is consistent with their similar oxygen-binding functions. X-ray crystallographic analysis showed that the three-dimensional structures of myoglobin and of the α and β subunits of hemoglobin, as well as that of the evolutionarily distant oxygen-carrying leghemoglobin from plants, are remarkably similar (see Figure 3-14a).
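The identity and similarity figures quoted above come from simple counting over an alignment. The following is a minimal sketch of that bookkeeping, not the method used in the original analyses. The function names, the similarity groups, and the two short aligned fragments are hypothetical and chosen only for illustration; they are not real globin sequences. The last lines apply the same arithmetic to the counts and the 141-residue length quoted in the text.

```python
def identity_and_similarity(seq_a, seq_b, similar_groups):
    """Count identical and chemically similar positions in two pre-aligned sequences.

    Gap characters ('-') mark positions with no equivalent residue and are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Map each amino acid letter to the index of its similarity group.
    group_of = {aa: i for i, group in enumerate(similar_groups) for aa in group}
    identical = similar = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
        elif a in group_of and group_of[a] == group_of.get(b):
            similar += 1
    return identical, similar


def percent_of_reference(identical, similar, reference_length):
    """Identical plus similar positions as a rounded percentage of a reference length."""
    return round(100 * (identical + similar) / reference_length)


# Assumption: one coarse grouping of side chains by chemical character.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ"]

# Made-up aligned fragments, for illustration only.
toy_a = "MKV-LSDE"
toy_b = "MRVALTDD"
toy_counts = identity_and_similarity(toy_a, toy_b, SIMILAR_GROUPS)

# Counts quoted in the text: 40 identical, 21 similar, 141-residue reference.
quoted_percent = percent_of_reference(40, 21, 141)
print(toy_counts, quoted_percent)
```

With the counts quoted in the text, (40 + 21) / 141 rounds to 43 percent, which matches the figure given above.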

FIGURE 3-14 Evolution of the globin protein family. (a) Hemoglobin is a tetramer of two α and two β subunits. The structural similarity of these subunits to leghemoglobin and myoglobin, both of which are monomers, is evident. A heme molecule (red) noncovalently associated with each globin polypeptide is directly responsible for oxygen binding in these proteins. (b) A primitive monomeric oxygen-binding globin is thought to be the ancestor of modern-day blood hemoglobins, muscle myoglobins, and plant leghemoglobins. Sequence comparisons have revealed that the evolution of the globin proteins parallels the evolution of animals and plants. Major changes occurred with the divergence of plant globins from animal globins and of myoglobin from hemoglobin. Later, gene duplication gave rise to the α and β subunits of hemoglobin. See R. C. Hardison, 1996, P. Natl. Acad. Sci. USA 93:5675.
[Part (a) data from G. Fermi et al., 1984, J. Mol. Biol. 175:159–174, PDB ID 2hbb (hemoglobin), H. C. Watson, 1969, Prog. Stereochem. 4:299, PDB ID 1mbn (myoglobin), and M. S. Hargrove et al., 1997, J. Mol. Biol. 266:1032–1042, PDB ID 1bin (leghemoglobin).]

A good rule of thumb is that the greater the similarity of the sequences of two polypeptide chains, the more likely they are to have similar three-dimensional structures and similar functions. While this comparative approach is very powerful, caution must always be exercised when attributing to one protein, or a part of a protein, a function or structure similar to that of another protein based only on amino acid sequence similarities. There are examples in which proteins with similar overall structures display different functions, as well as cases in which functionally unrelated proteins with dissimilar amino acid sequences have very similar folded tertiary structures, as will be explained below. Even so, in many cases such sequence comparisons provide important insights into protein structure and function.

Use of sequence comparisons to deduce protein structure and function has expanded substantially in recent years as the genomes and messenger RNAs of more and more organisms have been sequenced, permitting a vast array of protein sequences to be deduced. Indeed, the molecular revolution in biology during the last decades of the twentieth century created a new scheme of biological classification based on similarities and differences in the amino acid sequences of proteins. Proteins that have a common ancestor are referred to as homologs. The main evidence for homology among proteins, and hence for their common ancestry, is similarity in their sequences, which is often reflected in similar structures. We can describe homologous proteins as belonging to a “family” and can trace their lineage (how closely or distantly they are related to one another in an evolutionary sense) from comparisons of their sequences. Generally, more closely related proteins exhibit greater sequence similarity than more distantly related proteins because, over evolutionary time, mutations accumulate in the genes encoding these proteins. The folded three-dimensional structures of homologous proteins may be similar even if some parts of their primary structure show little evidence of sequence similarity. Initially, proteins with relatively high sequence similarities (>50 percent exact amino acid matches, or “identities”) and related functions or structures were defined as an evolutionarily related family, while a superfamily encompassed two or more families in which the interfamily sequences matched less well (~30–40 percent identities) than within one family. It is generally thought that proteins with 30 percent or greater sequence identity are likely to have similar three-dimensional structures; however, such high sequence identity is not required for proteins to share similar structures.
Revised definitions of family and superfamily have been proposed, in which a family comprises proteins with a clear evolutionary relationship (>30 percent identity, or <30 percent identity together with additional structural and functional information showing common descent), while a superfamily comprises proteins with only a probable common evolutionary origin: for example, proteins with lower sequence identities but one or more common motifs or domains.
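The revised definitions can be read as a small decision rule. The sketch below is one possible encoding of it, offered only as an illustration. The function name, the two boolean flags, and the "unclassified" fallback are assumptions that do not appear in the text, and real classification relies on expert judgment rather than a fixed cutoff.

```python
def classify_relationship(percent_identity, shared_descent_evidence=False,
                          shared_motif_or_domain=False):
    """Rough encoding of the revised family/superfamily definitions.

    percent_identity: percentage of exact amino acid matches in the alignment.
    shared_descent_evidence: structural and functional information showing
        common descent (hypothetical flag, supplied by the caller).
    shared_motif_or_domain: one or more common motifs or domains
        (hypothetical flag, supplied by the caller).
    """
    # Family: clear evolutionary relationship, either from identity alone
    # or from lower identity backed by structural and functional evidence.
    if percent_identity > 30 or shared_descent_evidence:
        return "family"
    # Superfamily: only a probable common origin, suggested by shared motifs or domains.
    if shared_motif_or_domain:
        return "superfamily"
    # Assumption: anything else is left unassigned by this sketch.
    return "unclassified"


print(classify_relationship(55))
print(classify_relationship(25, shared_descent_evidence=True))
print(classify_relationship(20, shared_motif_or_domain=True))
```

How to treat a pair at exactly 30 percent identity is not specified in the text; the sketch arbitrarily treats the threshold as exclusive.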

The kinship among homologous proteins is most easily visualized by a tree diagram based on sequence analyses. For example, the amino acid sequences of globins—the proteins hemoglobin and myoglobin and their relatives from bacteria, plants, and animals—suggest that they evolved from an ancestral monomeric oxygen-binding protein (Figure 3-14b). With the passage of time, the gene for this ancestral protein slowly changed, initially diverging into lineages leading to animal and plant globins. Subsequent changes gave rise to myoglobin and to the α and β subunits of the tetrameric hemoglobin molecule (α2β2) of the vertebrate circulatory system.
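Tree diagrams of this kind are commonly built from pairwise differences between aligned sequences. The sketch below shows only the first step: computing the fraction of differing positions for every pair and finding the most similar pair, which a distance-based method would typically group first. It is not the procedure used to produce Figure 3-14b. The four short sequences and their labels are made up for illustration and are not real globin sequences.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical pre-aligned toy sequences (not real proteins).
toy_sequences = {
    "alpha_like": "MVLSPADKTN",
    "beta_like":  "MVLTPEDKTA",
    "myo_like":   "MGLSDGEWQL",
    "plant_like": "MAFTEKQEAL",
}

# Distance for every unordered pair of sequences.
distances = {
    (x, y): p_distance(toy_sequences[x], toy_sequences[y])
    for x, y in combinations(toy_sequences, 2)
}

# The pair with the smallest distance is the most closely related in this toy set.
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

In this toy set the two hemoglobin-like sequences are the closest pair, mirroring the idea that the α and β subunits diverged most recently.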