Models of sequence evolution are used to calculate evolutionary divergence

The sequence comparison procedure illustrated in Figure 23.1 gives a simple count of the number of similarities and differences between the proteins of two species. In the context of two aligned DNA sequences, we can count the number of differences at homologous nucleotide positions, and this count indicates the minimum number of nucleotide changes that must have occurred since the two sequences diverged from a common ancestral sequence.
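The counting step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the two sequences have already been aligned to equal length. The function names `count_differences` and `p_distance` are hypothetical, and positions with a gap are simply skipped rather than scored as insertions or deletions.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatches at homologous positions of two aligned sequences.
    Positions with a gap ('-') in either sequence are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-" and a != b
    )


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared (ungapped) positions that differ."""
    compared = sum(1 for a, b in zip(seq_a, seq_b) if a != "-" and b != "-")
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return count_differences(seq_a, seq_b) / compared


print(count_differences("ACGTTGCA", "ACGATGCC"))  # 2
print(p_distance("ACGTTGCA", "ACGATGCC"))         # 0.25
```

The count is a minimum: it records where the sequences now differ, not how many changes produced those differences.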

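Although it is useful in determining the minimum number of changes between two DNA sequences, the count provided by sequence alignment almost certainly underestimates the actual number of changes that have occurred since the sequences diverged. Any single difference counted between two aligned DNA sequences may conceal several substitution events that occurred at that nucleotide position over time, and some substitutions leave no observable difference at all. As illustrated in Figure 23.2, any of the following events may have occurred at a given nucleotide position without being revealed by a simple count of similarities and differences between two DNA sequences:

Multiple substitutions: two or more substitutions occur at the same position along one lineage, yet only a single difference (or none) can be observed.
Coincident substitutions: substitutions occur at the same position in both lineages but are counted as a single difference.
Parallel substitutions: the same substitution occurs independently at the same position in both lineages, so the two sequences appear identical there.
Back substitutions: a position changes and then reverts to its original nucleotide, leaving no trace of either change.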

Figure 23.2 Multiple Substitutions Are Not Reflected in Pairwise Sequence Comparisons. Two observed sequences descended from a common ancestral sequence (center) have undergone a series of substitutions. Although the two observed sequences differ by only three nucleotides (colored letters), these three differences result from a total of nine substitutions (arrows).
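A small simulation makes this undercounting concrete. The sketch below is a toy model, assuming every site is equally likely to change and each substitution picks one of the three other bases at random; the helper names `mutate` and `simulate` and the default parameter values are hypothetical. It evolves two sequences from a random common ancestor and compares the differences observed between them with the number of substitutions actually applied.

```python
import random

BASES = "ACGT"


def mutate(seq: str, n_subs: int, rng: random.Random) -> str:
    """Apply n_subs substitutions at randomly chosen sites. A site may be
    hit more than once, which is what hides substitutions from a count."""
    bases = list(seq)
    for _ in range(n_subs):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in BASES if b != bases[i]])
    return "".join(bases)


def simulate(length: int = 100, subs_per_lineage: int = 40,
             seed: int = 1) -> tuple[int, int]:
    """Evolve two descendants from one random ancestor and return
    (observed differences, actual substitutions)."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    left = mutate(ancestor, subs_per_lineage, rng)
    right = mutate(ancestor, subs_per_lineage, rng)
    observed = sum(a != b for a, b in zip(left, right))
    actual = 2 * subs_per_lineage  # substitutions applied across both lineages
    return observed, actual


observed, actual = simulate()
print(f"observed differences: {observed}; actual substitutions: {actual}")
```

Because each substitution changes at most one site, the observed count can never exceed the true count; when substitutions are numerous relative to sequence length, repeated hits at the same positions push it well below.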


To correct for undercounting of substitutions, molecular evolutionists have developed mathematical models that describe how DNA (and protein) sequences evolve. These models take into account the relative rates of change from one nucleotide to another. For example, transitions (changes between the two purines, A ↔ G, or between the two pyrimidines, C ↔ T) are typically more frequent than transversions (changes in which a purine is replaced by a pyrimidine, or vice versa). These models can also include parameters for variation in substitution rates among different parts of a gene and for the proportions of each nucleotide present in a given sequence. Once such parameters have been estimated, the model is used to correct for multiple, coincident, parallel, and back substitutions. The revised estimate accounts for the total number of substitutions likely to have occurred between two sequences, which is almost always greater than the observed number of differences.
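Two classic models show how such a correction works in practice. Neither is named in the text above, so treat them as illustrative examples rather than the specific models meant here. The Jukes–Cantor model assumes all nucleotide changes are equally likely, giving the corrected distance d = -(3/4) ln(1 - 4p/3) for an observed proportion p of differing sites. The Kimura two-parameter model lets transitions and transversions occur at different rates, giving d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transition and transversion differences. Neither sketch handles rate variation among sites or unequal base composition, which richer models add. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance (substitutions per site) from the observed
    proportion p of differing sites. Assumes equal rates for all changes
    and equal base frequencies; undefined once p >= 0.75 (saturation)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance: transitions (A<->G, C<->T) and
    transversions are counted separately so they can differ in rate."""
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites    # proportion of transition differences
    Q = transversions / sites  # proportion of transversion differences
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for this model")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

For an observed proportion of differences p = 0.25, `jukes_cantor(0.25)` returns about 0.30 substitutions per site, larger than the raw proportion, as the correction intends.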

As sequence information becomes available for more and more genes in an ever-expanding database, sequence alignments can be extended across multiple homologous sequences, and the minimum number of insertions, deletions, and substitutions can be summed across homologous genes of an entire group of organisms. Similar databases have been constructed for homologous proteins. Figure 23.3 shows aligned data for cytochrome c protein sequences in 33 species of animals, plants, and fungi. Such information is used extensively in determining evolutionary relationships among species.
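Extending the comparison to many aligned sequences amounts to repeating the pairwise count for every pair of species. The sketch below uses a hypothetical helper `pairwise_differences` and a small hypothetical toy alignment (not real data). A gap opposite a residue is simply counted as one difference, whereas real analyses score insertions and deletions separately.

```python
from itertools import combinations


def pairwise_differences(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Count differing aligned positions for every pair of sequences.
    A gap ('-') opposite a residue counts as one difference here."""
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must be aligned to the same length")
    return {
        (name_a, name_b): sum(a != b for a, b in zip(seq_a, seq_b))
        for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2)
    }


# Hypothetical toy alignment; these are not real protein sequences.
toy = {
    "species1": "ACDEFGHIKL",
    "species2": "ACDEFGHIKM",
    "species3": "ACNEFGRIKM",
}
for pair, diffs in pairwise_differences(toy).items():
    print(pair, diffs)
```

A table of such pairwise counts (or of corrected distances) is the kind of summary from which evolutionary relationships among the species can then be inferred.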

Figure 23.3 Amino Acid Sequences of Cytochrome c. The amino acid sequences shown in the table were obtained from analyses of the protein cytochrome c from 29 species of plants, fungi, and animals. Note the lack of variation across the sequences at positions 70–80, suggesting that this region is under strong purifying selection and that changing its amino acid sequence would impair the protein's function. The molecular models at the upper left were created from these sequences and show the three-dimensional structures of tuna and rice cytochrome c. Alpha helixes are in red, and the molecule's heme group is shown in yellow.
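Spotting a conserved region such as positions 70–80 comes down to checking each alignment column for variation. A minimal sketch, using a hypothetical helper `invariant_positions` and toy sequences rather than the actual cytochrome c alignment, reports the 1-based columns in which every sequence carries the same residue. Alignment columns match residue numbers only when the alignment contains no gaps.

```python
def invariant_positions(alignment: list[str]) -> list[int]:
    """Return 1-based alignment columns where all sequences share one residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in alignment}  # distinct residues in column i
        if len(column) == 1:
            positions.append(i + 1)
    return positions


# Hypothetical toy sequences; not real cytochrome c data.
toy = ["ACDEFGHIKL", "ACDEFGHIKM", "ACNEFGRIKM"]
print(invariant_positions(toy))
```

A run of consecutive invariant columns across many distantly related species is what suggests purifying selection; with only a few closely related sequences, invariance could also reflect too little time for change.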

Media Clip 23.1 The Ubiquitous Protein

www.life11e.com/mc23.1