PROBLEMS

Question 6.1

What’s the score? Using the identity-based scoring system (Section 6.2), calculate the score for the following alignment. Do you think the score is statistically significant?

  1. WYLGKITRMDAEVLLKKPTVRDGHFLVTQCESSPGEF
  2. WYFGKITRRESERLLLNPENPRGTFLVRESETTKGAY

     SISVRFGDSVQ-----HFKVLRDQNGKYYLWAVK-FN
     CLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFS

     SLNELVAYHRTASVSRTHTILLSDMNV
     SSLQQLVAYYSKHADGLCHRLTNV
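The identity-based score can be checked with a short Python sketch. It assumes the Section 6.2 convention of +10 points per identity and -25 points per gap, where a gap of any length costs the same. If your edition uses different values, change the keyword arguments. The name `identity_score` and its parameters are illustrative only. The usage lines join the three blocks row by row and treat any hyphen at the very end of a printed block as a line-continuation mark, not a gap (also an assumption). Residues of the longer row that overhang the end of the alignment are not scored.

```python
def identity_score(top: str, bottom: str,
                   identity_points: int = 10, gap_penalty: int = 25) -> int:
    """Score a pairwise alignment: +identity_points per identical column and
    -gap_penalty per gap, where a run of consecutive '-' counts as one gap."""
    # zip stops at the shorter row, so overhanging residues are not scored.
    columns = list(zip(top, bottom))
    identities = sum(1 for a, b in columns if a == b and a != "-")

    gaps = 0
    for row in (0, 1):            # look for gaps in each sequence separately
        in_gap = False
        for column in columns:
            if column[row] == "-":
                if not in_gap:    # a new gap opens here, whatever its length
                    gaps += 1
                in_gap = True
            else:
                in_gap = False
    return identities * identity_points - gaps * gap_penalty


# The three blocks of the Question 6.1 alignment, joined row by row.
seq1 = ("WYLGKITRMDAEVLLKKPTVRDGHFLVTQCESSPGEF"
        "SISVRFGDSVQ-----HFKVLRDQNGKYYLWAVK-FN"
        "SLNELVAYHRTASVSRTHTILLSDMNV")
seq2 = ("WYFGKITRRESERLLLNPENPRGTFLVRESETTKGAY"
        "CLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFS"
        "SSLQQLVAYYSKHADGLCHRLTNV")
print(identity_score(seq1, seq2))
```

To judge significance, compare this score with the scores of the same alignment after one sequence has been shuffled, as in Question 6.7.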

Question 6.2

Sequence and structure. A comparison of the aligned amino acid sequences of two proteins each consisting of 150 amino acids reveals them to be only 8% identical. However, their three-dimensional structures are very similar. Are these two proteins related evolutionarily? Explain.

Question 6.3

It depends on how you count. Consider the following two sequence alignments:

  1. A-SNLFDIRLIG
     GSNDFYEVKIMD

  2. ASNLFDIRLI-G
     GSNDFYEVKIMD

Which alignment has a higher score if the identity-based scoring system (Section 6.2) is used? Which alignment has a higher score if the Blosum-62 substitution matrix (Figure 6.9) is used?
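The Blosum-62 half of the question can be checked column by column with the Python sketch below. It is not a full substitution matrix. The dictionary holds only the Blosum-62 entries that the two alignments need, transcribed from the standard matrix, so check them against Figure 6.9. The names `BLOSUM62_SUBSET`, `pair_score`, and `blosum_score` are hypothetical. Each gapped column is charged a caller-supplied `gap_penalty`, which is an assumption because the question gives no gap penalty. Both alignments contain exactly one single-residue gap, so that choice does not change which alignment scores higher.

```python
# Partial Blosum-62 table: only the pairs occurring in Question 6.3.
BLOSUM62_SUBSET = {
    ("A", "G"): 0, ("S", "N"): 1, ("N", "D"): 1, ("L", "F"): 0,
    ("F", "Y"): 3, ("D", "E"): 2, ("I", "V"): 3, ("R", "K"): 2,
    ("L", "I"): 2, ("I", "M"): 1, ("G", "D"): -1,
    ("S", "S"): 4, ("N", "N"): 6, ("F", "F"): 6, ("I", "I"): 4,
    ("L", "D"): -4, ("D", "Y"): -3, ("I", "E"): -3, ("R", "V"): -3,
    ("L", "K"): -2,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    if (b, a) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this partial table")


def blosum_score(top: str, bottom: str, gap_penalty: int) -> int:
    """Sum substitution scores, subtracting gap_penalty for each gapped column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total -= gap_penalty
        else:
            total += pair_score(a, b)
    return total


for top in ("A-SNLFDIRLIG", "ASNLFDIRLI-G"):
    print(top, blosum_score(top, "GSNDFYEVKIMD", gap_penalty=0))
```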

Question 6.4

Discovering a new base pair. Examine the ribosomal RNA sequences in Figure 6.20. In sequences that do not contain Watson–Crick base pairs, what base tends to be paired with G? Propose a structure for your new base pair.

Question 6.5

Overwhelmed by numbers. Suppose that you wish to synthesize a pool of RNA molecules that contain all four bases at each of 40 positions. How much RNA must you have in grams if the pool is to have at least a single molecule of each sequence? The average molecular weight of a nucleotide is 330 g mol−1.
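The calculation runs in three steps. First, count the distinct sequences (4 choices at each of 40 positions). Second, convert that number of molecules to moles with Avogadro's number. Third, multiply by the molar mass of one 40-nucleotide RNA. The Python sketch below follows those steps using only the numbers given in the question. Like the question, it uses the average nucleotide mass and ignores end-group corrections. The function name is illustrative.

```python
AVOGADRO = 6.022e23  # molecules per mole


def pool_mass_grams(length: int = 40, bases: int = 4,
                    nucleotide_mass: float = 330.0) -> float:
    """Mass in grams of a pool holding one copy of every possible sequence."""
    n_sequences = bases ** length           # 4**40 distinct sequences
    molar_mass = length * nucleotide_mass   # g/mol for one 40-nt RNA
    moles = n_sequences / AVOGADRO          # one molecule of each sequence
    return moles * molar_mass


print(f"{pool_mass_grams():.3g} g")
```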

Question 6.6

Form follows function. The three-dimensional structure of biomolecules is more conserved evolutionarily than is sequence. Why?

Question 6.7

Shuffling. Using the identity-based scoring system (Section 6.2), calculate the alignment score for the alignment of the following two short sequences:

  1. ASNFLDKAGK

  2. ATDYLEKAGK

Generate a shuffled version of sequence 2 by randomly reordering these 10 amino acids. Align your shuffled sequence with sequence 1 without allowing gaps, and calculate the alignment score between sequence 1 and your shuffled sequence.
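The shuffling procedure can be automated with the Python sketch below. It assumes +10 points per identity, as in the Section 6.2 scheme, and a no-gap alignment. The helper names `ungapped_identity_score`, `shuffled`, and `shuffle_scores` are hypothetical. The code repeats the shuffle many times, using a seeded generator so the run is reproducible, and then reports how often a shuffled sequence scores at least as well as the real one. That fraction is a rough gauge of whether the real score could arise by chance.

```python
import random


def ungapped_identity_score(seq1: str, seq2: str, identity_points: int = 10) -> int:
    """Score a gap-free alignment: identity_points for each identical position."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return identity_points * sum(a == b for a, b in zip(seq1, seq2))


def shuffled(seq: str, rng: random.Random) -> str:
    """Randomly reorder the residues, keeping the amino acid composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffle_scores(seq1: str, seq2: str, trials: int = 1000, seed: int = 0) -> list[int]:
    """Scores of seq1 against `trials` independent shuffles of seq2."""
    rng = random.Random(seed)
    return [ungapped_identity_score(seq1, shuffled(seq2, rng)) for _ in range(trials)]


seq1, seq2 = "ASNFLDKAGK", "ATDYLEKAGK"
real = ungapped_identity_score(seq1, seq2)
scores = shuffle_scores(seq1, seq2)
print(real, sum(s >= real for s in scores) / len(scores))
```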

Question 6.8

Interpreting the score. Suppose that the sequences of two proteins each consisting of 200 amino acids are aligned and that the percentage of identical residues has been calculated. How would you interpret each of the following results in regard to the possible divergence of the two proteins from a common ancestor?

  1. 80%

  2. 50%

  3. 20%

  4. 10%

Question 6.9

Particularly unique. Consider the Blosum-62 matrix in Figure 6.9. Replacement of which three amino acids never yields a positive score? What features of these residues might contribute to this observation?

Question 6.10

A set of three. The sequences of three proteins (A, B, and C) are compared with one another, yielding the following levels of identity:

Assume that the sequence matches are distributed uniformly along each aligned sequence pair. Would you expect protein A and protein C to have similar three-dimensional structures? Explain.

Question 6.11

RNA alignment. Sequences of an RNA fragment from five species have been determined and aligned. Propose a likely secondary structure for these fragments.

  1. UUGGAGAUUCGGUAGAAUCUCCC

  2. GCCGGGAAUCGACAGAUUCCCCG

  3. CCCAAGUCCCGGCAGGGACUUAC

  4. CUCACCUGCCGAUAGGCAGGUCA

  5. AAUACCACCCGGUAGGGUGGUUC
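A quick way to look for a stem-loop is to search each fragment for the longest run of consecutive complementary pairs that closes a loop of at least three nucleotides. The Python sketch below does this. It is a simplification, not a real folding algorithm. It allows G·U pairs in addition to Watson–Crick pairs, an assumption you can undo by editing `PAIRS`. It also examines each sequence on its own. Checking whether the predicted pairs sit at the same positions in all five fragments, and change in a compensating way, is the covariation step that actually supports the structure. All names here are illustrative.

```python
# Watson-Crick pairs plus G-U (assumed; drop "GU"/"UG" for strict pairing).
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def stem_length(seq: str, i: int, j: int, min_loop: int = 3) -> int:
    """Count consecutive pairs (i, j), (i+1, j-1), ... leaving a loop >= min_loop."""
    length = 0
    while j - i - 1 >= min_loop and seq[i] + seq[j] in PAIRS:
        length += 1
        i += 1
        j -= 1
    return length


def best_hairpin(seq: str, min_loop: int = 3) -> tuple[int, int, int]:
    """Return (i, j, stem length) of the longest stem; i, j are 0-based outer pair."""
    best = (0, 0, 0)
    for i in range(len(seq)):
        for j in range(i + min_loop + 1, len(seq)):
            k = stem_length(seq, i, j, min_loop)
            if k > best[2]:
                best = (i, j, k)
    return best


def dot_bracket(seq: str, i: int, j: int, k: int) -> str:
    """Draw the stem as '(' and ')' with unpaired bases as '.'."""
    chars = ["."] * len(seq)
    for n in range(k):
        chars[i + n] = "("
        chars[j - n] = ")"
    return "".join(chars)


fragments = [
    "UUGGAGAUUCGGUAGAAUCUCCC",
    "GCCGGGAAUCGACAGAUUCCCCG",
    "CCCAAGUCCCGGCAGGGACUUAC",
    "CUCACCUGCCGAUAGGCAGGUCA",
    "AAUACCACCCGGUAGGGUGGUUC",
]
for seq in fragments:
    print(seq)
    print(dot_bracket(seq, *best_hairpin(seq)))
```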

Question 6.12

The more the merrier. When RNA alignments are used to determine secondary structure, it is advantageous to have many sequences representing a wide variety of species. Why?

Question 6.13

To err is human. You have discovered a mutant form of a thermostable DNA polymerase with significantly reduced fidelity in adding the appropriate nucleotide to the growing DNA strand, compared with wild-type DNA polymerase. How might this mutant be useful in the molecular-evolution experiments described in Section 6.5?

Question 6.14

Generation to generation. When performing a molecular-evolution experiment, such as that described in Section 6.5, why is it important to repeat the selection and replication steps for several generations?

Question 6.15

BLAST away. Using the National Center for Biotechnology Information Web site (www.ncbi.nlm.nih.gov), find the sequence of the enzyme triose phosphate isomerase from E. coli strain K-12. Use this sequence as the query for a protein–protein BLAST search. In the output, find the alignment with the sequence of triose phosphate isomerase from human beings (Homo sapiens). How many identities are observed in the alignment?
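BLAST reports the count directly on the "Identities = ..." line above each alignment. To confirm that count by hand, you can copy one pairwise alignment from the text output and count the identical columns. The Python sketch below does this. It assumes the plain-text layout in which every aligned line reads `Query <start> <residues> <end>` or `Sbjct <start> <residues> <end>`, and it ignores the middle (match) line. The function name is hypothetical. Residues masked by low-complexity filtering (shown as X) will not count as identities.

```python
def count_identities(blast_text: str) -> tuple[int, int]:
    """Return (identities, alignment length) for one pairwise BLAST alignment."""
    query, subject = [], []
    for line in blast_text.splitlines():
        fields = line.split()
        # Aligned rows have four fields: label, start, residues, end.
        if len(fields) == 4 and fields[0] == "Query":
            query.append(fields[2])
        elif len(fields) == 4 and fields[0] == "Sbjct":
            subject.append(fields[2])
    q, s = "".join(query), "".join(subject)
    if len(q) != len(s):
        raise ValueError("Query and Sbjct rows have different lengths")
    identities = sum(1 for a, b in zip(q, s) if a == b and a != "-")
    return identities, len(q)
```

Paste the alignment between triple quotes and pass it to `count_identities`. The first number returned should match the first number on the "Identities" line.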
