39.1 The Genetic Code Links Nucleic Acid and Protein Information

✓ 1 Describe the genetic code.

For any sort of translation to take place, there must be a lexicon—a Rosetta stone—that links the two languages. The genetic code is the relation between the sequence of bases in DNA (or its RNA transcripts) and the sequence of amino acids in proteins. What are the characteristics of this code?

  1. Three nucleotides encode an amino acid. Proteins are built from 20 amino acids, but there are only four bases in nucleic acids. Simple calculations show that a minimum of three bases is required to encode at least 20 amino acids: two bases allow only 4 × 4 = 16 combinations, whereas three bases allow 4 × 4 × 4 = 64. Genetic experiments showed that an amino acid is in fact encoded by a group of three bases, called a codon.

  2. The code is nonoverlapping. Consider a base sequence ABCDEF. In an overlapping code, ABC specifies the first amino acid, BCD the next, CDE the next, and so on. In a nonoverlapping code, ABC designates the first amino acid, DEF the second, and so forth. Genetic experiments established the code to be nonoverlapping.

  3. The code has no punctuation. In principle, one base (denoted as Q) might serve as a “comma” between codons:

    … QABCQDEFQGHIQJKLQ …

    However, this is not the case. Rather, the sequence of bases is read sequentially from a fixed starting point without punctuation.

  4. The genetic code has directionality. The code is read from the 5′ end of the messenger RNA to its 3′ end.

  5. The genetic code is degenerate. In the context of the genetic code, degeneracy means that some amino acids are encoded by more than one codon, inasmuch as there are 64 possible base triplets and only 20 amino acids. In fact, 61 of the 64 possible triplets specify particular amino acids and 3 triplets (called Stop codons) designate the termination of translation. Thus, for most amino acids, there is more than one codon. Codons that specify the same amino acid are called synonyms. For example, CAU and CAC are synonyms for histidine.
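
The reading rules in points 2 through 4 can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for illustration; the sketch assumes a sequence written 5′ → 3′ and simply splits it into triplets from a fixed starting point.

```python
def nonoverlapping_codons(seq, start=0):
    """Read triplets from a fixed start, with no overlap and no punctuation."""
    # Ignore any trailing bases that do not complete a triplet.
    usable = len(seq) - (len(seq) - start) % 3
    return [seq[i:i + 3] for i in range(start, usable, 3)]


def overlapping_codons(seq):
    """Hypothetical overlapping code: each triplet starts one base later."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


print(nonoverlapping_codons("ABCDEF"))  # ['ABC', 'DEF']
print(overlapping_codons("ABCDEF"))     # ['ABC', 'BCD', 'CDE', 'DEF']
```

Changing `start` shifts the reading frame, which is why a fixed starting point matters when there is no punctuation between codons.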

All 64 codons have been deciphered (Table 39.1). Only tryptophan and methionine are encoded by just one triplet each. The other 18 amino acids are each encoded by two or more. Indeed, leucine, arginine, and serine are specified by six codons each.
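
These counts can be checked with a small Python sketch. It assumes the standard genetic code, written here as a conventional 64-letter string of one-letter amino acid symbols in UCAG order; that compact encoding and the variable names are illustrative choices, not taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; '*' marks a Stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

counts = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = len(CODON_TABLE) - len(stop_codons)
one_codon = sorted(aa for aa, n in counts.items() if n == 1)
six_codons = sorted(aa for aa, n in counts.items() if n == 6)

print(len(CODON_TABLE), sense_codons, stop_codons)  # 64 61 ['UAA', 'UAG', 'UGA']
print(one_codon)   # ['M', 'W']       methionine, tryptophan
print(six_codons)  # ['L', 'R', 'S']  leucine, arginine, serine
```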

Table 39.1 The genetic code

What is the biological significance of the extensive degeneracy of the genetic code? If the code were not degenerate, 20 codons would designate amino acids and 44 would lead to chain termination. The probability of mutating to chain termination would therefore be much higher with a nondegenerate code. Chain-termination mutations usually lead to inactive proteins. Degeneracy also allows for mutations that do not change the encoded amino acid, as when a codon mutates to a synonym. When a codon instead mutates to the codon for a different amino acid, the mutation is called a substitution; many substitutions are harmless. Thus, degeneracy minimizes the deleterious effects of mutations.
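
As a rough illustration of this argument, the sketch below tallies the outcome of every single-base change to a codon that specifies an amino acid. It assumes the standard code and treats all single-base changes as equally likely, which is a simplification; the function name and category labels are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; '*' marks a Stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutations(table):
    """Count outcomes of all single-base changes to amino acid codons."""
    tally = {"synonymous": 0, "substitution": 0, "chain termination": 0}
    for codon, aa in table.items():
        if aa == "*":
            continue  # start only from codons that specify an amino acid
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue  # not a mutation
            mutant = codon[:pos] + base + codon[pos + 1:]
            new_aa = table[mutant]
            if new_aa == "*":
                tally["chain termination"] += 1
            elif new_aa == aa:
                tally["synonymous"] += 1
            else:
                tally["substitution"] += 1
    return tally


tally = classify_point_mutations(CODON_TABLE)
total = sum(tally.values())
for kind, n in tally.items():
    print(f"{kind:18s} {n:4d}  ({n / total:.1%})")
```

Under these assumptions, only 23 of the 549 possible single-base changes produce a Stop codon, whereas 134 leave the encoded amino acid unchanged.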

The Genetic Code Is Nearly Universal

Table 39.2 Distinctive codons of human mitochondria

Most organisms use the same genetic code. This universality accounts for the fact that human proteins, such as insulin, can be synthesized in the bacterium E. coli and harvested from it for the treatment of diabetes. However, genome-sequencing studies have shown that not all genomes are translated by the same code. Ciliated protozoa, for example, differ from most organisms in that UAA and UAG are read as codons for amino acids rather than as stop signals; UGA is their sole termination signal. The first variations in the genetic code were found in mitochondria from a number of species, including human beings (Table 39.2). The genetic code of mitochondria can differ from that of the rest of the cell because mitochondrial DNA encodes a distinct set of transfer RNAs, adaptor molecules that recognize the alternative codons. Thus, the genetic code is nearly but not absolutely universal.
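
The effect of a variant code can be sketched in Python by overriding a few entries of the standard codon table. The reassignments used here (UGA read as tryptophan, AUA as methionine, AGA and AGG as Stop) are those of the human mitochondrial code, stated as an assumption because the table's contents are not reproduced in this text; the `translate` helper and the example message are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; '*' marks a Stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed human mitochondrial reassignments layered over the standard code.
HUMAN_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna, table):
    """Read codons 5' -> 3' from the first base until a Stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGAUAUGAAGAUAA"  # codons: AUG AUA UGA AGA UAA
print(translate(message, STANDARD))    # MI   (UGA terminates)
print(translate(message, HUMAN_MITO))  # MMW  (AGA terminates)
```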

Transfer RNA Molecules Have a Common Design

The fidelity of protein synthesis requires the accurate recognition of three-base codons on messenger RNA. An amino acid itself is not structurally complex enough to recognize a codon. Consequently, some sort of adaptor is required. Transfer RNA (tRNA) serves as the adaptor molecule between the codon and its specified amino acid: it binds to a specific codon and brings with it an amino acid for incorporation into the polypeptide chain.

There is at least one tRNA molecule for each of the amino acids. These molecules have many common structural features, as might be expected because all tRNA molecules must be able to interact in nearly the same way with the ribosomes, mRNAs, and protein factors that participate in translation.

All known transfer RNA molecules have the following features:

  1. Each is a single strand containing between 73 and 93 ribonucleotides (∼25 kDa).

  2. The three-dimensional molecule is L-shaped (Figure 39.1).

    Figure 39.1: Transfer RNA structure. Notice the L-shaped structure revealed by this skeletal model of yeast phenylalanyl-tRNA. The CCA region is at the end of one arm, and the anticodon loop is at the end of the other.

  3. They contain many unusual bases, typically between 7 and 15 per tRNA. Some are methylated or dimethylated derivatives of A, U, C, and G. Methylation prevents the formation of certain base pairs, thereby rendering some of the bases accessible for interactions with other components of the translation machinery. In addition, methylation imparts a hydrophobic character to some regions of tRNAs, which may be important for their interaction with proteins required for protein synthesis. Modified bases, such as inosine, also are components of tRNA. The inosines in tRNA are formed by deamination of adenosine after the synthesis of the primary transcript.

  4. When depicted on a two-dimensional surface, all tRNA molecules can be arranged in a cloverleaf pattern, with about half the nucleotides in tRNAs base-paired to form double helices (Figure 39.2). Five groups of bases are not base-paired in this way: the 3′ CCA terminal region, which is part of a region called the acceptor stem; the TψC loop, which acquired its name from the sequence ribothymine-pseudouracil-cytosine; the “extra arm,” which contains a variable number of residues; the DHU loop, which contains several dihydrouracil residues; and the anticodon loop. The structural diversity generated by this combination of helices and loops containing modified bases ensures that the tRNAs can be uniquely distinguished, though structurally similar overall.

    Figure 39.2: The general structure of transfer RNA molecules. The structure of the tRNA molecule is shown in the cloverleaf pattern. Comparison of the base sequences of many tRNAs reveals a number of conserved features.

  5. The 5′ end of a tRNA is phosphorylated. The 5′ terminal residue is usually pG.

  6. The activated amino acid is attached to a hydroxyl group of the adenosine residue located at the end of the 3′ CCA component of the acceptor stem. This region is a flexible single strand at the 3′ end of mature tRNAs.

  7. The anticodon is present in a loop near the center of the sequence.

Some Transfer RNA Molecules Recognize More Than One Codon Because of Wobble in Base-Pairing

What are the rules that govern the recognition of a codon by the anticodon of a tRNA? A simple hypothesis is that each of the bases of the codon forms a Watson–Crick type of base pair with a complementary base on the anticodon. The codon and anticodon would then be lined up in an antiparallel fashion. Recall that, by convention, nucleotide sequences are written in the 5′ → 3′ direction unless otherwise noted. Hence, the anticodon to AUG is written as CAU, but the actual base-pairing with the codon would be

    Anticodon  3′-U-A-C-5′
    Codon      5′-A-U-G-3′

According to this model, a particular anticodon can recognize only one codon.
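
Under this strict model, the anticodon is simply the reverse complement of the codon. A minimal Python sketch, with a hypothetical function name, assuming only the four standard RNA bases:

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_anticodon(codon):
    """Return the anticodon, written 5' -> 3', that pairs base for base with a codon."""
    # Antiparallel pairing: complement each base, then reverse the order.
    return "".join(COMPLEMENT[base] for base in reversed(codon))


print(watson_crick_anticodon("AUG"))  # CAU
```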

Table 39.3 Allowed pairings at the third base of the codon according to the wobble hypothesis

However, things are not so simple. Some tRNA molecules can recognize more than one codon. For example, consider the yeast alanyl-tRNA, with the anticodon IGC, where I is the nucleoside inosine. This anticodon binds to three codons: GCU, GCC, and GCA. The first two bases of these codons are the same, whereas the third is different. Could it be that recognition of the third base of a codon is sometimes less discriminating than recognition of the other two? The pattern of degeneracy of the genetic code indicates that it might be so. Look again at Table 39.1. XYU and XYC always encode the same amino acid; XYA and XYG usually do. These data suggest that the steric criteria might be less stringent for pairing of the third base than for the other two. In other words, there is some steric freedom (“wobble”) in the pairing of the third base of the codon. With this steric freedom, tRNA anticodons can pair with mRNA codons as shown in Table 39.3.

Two generalizations concerning the codon–anticodon interaction can be made:

  1. The first two bases of a codon pair in the standard way. Recognition is precise. Hence, codons that differ in either of their first two bases must be recognized by different tRNAs. For example, both UUA and CUA encode leucine but are read by different tRNAs.

  2. The first base of an anticodon determines whether a particular tRNA molecule reads one, two, or three kinds of codons: C or A (one codon), U or G (two codons), or I (three codons). Thus, part of the degeneracy of the genetic code arises from imprecision in the pairing of the third base of the codon with the first base of the anticodon. We see here a strong reason for the frequent appearance of inosine, one of the unusual nucleosides, in anticodons. Inosine maximizes the number of codons that can be read by a particular tRNA molecule.
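
These two generalizations can be expressed as a short Python sketch. The `WOBBLE` mapping encodes the allowed pairings implied by the rule above (C or A reads one codon, U or G two, I three); the names are hypothetical and the sketch ignores other modified bases.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First base of the anticodon -> third codon bases it can pair with.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}


def codons_read(anticodon):
    """List the codons (5' -> 3') recognized by an anticodon written 5' -> 3'."""
    first, second, third = anticodon
    # Codon bases 1 and 2 pair in the standard way with anticodon bases 3 and 2.
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    # Codon base 3 pairs with anticodon base 1, where wobble is allowed.
    return [prefix + base for base in WOBBLE[first]]


print(codons_read("IGC"))  # ['GCU', 'GCC', 'GCA']
print(codons_read("CAU"))  # ['AUG']
```

The first call reproduces the yeast alanyl-tRNA example, in which the anticodon IGC reads three alanine codons.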

The Synthesis of Long Proteins Requires a Low Error Frequency

The process of transcription is analogous to copying, word for word, a page from a book. There is no change of alphabet or vocabulary; so the likelihood of a change in meaning is small. Translating the base sequence of an mRNA molecule into a sequence of amino acids is similar to translating the page of a book into another language. Translation is a complex process, entailing many steps and dozens of molecules. The potential for error exists at each step. The complexity of translation creates a conflict between two requirements: the process must be not only accurate, but also fast enough to meet a cell’s needs. How fast is “fast enough”? In E. coli, translation can take place at a rate of 40 amino acids per second, a truly impressive speed considering the complexity of the process.

How accurate must protein synthesis be? The average E. coli protein is about 300 amino acids in length, and several dozen proteins are longer than 1000 amino acids. Let us consider possible error rates when synthesizing proteins of this size. As Table 39.4 shows, an error frequency of 10⁻² (one incorrect amino acid for every 100 correct ones incorporated into a protein) would be intolerable, even for small proteins. An error frequency of 10⁻³ would usually lead to the error-free synthesis of a 300-residue protein (∼33 kDa) but not of a 1000-residue protein (∼110 kDa). Thus, the error frequency must not exceed approximately 10⁻⁴ to produce the larger proteins effectively. Lower error frequencies are conceivable; however, except for the largest proteins, they will not dramatically increase the percentage of proteins with accurate sequences. In addition, such lower error rates are likely to be possible only by a reduction in the rate of protein synthesis because additional time for proofreading will be required. In fact, the observed error frequencies are close to 10⁻⁴. An error frequency of about 10⁻⁴ per amino acid residue was selected in the course of evolution to accurately produce proteins consisting of as many as 1000 amino acids while maintaining a remarkably rapid rate for protein synthesis.
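
The reasoning above can be made concrete with a short calculation. Assuming that errors at different residues are independent, the probability of synthesizing an error-free protein of n residues at error frequency ε is (1 − ε)ⁿ. The Python sketch below evaluates this expression; the function name is hypothetical, and the 100-residue chain length is added only for illustration.

```python
def error_free_probability(error_frequency, n_residues):
    """Probability that every residue is correct, assuming independent errors."""
    return (1.0 - error_frequency) ** n_residues


lengths = (100, 300, 1000)
print("error freq " + "".join(f"{n:>10d}" for n in lengths))
for eps in (1e-2, 1e-3, 1e-4, 1e-5):
    row = "".join(f"{error_free_probability(eps, n):>10.1%}" for n in lengths)
    print(f"{eps:<11.0e}{row}")
```

At an error frequency of 10⁻³, roughly 74% of 300-residue chains but only about 37% of 1000-residue chains would be error-free; at 10⁻⁴, about 90% of 1000-residue chains would be.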

Table 39.4 Accuracy of protein synthesis