17.1 DECIPHERING THE GENETIC CODE: tRNA AS ADAPTOR

DNA and RNA each consist of only four different nucleotides, whereas proteins can have up to 20 different amino acids. For only four nucleotides to specify the 20 common amino acids, multiple nucleotides must be combined to make up a code. Combinations of two nucleotides yield only 16 (4²) different dinucleotide code words, insufficient to encode 20 amino acids. Combinations of three nucleotides yield 64 (4³) code words, more than enough to specify 20 amino acids. Hence, the RNA “code word,” or codon, was hypothesized to be a combination of three nucleotides, or possibly more. Insightful experiments, described in this chapter, demonstrated that the code is indeed triplet.
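The counting argument can be checked by direct enumeration. The following minimal Python sketch lists every possible code word of a given length over the four RNA nucleotides; the name `code_words` is chosen here purely for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # the four RNA nucleotides


def code_words(length):
    """Return every possible code word of the given length."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=length)]


for n in (1, 2, 3):
    count = len(code_words(n))
    verdict = "enough" if count >= 20 else "not enough"
    print(f"word length {n}: {count} code words ({verdict} for 20 amino acids)")
```

Only at a word length of three does the number of code words exceed the 20 amino acids that must be specified.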

In 1955, to explain how an RNA sequence codes for a sequence of amino acids, Francis Crick hypothesized the existence of an “adaptor” molecule. He proposed that adaptors can recognize specific codons in the mRNA and that each adaptor carries a specific amino acid (Figure 17-1). Adaptors line up on the mRNA, thus aligning the sequence of amino acids. Not long after Crick’s adaptor hypothesis, Paul Zamecnik and Mahlon Hoagland discovered a small RNA that covalently attaches to amino acids in a reaction requiring ATP (see the How We Know section at the end of this chapter). These RNA–amino acid hybrids could presumably base-pair with mRNA, because they contain nucleotides and thus fit the description of the adaptor molecule needed to translate the information in an mRNA sequence into a polypeptide sequence. This small RNA, later called transfer RNA (tRNA), is aminoacylated at the 3′ terminus in an ATP-dependent reaction. A tRNA with its attached amino acid is called an aminoacyl-tRNA, and the tRNA is said to be charged with that amino acid. The amino acid specificity of tRNAs is provided not by the anticodon in their nucleotide sequence but by the enzymes that attach amino acids to particular tRNAs, enzymes known as aminoacyl-tRNA synthetases. Thus, the association between the amino acid and the anticodon evolved, rather than being chemically predetermined.

Figure 17-1: Crick’s adaptor hypothesis. Adaptor molecules recognize codons in mRNA and carry specific amino acids. Thus, they line up amino acids in an order that depends on the sequence of codons in the mRNA. Today we know that the adaptor is a tRNA molecule. The amino acid is covalently bound at the 3′ end of the tRNA molecule, and a specific nucleotide triplet (anticodon) elsewhere in the tRNA interacts with a triplet codon in mRNA through hydrogen bonding of complementary bases.

KEY CONVENTION

In denoting tRNAs, the specificity is indicated by a superscript, and the aminoacylated tRNA by a hyphenated name. For example, tRNALeu indicates an uncharged tRNA that is specific for leucine, and leucyl-tRNALeu, or Leu-tRNALeu, indicates a leucine-specific tRNA that is charged with leucine.

All tRNAs Have a Similar Structure

The structure of the tRNA molecule reveals how it is capable of functioning as an adaptor. We briefly discuss tRNA structure here, and then explore this topic in greater detail in Chapter 18.

Transfer RNAs are relatively small, single-stranded RNA molecules. The tRNAs in bacteria and in the cytoplasm of eukaryotic cells are 73 to 93 nucleotide residues long. Mitochondria and chloroplasts contain distinctive, somewhat smaller tRNAs. All tRNAs form intramolecular base pairs and fold up into a precise three-dimensional structure. They contain the trinucleotide sequence CCA at the 3′ terminus. The 3′-terminal A residue is the nucleotide to which the amino acid attaches.

When drawn in two dimensions, the hydrogen-bonding pattern of all tRNAs forms a cloverleaf structure with four arms; the longer tRNAs also have a short fifth arm, or extra arm (Figure 17-2a). In three dimensions, a tRNA folds up further into the form of a twisted L (Figure 17-2b). Two arms of the tRNA are critical for adaptor function, and they are located at the two ends of the L shape. The 3′ terminus of the amino acid arm, in the charged tRNA, carries a specific amino acid. At the opposite end is the anticodon arm, so named because it contains the anticodon, the three-nucleotide sequence that base-pairs with the complementary codon in mRNA. The other major arms are the D arm, which often contains the unusual nucleotide dihydrouridine (D), and the TΨC arm, containing ribothymidine (T) and pseudouridine (Ψ), which has an unusual carbon–carbon bond between the base and ribose. The base pairing between the anticodon in the tRNA and the codon in mRNA is antiparallel. For example, the codon for methionine is 5′-AUG, which base-pairs with the tRNAMet anticodon 5′-CAU (i.e., 3′-UAC) (Figure 17-3).

Figure 17-2: The structure of tRNA. (a) The cloverleaf form. Large dots on the backbone represent the nucleotide residues; blue lines indicate base pairs. Three nucleotides constitute the anticodon (at the bottom), and an amino acid is attached to the 3′-terminal amino acid arm. Unusual modified nucleotides are present in the D arm and TΨC arm. Py (pyrimidine) can be either U or C; Pu (purine) can be either A or G. The D arm often contains one or more dihydrouracil residues (D). (b) The three-dimensional structure folds into a twisted L shape. The positions of the anticodon, the 3′-terminal amino acid arm, and the D and TΨC arms are shown.
Figure 17-3: The pairing relationship of the codon and anticodon. The Met-tRNAMet (methionyl-tRNAMet) is shown. The nucleotide positions of the codon and anticodon are numbered 1, 2, and 3, in the 5′→3′ direction (thus, anticodon nucleotide 3 pairs with codon nucleotide 1).

KEY CONVENTION

The nucleotide positions of the codon (mRNA) and anticodon (tRNA) are numbered 1, 2, and 3, in the 5′→3′ direction. Due to the antiparallel base pairing between the anticodon and the codon, the numbering of nucleotides in the anticodon is the reverse of that in the codon. Thus, anticodon nucleotide 3 pairs with codon nucleotide 1.
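The numbering convention can be made concrete with a short Python sketch. It assumes strict Watson-Crick pairing at all three positions (wobble is ignored here), and the function names are illustrative rather than standard.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_anticodon(codon):
    """Return the anticodon (written 5'->3') that pairs with a codon (5'->3').

    Pairing is antiparallel, so the codon is read backward while each
    base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def pairing_positions(codon):
    """List (codon position, base, anticodon position, base) for each pair."""
    anticodon = watson_crick_anticodon(codon)
    # Codon position i pairs with anticodon position 4 - i.
    return [(i, codon[i - 1], 4 - i, anticodon[3 - i]) for i in (1, 2, 3)]


print(watson_crick_anticodon("AUG"))  # CAU
for row in pairing_positions("AUG"):
    print(row)
```

For the methionine codon 5′-AUG, the sketch returns the anticodon 5′-CAU and pairs codon nucleotide 1 (A) with anticodon nucleotide 3 (U), as stated in the convention.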

As shown in Figure 17-2, the anticodon is some distance from the 3′ terminus of the amino acid arm (where the amino acid is attached), and thus the anticodon cannot directly specify the correct amino acid. Indeed, the ribosome will link any two amino acids lined up correctly on the mRNA, regardless of whether the tRNA is charged with a correct or an incorrect amino acid. It is the function of the aminoacyl-tRNA synthetases to place the correct amino acid onto the tRNA. Therefore, the specificity of the genetic code lies in the accuracy of protein-based aminoacylation of the tRNAs. Most cells contain 20 aminoacyl-tRNA synthetases, one for each amino acid. Because there are more codons than amino acids, some amino acids are specified by more than one tRNA, yet the same aminoacyl-tRNA synthetase recognizes all tRNAs that specify a given amino acid. The ribosome binds the mRNA and charged tRNAs, bringing the components into proximity for linking together the amino acids attached to adjacent aminoacyl-tRNAs as they align on the mRNA. The entire process of decoding the linear sequence of mRNA into the sequence of a protein is known as translation; it requires more than 100 different types of protein and RNA molecules (see Chapter 18).

The Genetic Code Is Degenerate

As we have seen, there are 64 unique ways to combine four different nucleotides in a triplet codon sequence, yet there are only 20 common amino acids. Therefore, either some codons are not found in mRNA sequences, or—as we now know—multiple codons encode the same amino acid. A degenerate code is one in which several codons have the same meaning. We refer to the genetic code as degenerate because a single amino acid can be encoded by more than one codon. As we will see later, the degeneracy of the genetic code is advantageous because it provides the DNA with the ability to absorb single-base mutations with minimal consequences for the protein sequences it encodes.

All 64 codons of the genetic code are used in some fashion: 61 for coding amino acids and 3 for specifying the termination of translation (Figure 17-4). Three amino acids—arginine, leucine, and serine—are each specified by six different codons. Five amino acids have four codons, isoleucine has three, and nine amino acids have two codons. Only two amino acids, tryptophan and methionine, are specified by a single codon (Table 17-1).

Figure 17-4: The genetic code. The codon sequences are written in the 5′→3′ direction. The first nucleotide of each codon is shown on the left side of the grid, the second nucleotide at the top, and the third on the right. AUG (shaded green) also serves as the start codon; UAA, UAG, and UGA (red) are stop (or nonsense) codons.
Table 17-1: The Degeneracy of the Genetic Code
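The codon counts quoted for each amino acid can be tallied directly from the standard genetic code. The Python sketch below builds the 64-codon table from a compact one-letter string (codons ordered with U, C, A, G at each position, `*` marking a stop codon) and counts how many codons specify each amino acid. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for codons UUU, UUC, UUA, UUG, UCU, ...
# in U, C, A, G order at each position; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons that specify each amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# How many amino acids have 6, 4, 3, 2, or 1 codons.
degeneracy = Counter(codons_per_aa.values())

for n_codons in sorted(degeneracy, reverse=True):
    members = sorted(aa for aa, n in codons_per_aa.items() if n == n_codons)
    print(f"{n_codons} codons: {degeneracy[n_codons]} amino acids ({' '.join(members)})")
```

The tally reproduces the distribution described in the text: three amino acids with six codons, five with four, one with three, nine with two, and two with a single codon, for a total of 61 sense codons.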

When several different codons specify one amino acid, the first two nucleotides of each codon are the primary determinants of specificity, and the difference between the codons usually lies at the third position. For example, alanine is specified by the triplets GCU, GCC, GCA, and GCG. When four codons specify the same amino acid, they are referred to as a codon family. Within a codon family, the first two nucleotides are the same, the nucleotide at the third position does not matter, and base pairing of the first two nucleotides carries the information needed to specify the amino acid. Many amino acids are specified by two codons in which the third nucleotide is either a purine in both or a pyrimidine in both.

Wobble Enables One tRNA to Recognize Two or More Codons

If all three nucleotides in an mRNA codon were needed to form Watson-Crick base pairs with their counterparts in the tRNA anticodon, 61 different tRNAs would be required in every cell. In fact, only 32 tRNAs are required to recognize all the amino acid codons, because some tRNAs recognize more than one codon. However, some cells contain considerably more than 32 different tRNAs.

As noted above, when several codons specify the same amino acid, usually the third nucleotide is the only difference. In some cases the cell uses different tRNAs for the different codons that encode the same amino acid, and in these cases a single aminoacyl-tRNA synthetase recognizes the various tRNAs and charges them all with the same amino acid. Many tRNAs can recognize more than one codon, and these tRNAs often contain either a U or a G as the 5′ nucleotide of the anticodon (i.e., in position 1, which pairs with the third nucleotide of the codon), because these nucleotides can form noncanonical (i.e., non-Watson-Crick) base pairs: U can pair with either A or G, and G can pair with either C or U (Figure 17-5). These noncanonical base pairs are not found in DNA because they do not fit within the tight geometric constraints of the DNA duplex, but they are accommodated in the more flexible base pairing that occurs between tRNA and mRNA. The bases that participate in noncanonical base pairs are called wobble bases. The wobble bases allow a single tRNA anticodon to bind to more than one mRNA codon. The 5′ nucleotide in the anticodon is in the wobble position. However, it is important to note that the structure of tRNA can make the anticodon completely specific for perfect base pairing with one codon. We see this for tryptophan and methionine, each of which has only one codon. Thus, the necessary “flexibility” needed for wobble in tRNA involves more than the anticodon sequence—it also involves the way in which tRNA codon-anticodon pairs can accept particular base differences, as described shortly.

Figure 17-5: Wobble base pairing. (a) Wobble allows one tRNA to recognize two different codons. U normally pairs with A (top) but can form two hydrogen bonds with G to make a weak G–U wobble base pair, which occurs in the third position of the codon (bottom). (b) A tRNA pairs with two different codons through wobble pairing (shown in red) at the third nucleotide of the codon. The G of the G–U pair can be in either the anticodon or the codon.

The anticodon in some tRNAs includes inosine (designated I; this nucleotide residue contains the base hypoxanthine), which can hydrogen-bond with any of three different nucleotides: U, C, or A (Figure 17-6). These pairings are much weaker than the hydrogen bonds of Watson-Crick base pairs. When Robert Holley sequenced the yeast tRNAAla in 1965, he found inosine at the first position of the anticodon. This explains why the anticodon of yeast tRNAAla, 5′-IGC, can function with three different codons: 5′-GCA, 5′-GCU, and 5′-GCC. Inosine, like the other modified nucleotides in tRNA, is formed posttranscriptionally—an adenosine residue is deaminated by the enzyme adenosine deaminase to produce a keto moiety in place of the amino group.

Figure 17-6: Inosine as a wobble nucleotide. (a) Inosine (I) can form two hydrogen bonds with either C, U, or A. (b) A tRNA containing I in the first position of the anticodon can recognize three different codons, according to the wobble rules. Wobble pairings are shown in red.

The process by which some tRNAs can recognize more than one codon was formalized by Crick, who proposed a set of four relationships known as the wobble hypothesis:

  1. The first two bases of an mRNA codon always form Watson-Crick base pairs with the corresponding bases of the tRNA anticodon, and they confer most of the coding specificity.

  2. The first base of the anticodon (reading in the 5′→3′ direction) pairs with the third base of the codon and determines the number of codons recognized by the tRNA. When the first nucleotide of the anticodon is C or A, base pairing is specific, and only one codon is recognized by that tRNA. When the first nucleotide is U or G, base pairing is less specific, and two different codons may be read by the same tRNA. When the first nucleotide of an anticodon is I, three different codons can be recognized—the maximum number for any tRNA.

  3. When an amino acid is specified by several different codons, codons that differ in either of the first two bases require different tRNAs.

  4. A minimum of 32 tRNAs are required to translate all 61 codons (31 tRNAs for the amino acids and 1 for initiation).
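The first two relationships can be expressed as a small lookup table. The following Python sketch is a simplification: it assumes that anticodon positions 2 and 3 hold standard bases that pair strictly by Watson-Crick rules, and it ignores the influence of the larger tRNA structure on pairing. The names `WOBBLE` and `codons_read` are illustrative.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases accepted by each base at anticodon
# position 1 (the 5' nucleotide), following the wobble hypothesis.
WOBBLE = {
    "C": "G",    # specific: one codon
    "A": "U",    # specific: one codon
    "U": "AG",   # two codons
    "G": "CU",   # two codons
    "I": "UCA",  # inosine: three codons
}


def codons_read(anticodon):
    """Return the codons (5'->3') recognized by an anticodon (5'->3')."""
    first, second, third = anticodon
    # Anticodon position 3 pairs with codon position 1, and
    # anticodon position 2 pairs with codon position 2.
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    return [prefix + base for base in WOBBLE[first]]


print(codons_read("IGC"))  # yeast tRNAAla: ['GCU', 'GCC', 'GCA']
print(codons_read("CAU"))  # tRNAMet: ['AUG']
print(codons_read("GUA"))  # tRNATyr: ['UAC', 'UAU']
```

The anticodon 5′-IGC of yeast tRNAAla is predicted to read GCU, GCC, and GCA, in agreement with the three alanine codons discussed for this tRNA.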

Specific Codons Start and Stop Translation

As we describe in Section 17.2 and in further detail in Chapter 18, the codons in an mRNA molecule are read by the ribosome in the 5′→3′ direction, without gaps. Because each codon has three nucleotides, an mRNA sequence has the potential to encode three different polypeptide sequences, depending on exactly where translation begins—that is, depending on which register of triplets the translation apparatus acts upon (Figure 17-7). Each register of triplets in mRNA is called a reading frame. The amino acid sequence of the protein encoded by the mRNA depends on which reading frame is used.

Figure 17-7: Three possible reading frames. Shown here is a single RNA sequence translated in all three of its reading frames.
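The effect of the reading frame can be illustrated with a short Python sketch that translates one RNA sequence in each of its three frames. The example sequence is invented for illustration and is not the one shown in the figure; the translation uses the standard genetic code and, for simplicity, does not stop at termination codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in U, C, A, G order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_in_frame(mrna, frame):
    """Split an mRNA (5'->3') into complete triplets starting at offset 0, 1, or 2."""
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]


def translate_frame(mrna, frame):
    """Translate every complete codon in the chosen frame (one-letter codes)."""
    return "".join(CODON_TABLE[codon] for codon in codons_in_frame(mrna, frame))


mrna = "AUGGCCAUUGUA"  # hypothetical 12-nucleotide sequence
for frame in range(3):
    print(frame + 1, codons_in_frame(mrna, frame), translate_frame(mrna, frame))
```

Shifting the starting point by a single nucleotide changes every codon, so the three frames of this sequence yield three unrelated peptides (Met-Ala-Ile-Val, Trp-Pro-Leu, and Gly-His-Cys).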

Specific sequences in mRNA signal the start of translation and thus define the reading frame. Translation almost always starts at an AUG codon, which specifies the amino acid methionine; this codon is referred to as the initiation codon or start codon. Occasionally, the codon GUG (usually encoding valine) or UUG (usually encoding leucine) is used as an initiation codon, yet it is still recognized by the initiating methionine tRNA, which inserts a Met residue. The mRNA can also have internal AUG (or GUG and UUG) codons, but translation does not begin at these internal positions. In bacteria, there is a specific sequence in the mRNA next to the initiating AUG (or GUG) that binds the ribosome and directs it to start translation. In eukaryotes, the ribosome is directed to the 5′ terminus of the mRNA, after which it slides down the mRNA; translation can then be initiated at various sites, influenced by a nucleotide sequence known as the Kozak sequence (discussed in Chapter 18).

The three codons (UAA, UAG, and UGA) that signal the end of translation and do not specify any amino acid are called termination codons or stop codons (or, sometimes, nonsense codons; see Figure 17-4). Termination codons signal the ribosome to dissociate from the newly synthesized polypeptide chain. When the ribosome encounters a termination codon, a release factor associates with the ribosome and terminates protein synthesis. Release factors, even though they recognize specific codons, are proteins. In a fascinating display of molecular mimicry, the three-dimensional structure of release factor proteins is very similar to the structure of tRNA.

With 3 of the 64 codons acting as terminators, a random mRNA sequence should contain 1 stop codon for every 20 codons or so. A long sequence of nucleotide triplets with no stop codons is unlikely to occur by chance, and it generally encodes a protein. Such a sequence is known as an open reading frame, or ORF (Figure 17-8). For example, the average length of a gene in E. coli is 1,000 nucleotides, or about 333 codons that lack a termination codon.

Figure 17-8: Start and stop signals in the open reading frame of a gene. The reading frame of a gene that encodes a protein begins at an ATG start codon in the coding strand of the DNA (AUG in the mRNA) and ends at the first stop codon in the same reading frame as the start codon.
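A minimal Python sketch of the open reading frame idea follows. It makes a deliberate simplification: the first AUG in the sequence is taken as the start codon, whereas real start-site selection depends on additional sequence context as described above. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_orf(mrna):
    """Return the sequence from the first AUG through the first in-frame
    stop codon, or None if no such open reading frame exists."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the sequence in triplets, in the frame set by the AUG.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # reached the end without an in-frame stop codon


# With 3 stop codons among 64 triplets, a random sequence is expected
# to contain about one stop codon per 64/3 codons.
expected_spacing = 64 / len(STOP_CODONS)
print(f"expected codons per stop in random sequence: {expected_spacing:.1f}")

mrna = "GGAUGGCCAUUUAAGG"  # hypothetical sequence
print(first_orf(mrna))  # AUGGCCAUUUAA
```

The expected spacing of roughly 21 codons per stop codon is why a run of several hundred codons without a stop, such as a typical bacterial gene, is unlikely to arise by chance.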

The Genetic Code Resists Single-Base Substitution Mutations

The degeneracy of the genetic code enables it to absorb many types of point mutations without serious consequence (see Chapter 12 for a more complete discussion of types of mutation). A single-base substitution that leads to the replacement of one amino acid with another is a missense mutation. However, because the genetic code is degenerate, many single-base substitutions are silent mutations that do not result in an amino acid replacement. For example, a nucleotide change in the third position of a codon results in a change in amino acid only about 25% of the time.

The ability of the code to withstand mutation is even more apparent when we consider that the most frequent mutation is a transition mutation, in which a purine is replaced by another purine or a pyrimidine by another pyrimidine (A·T replaced by G≡C, or G≡C by A·T). All three positions of the codon confer some type of protection from deleterious transition mutations. A transition mutation in the third position rarely causes a change at all, due to the wobble rules. Even the functioning of UAA and UAG stop codons is protected from damage by a transition mutation in the third position.
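The robustness of the third codon position can be explored by enumeration. The Python sketch below counts, over the 61 sense codons of the standard code, how many third-position substitutions change the encoded meaning (a different amino acid or a stop codon). It assumes that every codon and every substitution is equally likely, which is not true of real genomes, so the result is only a rough guide. The function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in U, C, A, G order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Transitions: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}


def third_position_changes(transitions_only=False):
    """Return (changed, total) for third-position substitutions in sense codons."""
    changed = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # consider only codons that specify an amino acid
        for base in BASES:
            if base == codon[2]:
                continue
            if transitions_only and TRANSITION[codon[2]] != base:
                continue
            total += 1
            if CODON_TABLE[codon[:2] + base] != aa:
                changed += 1
    return changed, total


for flag in (False, True):
    changed, total = third_position_changes(transitions_only=flag)
    label = "transitions only" if flag else "all substitutions"
    print(f"{label}: {changed}/{total} change the meaning ({changed / total:.0%})")
```

Under these uniform assumptions, roughly three in ten third-position substitutions alter the encoded meaning, whereas only 3 of the 61 possible third-position transitions do so (AUA to AUG, AUG to AUA, and UGG to UGA).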

A transition mutation in the first position of most codons does result in an amino acid change, but the change is usually to an amino acid that is chemically similar to the original amino acid. This is especially evident for hydrophobic amino acids, as shown in the leftmost column of Figure 17-4. These codons contain U in the second position, and replacement of the first nucleotide results in a codon that specifies another hydrophobic residue. For example, a codon change of GUU to AUU results in an exchange of Ile for Val. Had the GUU codon been altered to CUU, the protein would contain Leu instead of Val. These amino acids have similar chemical properties and thus are much more likely to conserve the protein function than if a hydrophobic residue were replaced by a polar residue. The second position of a codon generally determines whether it encodes a polar (if nucleotide 2 is a purine) or hydrophobic (if a pyrimidine) amino acid. Therefore, transition mutations in the second position also tend to conserve the chemical nature of the protein product.

Errors produced during translation occur most frequently in the codon’s first and third nucleotide positions, but the redundancy in coding due to wobble in the third position removes most errors. Eight amino acids are specified by codons that contain any of the four nucleotides in the third position. This, coupled with the fact that any purine-pyrimidine mispairing in the wobble position results in the same amino acid in all but three cases, greatly reduces the effect of reading errors at the ribosome. Just as transition mutations generally lead to a conservative change, misreading of purine-pyrimidine codon-anticodon base pairs results in conservative changes.

Computational studies that examine the theoretical ability of randomly generated genetic codes to withstand the effects of mutation show that most codes would be much less resistant to mutation than is the code actually used by cells. In fact, the probability of arriving by chance at a code that is as resistant to mutation as the genetic code of living organisms is about one in a million. These considerations suggest that the code was extensively honed by natural selection before life forms diverged from LUCA, the last universal common ancestor.

Some Mutations Are Suppressed by Special tRNAs

Far more deleterious than missense mutations are codon changes that result in a termination codon. These nonsense mutations abort protein synthesis, resulting in an incomplete protein that is rarely functional. The gene can be restored to function by a second mutation that converts the nonsense codon to a missense codon or by a mutation in a tRNA that suppresses termination at the nonsense codon by inserting an amino acid at that position (Figure 17-9). Mutant tRNAs that function at a stop codon to allow translation to continue are called suppressor tRNAs. For example, a change in the anticodon of tRNATyr from 5′-GUA to 5′-CUA results in an altered tRNATyr that inserts tyrosine at a 5′-UAG termination codon (Figure 17-10). Depending on the suppressor tRNA, other amino acids could be inserted at a 5′-UAG termination codon. In theory, any tRNA whose anticodon differs by a single nucleotide from one that pairs with a stop codon could become a suppressor tRNA if a point mutation occurred at the right place in the anticodon. In fact, suppressor mutations are rare in vivo, but this phenomenon has been harnessed as a tool in the molecular biology laboratory.

Figure 17-9: Suppression of a nonsense mutation. (a) The wild-type mRNA encodes a full-length protein, with CAG encoding Gln (glutamine). (b) A nonsense mutation at an internal CAG codon changes it to a UAG termination codon, resulting in an incomplete protein. (c) A tRNATyr suppressor has a mutant anticodon that pairs with the UAG nonsense (stop) codon, resulting in a full-length protein with a Tyr residue in place of the Gln residue of the wild-type protein.
Figure 17-10: The structure of a suppressor tRNA. The tRNATyr with anticodon 5′-GUA recognizes the UAC codon. The suppressor tRNATyr contains a mutation in the anticodon, altering it to 5′-CUA, which base-pairs with the UAG nonsense (stop) codon and inserts a Tyr residue in the protein.
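The outcome of nonsense suppression can be sketched in Python. The model below is deliberately idealized: a suppressor is represented as a mapping from a stop codon to the amino acid it inserts, and suppression is treated as fully efficient, which is not the case in cells. The sequences and the `suppressors` parameter are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in U, C, A, G order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, suppressors=None):
    """Translate from the first nucleotide until an unsuppressed stop codon.

    suppressors maps a stop codon to the amino acid inserted by a
    suppressor tRNA, e.g. {"UAG": "Y"} for the tRNATyr suppressor.
    """
    suppressors = suppressors or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            if codon not in suppressors:
                break  # normal termination
            amino_acid = suppressors[codon]  # read-through by the suppressor
        protein.append(amino_acid)
    return "".join(protein)


wild_type = "AUGGCACAGGUUUAA"  # Met-Ala-Gln-Val, then a UAA stop
nonsense = "AUGGCAUAGGUUUAA"   # CAG (Gln) mutated to UAG (stop)

print(translate(wild_type))                # MAQV
print(translate(nonsense))                 # MA
print(translate(nonsense, {"UAG": "Y"}))   # MAYV
```

In this sketch the suppressor restores a full-length product with tyrosine in place of the original glutamine, while the natural UAA stop codon at the end of the message is still obeyed.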

Although suppressor tRNAs usually carry a single-nucleotide change in the anticodon, some mutations in suppressor tRNAs lie outside the anticodon. For example, the suppressor of UGA nonsense codons is usually a tRNATrp that normally recognizes UGG. The mutation that provides this ability to recognize UGA (and to insert tryptophan at this position) can be in the anticodon, but it can also be due to a change of G to A at nucleotide position 24 in the D arm of the tRNATrp. This change is presumed to lead to an altered conformation that can now recognize both the normal UGG codon and the UGA stop codon. There are other instances in which codon recognition by a tRNA is altered by mutations outside the anticodon, and these are probably mediated by effects on the larger tRNA structure, although more research is needed to clarify the mechanism.

Suppression must not be too efficient; otherwise, normal termination codons would also be suppressed, leading to abnormally long protein products, an outcome that would be lethal to the cell. Suppression is limited in several ways. Many genes are terminated by multiple stop codons. More importantly, there are multiple copies of each tRNA gene, even in cells that are not diploid. Some duplicate tRNA genes are weakly expressed, so their products constitute only a small fraction of the tRNA pool for a particular amino acid. Suppressor mutants are typically found in one of these minor tRNA genes, leaving the major tRNA gene to function normally.

An example of suppression in E. coli is tRNATyr with the anticodon 5′-GUA. E. coli contains three identical tRNATyr genes, but one is much more highly transcribed than the others. The tRNATyr suppressor mutation, which changes the anticodon to 5′-CUA and thus recognizes the 5′-UAG stop codon, is found in one of the minor, less-transcribed tRNATyr genes. Therefore, the insertion of tyrosine at UAG stop codons is inefficient, but sufficient full-length protein is produced from a gene with a nonsense mutation to let the cell survive. Furthermore, UAG is used only rarely as a stop codon in E. coli. This allows suppression to be reasonably efficient (up to 50%) at UAG stop codons. In comparison, suppression at the more frequently used UAA and UGA stop codons must be kept below 5% to ensure cell viability. There are also examples of suppressor tRNAs for missense mutations, and of suppressor tRNAs for frameshift mutations, which place the ribosome in an incorrect reading frame by insertion or deletion of a nucleotide.

SECTION 17.1 SUMMARY

  • Transfer RNAs are small RNA molecules that can covalently attach at their 3′ end to an amino acid. The triplet anticodon in tRNA pairs with a triplet codon in mRNA, and this pairing mediates translation of the nucleotide sequence in mRNA into the amino acid sequence of a protein.

  • The genetic code is degenerate, because most amino acids are specified by two or more codons. One tRNA often reads two codon sequences, due to noncanonical or wobble base pairing at the third nucleotide position of the codon. When the anticodon contains inosine (I), a modified nucleotide residue, the tRNA recognizes three different codons, ending in A, C, or U.

  • An AUG codon, specifying methionine, typically initiates protein synthesis. The three termination codons do not specify an amino acid, but instead instruct the ribosome to stop translation.

  • Due to codon assignments and the degeneracy of the genetic code, single-base substitution mutations generally result in codons that specify the same or similar amino acids; nonsense mutations result in a stop codon that can lead to inactive protein. Mutant tRNAs that carry a single-nucleotide change in the anticodon can suppress nonsense mutations by inserting an amino acid in the polypeptide at the mutant termination codon.