Deciphering the Genetic Code

INTRODUCTION

DNA consists of a code language with just four letters, known as bases, that spell out a variety of three-letter words, known as codons. The four bases are strung together linearly along chromosomes, and in humans they make up the roughly 3 billion base pairs of the genome. How did scientists first learn to interpret the genetic code? Some of the first clues appeared serendipitously as researchers studied aspects of gene expression. Later, researchers probed in earnest, setting up specific experiments to decipher the code used in RNA molecules. Marshall Nirenberg and his colleagues gathered the bulk of the data.

ANIMATION SCRIPT

By the early 1960s, scientists understood that DNA directs the synthesis of RNA. In turn, experiments implicated RNA as the template in protein synthesis. In 1961, Marshall Nirenberg and Heinrich Matthaei published experiments designed to test this RNA connection. The experiments also had the serendipitous effect of revealing clues about the genetic code.

The scientists began their experiments by preparing extracts from E. coli cells. The extracts were prepared so that they contained ribosomes, amino acids, and other components required for protein synthesis. However, the extracts lacked mRNA. Without RNA, no protein synthesis occurred.

When the investigators added RNA to the tube, the extract produced protein. Nirenberg and Matthaei experimented with a number of different sources of RNA and even used a synthetic RNA, called poly U, in which every nucleotide contained the base uracil. The scientists found that the extract produced protein, and the protein consisted entirely of phenylalanine amino acids.

In later experiments, Nirenberg and his colleagues found that a template RNA containing only adenine (poly A) resulted in a protein made only of lysine amino acids. A poly C RNA resulted in a protein of proline amino acids. From these experiments, it was clear that RNA was required for protein synthesis, and that the sequence of the RNA dictated the amino acids in the protein.

From their experiments, the scientists knew that poly U RNA directed the incorporation of phenylalanine into a protein. However, they didn't know how many uracil-containing nucleotides were required to create a codeword, now known as a codon. Was one nucleotide enough, or were two, three, or four required?

Proteins contain 20 different amino acids, while mRNA contains only 4 different nucleotides. If an mRNA codon is 1, 2, 3, or 4 nucleotides long, how many different codons can be formed from different arrangements of 4 nucleotides? Would this be enough to code for 20 different amino acids?

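With 4 nucleotides, a codon of length n can be arranged in 4^n ways: 4, 16, 64, and 256 possible codons for lengths of 1, 2, 3, and 4. A codon length of 3 is therefore sufficient to code for 20 amino acids, but a codon length of 2 is not.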
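
The counting argument can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the original experiments.

```python
# Number of distinct codons for each candidate codon length,
# given an alphabet of 4 nucleotides and 20 amino acids to encode.
NUM_BASES = 4
NUM_AMINO_ACIDS = 20

codon_counts = {length: NUM_BASES ** length for length in range(1, 5)}

for length, count in codon_counts.items():
    verdict = "enough" if count >= NUM_AMINO_ACIDS else "too few"
    print(f"codon length {length}: {count} possible codons ({verdict})")

# Shortest codon length whose codon count covers all 20 amino acids.
min_length = min(n for n, c in codon_counts.items() if c >= NUM_AMINO_ACIDS)
```

Counting alone shows only that 3 is the shortest length that could work. It took the binding experiments to show that 3 is the length the cell actually uses.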

To determine a codon's length, Nirenberg and colleague Philip Leder devised a clever and quick assay. The assay involved ribosomes and tRNAs charged with radioactive phenylalanine. The scientists knew that aminoacyl tRNA molecules participated in protein synthesis and under certain conditions could be found attached to ribosomes. The assay also included poly U RNA.

In addition to the reaction mixture, the assay relied on a membrane filter made of cellulose nitrate. This filter has special properties. If ribosomes are poured onto the filter, the ribosomes stick, while the fluid goes through the filter. A washing step does not dislodge the ribosomes.

Although the ribosomes stick to the filter, neither the tRNA molecules nor the poly U RNA molecules stick. Each type of molecule can flow through the porous filter during a washing step. The ability of the ribosomes, but not the other molecules, to bind to the filter was the key to the assay.

For their experiments, the investigators created mixtures that included ribosomes and tRNA molecules charged with radioactive phenylalanine. One of the mixtures also received poly U RNA. After giving these mixtures time to incubate, the scientists poured them onto the filters and then washed the filters. After the washing step, the scientists found that only one of the filters was radioactive.

The poly U RNA acted as the link that held the radiolabeled tRNAs on the ribosomes, so radioactivity remained only on the filter that received the mixture containing poly U.

Nirenberg and Leder reported their results in graph form. At a variety of temperatures, poly U produced a positive reaction. In the absence of poly U, there was virtually no reaction, regardless of the temperature tested.

The investigators used their assay to determine the number of nucleotides in the poly U required to bring together ribosomes and tRNA charged with phenylalanine. The investigators used RNA molecules with 12 uracil-containing nucleotides and found reactivity. They also found reactivity with 6, 5, 4, and 3 nucleotides. However, 2 nucleotides produced almost no reactivity.

One can surmise that the minimal number of nucleotides that produces reactivity is also the number of nucleotides in a codon. That is, 3 nucleotides make up a codon.
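
The inference can be written out as a small sketch. The dictionary below records only the qualitative outcomes described above (binding seen or almost none), not the measured values, and the function name is an illustrative choice.

```python
# Qualitative outcomes described in the text: oligo(U) length -> binding seen?
# True means the filter retained radioactivity; False means almost none.
binding_observed = {12: True, 6: True, 5: True, 4: True, 3: True, 2: False}

def minimal_binding_length(results):
    """Return the shortest tested length that still showed binding."""
    return min(length for length, bound in results.items() if bound)

inferred_codon_length = minimal_binding_length(binding_observed)
print(inferred_codon_length)
```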

Nirenberg and colleagues went on to characterize a number of trinucleotides, only a subset of which are shown here. The scientists found that threonine-charged tRNAs bind to ACU, alanine-charged tRNAs bind to GCU, proline-charged tRNAs bind to CCA, and serine-charged tRNAs bind to UCG. From such experiments, we know the genetic code, which consists of 64 codons.

Here is the complete code. Each codon specifies an amino acid, with the exception of three stop codons (UAA, UAG, and UGA). When a ribosome encounters a stop codon, no tRNA will bind to the codon, and protein synthesis terminates. Another codon, AUG, is called a start codon. It is unique because, in addition to specifying methionine, it is always found at the beginning of a protein-coding sequence in mRNA.
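
As a rough sketch of how the table is read, the Python below builds the standard 64-codon table and translates an mRNA from the first AUG to the first stop codon. The function name and the example sequence are made up for illustration, and the model is simplified: it assumes translation always begins at the first AUG it finds.

```python
from itertools import product

# Standard genetic code, listed with bases in U, C, A, G order at each
# codon position. "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

START_CODON = "AUG"

def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find(START_CODON)
    if start == -1:
        return ""  # no start codon, so nothing is translated in this model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon: no amino acid is added and synthesis ends
        protein.append(aa)
    return "".join(protein)

# Hypothetical mRNA: two leading bases, then AUG UUU CCU AAA UAA, then GC.
print(translate("GGAUGUUUCCUAAAUAAGC"))
```

For the made-up sequence above, the function returns the one-letter string MFPK (methionine, phenylalanine, proline, lysine) and stops at UAA.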

Using the genetic code table, we can construct an RNA molecule that would code for the series of amino acids in this protein.
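
Reading the table in reverse can be sketched in Python as well. The peptide used here is a hypothetical example, not the protein shown in the animation, and choosing the first listed codon for each amino acid is an arbitrary assumption made to keep the sketch short.

```python
from itertools import product
from math import prod

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of codons that specify it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)

def back_translate(protein):
    """Use the first listed codon for each residue, then add a stop codon."""
    return "".join(CODONS_FOR[aa][0] for aa in protein) + CODONS_FOR["*"][0]

def count_encodings(protein):
    """Count the distinct coding sequences (stop codon excluded)."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)

peptide = "MFPK"  # hypothetical peptide, chosen only for illustration
rna = back_translate(peptide)
n_encodings = count_encodings(peptide)
print(rna, n_encodings)
```

The count returned by count_encodings shows that even a short peptide can usually be written as many different RNA sequences.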

Notice that more than one codon will work for most of these amino acids. The genetic code is, therefore, redundant. However, the code is NOT ambiguous. For example, CCU specifies proline, but it does not specify any other amino acid, such as leucine or alanine.
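
Both properties can be checked directly against the standard code table. The sketch below is illustrative only, and its variable names are chosen here.

```python
from collections import Counter
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Redundancy: count how many codons specify each amino acid ("*" = stop).
degeneracy = Counter(CODON_TABLE.values())

# Lack of ambiguity: each codon is a single dictionary key, so it has
# exactly one meaning. CCU, for example, maps only to proline ("P").
proline_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "P")
sense_codons = sum(n for aa, n in degeneracy.items() if aa != "*")

print(proline_codons, CODON_TABLE["CCU"], sense_codons, degeneracy["*"])
```

Only methionine and tryptophan are specified by a single codon; every other amino acid has between two and six.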

CONCLUSION

In the 1960s, Marshall Nirenberg and his colleagues determined the language of the genetic code. Their meticulous work paved the way, decades later, for interpreting the sequences of the entire human genome and the genomes of many other organisms. The scientists designed specific RNA sequences to test the possible code words, or codons, in the genetic code. Through these types of experiments, we now know that of the 64 possible codons, 61 correspond to specific amino acids. The remaining three, known as stop codons, code for no amino acid and are found at the end of a coding sequence in a messenger RNA molecule.