4.5 Gene Expression Is the Transformation of DNA Information into Functional Molecules

The information stored as DNA becomes useful when it is expressed in the production of RNA and proteins. This rich and complex topic is the subject of several chapters later in this book, but here we introduce the basics of gene expression. DNA can be thought of as archival information, stored and manipulated judiciously to minimize damage (mutations). It is expressed in two steps. First, an RNA copy is made that encodes directions for protein synthesis. This messenger RNA can be thought of as a photocopy of the original information: it can be made in multiple copies, used, and then disposed of. Second, the information in messenger RNA is translated to synthesize functional proteins. Other types of RNA molecules exist to facilitate this translation.

Several kinds of RNA play key roles in gene expression

Scientists used to believe that RNA played a passive role in gene expression, as a mere conveyor of information. However, recent investigations have shown that RNA plays a variety of roles, from catalysis to regulation. Cells contain several kinds of RNA (Table 4.3):

Type                    Relative amount (%)   Sedimentation coefficient (S)   Mass (kDa)    Number of nucleotides
Ribosomal RNA (rRNA)    80                    23                              1.2 × 10³     3700
                                              16                              0.55 × 10³    1700
                                              5                               3.6 × 10¹     120
Transfer RNA (tRNA)     15                    4                               2.5 × 10¹     75
Messenger RNA (mRNA)    5                     Heterogeneous

Table 4.3: RNA molecules in E. coli


1. Messenger RNA (mRNA) is the template for protein synthesis, or translation. An mRNA molecule may be produced for each gene or group of genes that is to be expressed in E. coli, whereas a distinct mRNA is produced for each gene in eukaryotes. Consequently, mRNA is a heterogeneous class of molecules. In prokaryotes, the average length of an mRNA molecule is about 1.2 kilobases (kb). In eukaryotes, mRNA has structural features, such as stem-loop structures, that regulate the efficiency of translation and the lifetime of the mRNA.

2. Transfer RNA (tRNA) carries amino acids in an activated form to the ribosome for peptide-bond formation, in a sequence dictated by the mRNA template. There is at least one kind of tRNA for each of the 20 amino acids. Transfer RNA consists of about 75 nucleotides (having a mass of about 25 kDa).

Kilobase (kb)

A unit of length equal to 1000 base pairs of a double-stranded nucleic acid molecule (or 1000 bases of a single-stranded molecule).

One kilobase of double-stranded DNA has a length of 0.34 μm at its maximal extension (called the contour length) and a mass of about 660 kDa.
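The kilobase figures above and the masses in Table 4.3 can be turned into quick unit conversions. The sketch below is illustrative only. The function names are hypothetical, and the code simply scales by the rounded per-kilobase values quoted in the sidebar: 0.34 μm and about 660 kDa per kilobase of double-stranded DNA. It also divides each mass in Table 4.3 by the chain length. Every RNA listed averages roughly 300 to 335 Da per nucleotide, which is close to the 330 Da per nucleotide implied by 660 kDa per 1000 base pairs.

```python
# Unit conversions based on the rounded figures in the text (a sketch, not a
# precise model): 1 kb of double-stranded DNA ~ 0.34 um contour length, ~660 kDa.
UM_PER_KB_DSDNA = 0.34    # contour length per kilobase, micrometres
KDA_PER_KB_DSDNA = 660.0  # mass per kilobase (1000 base pairs), kilodaltons


def dsdna_contour_length_um(base_pairs):
    """Contour length (maximal extension) of a dsDNA molecule in micrometres."""
    return base_pairs / 1000 * UM_PER_KB_DSDNA


def dsdna_mass_kda(base_pairs):
    """Approximate mass of a dsDNA molecule in kilodaltons."""
    return base_pairs / 1000 * KDA_PER_KB_DSDNA


# Mass (kDa) and length (nucleotides) of E. coli RNAs, taken from Table 4.3.
TABLE_4_3 = {
    "23S rRNA": (1.2e3, 3700),
    "16S rRNA": (0.55e3, 1700),
    "5S rRNA": (3.6e1, 120),
    "tRNA": (2.5e1, 75),
}


def daltons_per_nucleotide(mass_kda, length_nt):
    """Average mass of one nucleotide residue, in daltons."""
    return mass_kda * 1000 / length_nt


if __name__ == "__main__":
    # A hypothetical 5000-bp DNA fragment
    print(dsdna_contour_length_um(5000), "um")
    print(dsdna_mass_kda(5000), "kDa")
    # dsDNA: 660 kDa per 1000 bp, i.e. per 2000 nucleotides
    print("dsDNA:", round(daltons_per_nucleotide(KDA_PER_KB_DSDNA, 2000)))
    for name, (mass, length) in TABLE_4_3.items():
        print(name, round(daltons_per_nucleotide(mass, length)))
```

For the hypothetical 5000-bp fragment, the conversion gives a contour length of about 1.7 μm and a mass of about 3300 kDa.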

3. Ribosomal RNA (rRNA) is the major component of ribosomes (Chapter 30). In prokaryotes, there are three kinds of rRNA, called 23S, 16S, and 5S RNA because of their sedimentation behavior. One molecule of each of these species of rRNA is present in each ribosome. Ribosomal RNA was once believed to play only a structural role in ribosomes. We now know that rRNA is the actual catalyst for protein synthesis.

Ribosomal RNA is the most abundant of these three types of RNA. Transfer RNA comes next, followed by messenger RNA, which constitutes only 5% of the total RNA. Eukaryotic cells contain additional small RNA molecules that play a variety of roles, including the regulation of gene expression, the processing of RNA, and the synthesis of proteins. We will examine these small RNAs in later chapters. In this chapter, we will consider rRNA, mRNA, and tRNA.

All cellular RNA is synthesized by RNA polymerases

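The synthesis of RNA from a DNA template is called transcription and is catalyzed by the enzyme RNA polymerase (Figure 4.27). RNA polymerase catalyzes the initiation and elongation of RNA chains. The reaction catalyzed by this enzyme is

$$(\text{RNA})_{n}\ \text{residues} + \text{ribonucleoside triphosphate} \rightleftharpoons (\text{RNA})_{n+1}\ \text{residues} + \text{PP}_{i}$$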

Figure 4.27: RNA Polymerase. This large enzyme comprises many subunits, including β (red) and β′ (yellow), which form a “claw” that holds the DNA to be transcribed. Notice that the active site includes a Mg²⁺ ion (green) at the center of the structure. The curved tubes making up the protein in the image represent the backbone of the polypeptide chain.
[Drawn from 1L9Z, pdb.]


RNA polymerase requires the following components:

1. A template. The preferred template is double-stranded DNA. Single-stranded DNA also can serve as a template. RNA, whether single or double stranded, is not an effective template; nor are RNA–DNA hybrids.

2. Activated precursors. All four ribonucleoside triphosphates—ATP, GTP, UTP, and CTP—are required.

3. A divalent metal ion. Either Mg²⁺ or Mn²⁺ is effective.

The synthesis of RNA is like that of DNA in several respects (Figure 4.28). First, the direction of synthesis is 5′ → 3′. Second, the mechanism of elongation is similar: the 3′-OH group at the terminus of the growing chain makes a nucleophilic attack on the innermost phosphoryl group of the incoming nucleoside triphosphate. Third, the synthesis is driven forward by the hydrolysis of pyrophosphate. In contrast with DNA polymerase, however, RNA polymerase does not require a primer. In addition, the ability of RNA polymerase to correct mistakes is not as extensive as that of DNA polymerase.
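The parallel with DNA synthesis can be made concrete with a toy model. The sketch below illustrates the bookkeeping under simplifying assumptions and is not a model of the enzyme. The function name and the convention of writing the template 3′ → 5′ are hypothetical choices. Each incoming nucleotide is chosen purely by Watson-Crick pairing with the template base, and proofreading is ignored.

The template is read 3′ → 5′ while the RNA grows 5′ → 3′. No primer is needed. The first nucleotide keeps its 5′-triphosphate, so a chain of n nucleotides releases n − 1 molecules of pyrophosphate.

```python
# Toy model of RNA chain elongation (illustrative assumptions only):
# the template strand is supplied 3'->5', and the RNA is built 5'->3'.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return (RNA written 5'->3', number of PPi released)."""
    rna = []
    ppi_released = 0
    for base in template_3_to_5.upper():
        incoming = TEMPLATE_TO_RNA[base]  # NTP selected by pairing with the template
        if not rna:
            # First nucleotide: no primer is needed, and it keeps its 5'-triphosphate.
            rna.append(incoming)
            continue
        # The 3'-OH at the end of the growing chain attacks the innermost
        # (alpha) phosphoryl group of the incoming NTP: one phosphodiester
        # bond forms and one pyrophosphate is released.
        rna.append(incoming)
        ppi_released += 1
    return "".join(rna), ppi_released


if __name__ == "__main__":
    # Hypothetical template, written 3'->5'
    print(transcribe("TACGAT"))
```

Run on the hypothetical template 3′-TACGAT-5′, the model gives the transcript 5′-AUGCUA-3′ and releases five pyrophosphates.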

Figure 4.28: Transcription mechanism of the chain-elongation reaction catalyzed by RNA polymerase.
[Source: J. L. Tymoczko, J. Berg, and L. Stryer, Biochemistry: A Short Course, 2nd ed. (W. H. Freeman and Company, 2013), Fig. 36.3.]

All three types of cellular RNA—mRNA, tRNA, and rRNA—are synthesized in E. coli by the same RNA polymerase according to instructions given by a DNA template. In mammalian cells, there is a division of labor among several different kinds of RNA polymerases. We shall return to these RNA polymerases in Chapter 29.

RNA polymerases take instructions from DNA templates

DNA template (plus, or coding, strand of ϕX174)    RNA product
A    25                                            U    25
T    33                                            A    32
G    24                                            C    23
C    18                                            G    20

Table 4.4: Base composition (percentage) of RNA synthesized from a viral DNA template

RNA polymerase, like the DNA polymerases described earlier, takes instructions from a DNA template. The earliest evidence was the finding that the base composition of newly synthesized RNA is the complement of that of the DNA template strand, as exemplified by the RNA synthesized from a template of single-stranded DNA from the ϕX174 virus (Table 4.4). The strongest evidence for the fidelity of transcription came from base-sequence studies. For instance, the nucleotide sequence of a segment of the gene encoding the enzymes required for tryptophan synthesis was determined with the use of DNA-sequencing techniques (Section 5.1). Likewise, the sequence of the mRNA for the corresponding gene was determined. The results showed that the RNA sequence is the precise complement of the DNA template sequence (Figure 4.29).
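The base-composition argument amounts to a simple prediction. If each RNA base is the complement of the template base it was copied from, the percentage of U in the product should equal the percentage of A in the template, and so on for the other bases. The sketch below applies that mapping to the ϕX174 data in Table 4.4 and reports how far each observed value departs from the prediction. The function and variable names are illustrative.

```python
# Assumption: each RNA base is the Watson-Crick complement of the template base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def predicted_rna_composition(template_percent):
    """Map template base percentages onto the complementary RNA bases."""
    return {TEMPLATE_TO_RNA[base]: pct for base, pct in template_percent.items()}


# Percentages from Table 4.4 (phiX174 plus strand used as template).
TEMPLATE = {"A": 25, "T": 33, "G": 24, "C": 18}
OBSERVED_RNA = {"U": 25, "A": 32, "C": 23, "G": 20}

PREDICTED_RNA = predicted_rna_composition(TEMPLATE)

if __name__ == "__main__":
    for base in "UACG":
        diff = OBSERVED_RNA[base] - PREDICTED_RNA[base]
        print(f"{base}: predicted {PREDICTED_RNA[base]}%, "
              f"observed {OBSERVED_RNA[base]}%, difference {diff:+d}")
```

Every observed value falls within 2 percentage points of the value predicted from complementarity.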

Figure 4.29: Complementarity between mRNA and DNA. The base sequence of mRNA (red) is the complement of that of the DNA template strand (blue). The sequence shown here is from the tryptophan operon, a segment of DNA containing the genes for five enzymes that catalyze the synthesis of tryptophan. The other strand of DNA (black) is called the coding strand because it has the same sequence as the RNA transcript except for thymine (T) in place of uracil (U).


Transcription begins near promoter sites and ends at terminator sites

Consensus sequence

Not all base sequences of promoter sites are identical. However, they do possess common features, which can be represented by an idealized consensus sequence. Each base in the consensus sequence TATAAT is found in most prokaryotic promoters. Nearly all promoter sequences differ from this consensus sequence at only one or two bases.

RNA polymerase must detect and transcribe discrete genes from within large stretches of DNA. What marks the beginning of the unit to be transcribed? DNA templates contain regions called promoter sites that specifically bind RNA polymerase and determine where transcription begins. In bacteria, two sequences on the 5′ (upstream) side of the first nucleotide to be transcribed function as promoter sites (Figure 4.30A). One of them, called the Pribnow box, has the consensus sequence TATAAT and is centered at −10 (10 nucleotides on the 5′ side of the first nucleotide transcribed, which is denoted by +1). The other, called the −35 region, has the consensus sequence TTGACA. The first nucleotide transcribed is usually a purine.
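Real promoters match the consensus only approximately. One simple way to compare a candidate site with TATAAT or TTGACA is to count mismatched positions. The sketch below illustrates this comparison and is not a promoter-prediction method. The function names and the short upstream sequence are hypothetical. The sequence is written as coding-strand DNA whose last base is −1. There is no position 0, so the base that follows −1 is +1.

```python
PRIBNOW = "TATAAT"   # consensus centered near -10
MINUS_35 = "TTGACA"  # consensus of the -35 region


def mismatches(site, consensus):
    """Count positions at which a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(1 for a, b in zip(site, consensus) if a != b)


def best_match(upstream, consensus):
    """Scan coding-strand DNA that ends at position -1.

    Returns (position of the site's 5' base, site, mismatches) for the
    window closest to the consensus; ties go to the most upstream window.
    """
    k = len(consensus)
    best = None
    for i in range(len(upstream) - k + 1):
        site = upstream[i:i + k]
        d = mismatches(site, consensus)
        if best is None or d < best[2]:
            position = i - len(upstream)  # last base of `upstream` is -1
            best = (position, site, d)
    return best


if __name__ == "__main__":
    print(mismatches("TATGTT", PRIBNOW))   # hypothetical candidate -10 site
    print(mismatches("TTTACA", MINUS_35))  # hypothetical candidate -35 site
    print(best_match("CCGTATGATCC", PRIBNOW))
```

The two hypothetical candidate sites differ from their consensus sequences at two positions and one position, respectively. That is the range the sidebar describes as typical.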

Figure 4.30: Promoter sites for transcription in (A) prokaryotes and (B) eukaryotes. Consensus sequences are shown. The first nucleotide to be transcribed is numbered +1. The adjacent nucleotide on the 5′ side is numbered −1. The sequences shown are those of the coding strand of DNA.
Figure 4.31: Base sequence of the 3′ end of an mRNA transcript in E. coli. A stable hairpin structure is followed by a sequence of uridine (U) residues.

Eukaryotic genes encoding proteins have promoter sites with a TATAAA consensus sequence, called a TATA box or a Hogness box, centered at about −25 (Figure 4.30B). Many eukaryotic promoters also have a CAAT box with a GGNCAATCT consensus sequence centered at about −75. The transcription of eukaryotic genes is further stimulated by enhancer sequences, which can be quite distant (as many as several kilobases) from the start site, on either its 5′ or its 3′ side.


In E. coli, RNA polymerase proceeds along the DNA template, transcribing one of its strands until it synthesizes a terminator sequence. This sequence encodes a termination signal, which is a base-paired hairpin on the newly synthesized RNA molecule (Figure 4.31). This hairpin is formed by base-pairing of self-complementary sequences that are rich in G and C. Nascent RNA spontaneously dissociates from RNA polymerase when this hairpin is followed by a string of U residues. Alternatively, RNA synthesis can be terminated by the action of rho, a protein. Less is known about the termination of transcription in eukaryotes. A more detailed discussion of the initiation and termination of transcription will be given in Chapter 29. The important point now is that discrete start and stop signals for transcription are encoded in the DNA template.
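The terminator described above, a GC-rich hairpin followed by a run of U residues, lends itself to a simple pattern search. The sketch below covers only this rho-independent signal and makes several simplifying assumptions:

- Stems pair by Watson-Crick rules alone, with no G·U pairs.
- The stem length, loop range, 60% GC threshold, and minimum of four U residues are arbitrary illustrative choices.
- The function names are invented for this example.

```python
# Rho-independent terminator check (illustrative sketch; thresholds are arbitrary).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}  # Watson-Crick pairs only


def trailing_u_run(rna):
    """Length of the run of U residues at the 3' end."""
    count = 0
    for base in reversed(rna):
        if base != "U":
            break
        count += 1
    return count


def find_hairpin(seq, stem_len, min_loop=3, max_loop=8, min_gc=0.6):
    """Return (start index, loop length) of the first GC-rich stem-loop, or None."""
    for start in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop
            if end > len(seq):
                break  # longer loops will not fit either
            left = seq[start:start + stem_len]
            right = seq[start + stem_len + loop:end]
            # Pair the 5' arm with the 3' arm read backwards (antiparallel).
            paired = all((a, b) in PAIRS for a, b in zip(left, reversed(right)))
            gc_fraction = sum(base in "GC" for base in left) / stem_len
            if paired and gc_fraction >= min_gc:
                return start, loop
    return None


def looks_like_terminator(rna, stem_len=5, min_u=4):
    """True if a GC-rich hairpin is followed by at least `min_u` U residues."""
    u_run = trailing_u_run(rna)
    if u_run < min_u:
        return False
    return find_hairpin(rna[:len(rna) - u_run], stem_len) is not None


if __name__ == "__main__":
    print(looks_like_terminator("AAGCCCGAAUACGGGCUUUUUU"))  # made-up transcript end
```

With these settings, the made-up transcript end 5′-AAGCCCGAAUACGGGCUUUUUU-3′ is flagged as a terminator. Its stem GCCCG pairs with CGGGC across a four-nucleotide loop, and six U residues follow.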

In eukaryotes, the messenger RNA is modified after transcription (Figure 4.32). A “cap” structure, a guanosine nucleotide joined to the mRNA by an unusual 5′-5′ triphosphate linkage, is added to the 5′ end, and a sequence of adenylates, the poly(A) tail, is added to the 3′ end. These modifications will be presented in detail in Chapter 29.

Figure 4.32: Modification of mRNA. Messenger RNA in eukaryotes is modified after transcription. A nucleotide “cap” structure is added to the 5′ end, and a poly(A) tail is added at the 3′ end.

Transfer RNAs are the adaptor molecules in protein synthesis

Figure 4.33: Attachment of an amino acid to a tRNA molecule. The amino acid (shown in blue) is esterified to the 3′-hydroxyl group of the terminal adenylate of tRNA.
[Source: J. L. Tymoczko, J. Berg, and L. Stryer, Biochemistry: A Short Course, 2nd ed. (W. H. Freeman and Company, 2013), Fig. 39.3.]

We have seen that mRNA is the template for protein synthesis. How then does it direct amino acids to become joined in the correct sequence to form a protein? In 1958, Francis Crick wrote:

RNA presents mainly a sequence of sites where hydrogen bonding could occur. One would expect, therefore, that whatever went onto the template in a specific way did so by forming hydrogen bonds. It is therefore a natural hypothesis that the amino acid is carried to the template by an adaptor molecule, and that the adaptor is the part that actually fits onto the RNA. In its simplest form, one would require twenty adaptors, one for each amino acid.

This highly innovative hypothesis soon became established as fact. The adaptors in protein synthesis are transfer RNAs. The structure and reactions of these remarkable molecules will be considered in detail in Chapter 30. For the moment, it suffices to note that tRNAs contain an amino acid-attachment site and a template-recognition site. A tRNA molecule carries a specific amino acid in an activated form to the ribosome. The carboxyl group of this amino acid is esterified to the 3′- or 2′-hydroxyl group of the ribose unit of an adenylate at the 3′ end of the tRNA molecule. The adenylate is always preceded by two cytidylates to form the CCA arm of the tRNA (Figure 4.33). The joining of an amino acid to a tRNA molecule to form an aminoacyl-tRNA is catalyzed by a specific enzyme called an aminoacyl-tRNA synthetase. This esterification reaction is driven by ATP cleavage. There is at least one specific synthetase for each of the 20 amino acids. The template-recognition site on tRNA is a sequence of three bases called an anticodon (Figure 4.34). The anticodon on tRNA recognizes a complementary sequence of three bases, called a codon, on mRNA.
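The anticodon and codon pair in antiparallel fashion. Consequently, when both are written 5′ → 3′, the codon read by a given anticodon is the anticodon's reverse complement. The sketch below uses hypothetical function names and assumes strict Watson-Crick pairing at all three positions, which is a simplification. It also checks for the terminal CCA to which the amino acid is attached.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon_5_to_3):
    """Codon (5'->3') that pairs with an anticodon (5'->3').

    The two triplets pair antiparallel, so the codon is the reverse
    complement of the anticodon (strict Watson-Crick pairing assumed).
    """
    if len(anticodon_5_to_3) != 3:
        raise ValueError("an anticodon is three bases long")
    return "".join(RNA_COMPLEMENT[b] for b in reversed(anticodon_5_to_3.upper()))


def has_cca_end(trna_5_to_3):
    """True if the tRNA ends in the CCA arm, whose terminal A carries the amino acid."""
    return trna_5_to_3.upper().endswith("CCA")


if __name__ == "__main__":
    print(codon_for_anticodon("CAU"))
    print(codon_for_anticodon("GAA"))
    print(has_cca_end("GGCUACCA"))  # hypothetical 3' fragment of a tRNA
```

For example, the anticodon 5′-CAU-3′ reads the codon 5′-AUG-3′, and the anticodon 5′-GAA-3′ reads 5′-UUC-3′.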

Figure 4.34: General structure of an aminoacyl-tRNA. The amino acid is attached at the 3′ end of the RNA. The anticodon is the template-recognition site. Notice that the tRNA has a cloverleaf structure with many hydrogen bonds (green dots) between bases.
