3: Chemical Basis of Information Molecules

3.1 CHEMICAL BUILDING BLOCKS OF NUCLEIC ACIDS AND PROTEINS

We start by focusing on the underlying chemical properties that control the behavior of nucleic acids and proteins. Nucleic acid structures are discussed in more depth in Chapter 6, and amino acids are discussed in Chapter 4.

Nucleic Acids Are Long Chains of Nucleotides

Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) store and transmit genetic information, in part by coding for proteins. In addition, some RNA molecules function catalytically or structurally within larger, multimolecular complexes. Both DNA and RNA are composed of building blocks (monomers) called nucleotides, which are linked together by phosphodiester bonds to form long, unbranched chains. Nucleic acids can reach chain lengths of up to many millions of nucleotides and molecular masses of up to several billion daltons. A nucleotide molecule has three components: a nitrogenous base, a five-carbon (pentose) sugar, and a phosphate group. A base linked to a sugar, without the phosphate group, is referred to as a nucleoside. In DNA or RNA molecules, the sugars and phosphates of adjacent nucleotides are chemically linked to form a sugar–phosphate backbone that has a characteristic directionality. This directionality is defined by the chemical convention for numbering carbon atoms in the nucleotide sugar ring, giving rise to a 5′ end and a 3′ end (see Figure 3-1). The directionality of DNA and RNA is discussed in detail in Chapter 6.

Figure 3-1: Chemical building blocks of DNA and RNA. (a) Chemical differences between the major nucleotides of DNA (left) and RNA (right). Atoms in the base ring are numbered sequentially, starting with the nitrogen that is bound to the sugar; carbon numbering in the sugar ring is denoted by a prime (′). DNA nucleotides contain the sugar deoxyribose, whereas RNA nucleotides contain ribose (light gray). The difference between these two sugars is a single hydroxyl group on the 2′ carbon (red). In addition, DNA contains the base thymine, rather than the uracil found in RNA; these bases differ by a methyl group on carbon 5 of the base (red). (b) Segments of deoxyribonucleotide (DNA, left) and ribonucleotide (RNA, right) chains. Phosphodiester bonds connect the individual nucleotide units; the sugar–phosphate backbone (outlined) is polar and directional, with free 5′ and 3′ ends. The 2′ hydrogens of DNA and the 2′-hydroxyl groups of RNA are highlighted in red. The color coding of bases, here and throughout the figures in this book, is as follows: cytosine, yellow; adenine, light blue; guanine, green; thymine, dark blue; uracil, purple.

The nucleotides that make up DNA polymers are deoxyribonucleotides (Figure 3-1a, left), named for the type of pentose sugar found in DNA: deoxyribose. Whereas the phosphate group and the type of pentose remain constant, each deoxyribonucleotide contains one of four different nitrogenous bases: thymine (T), adenine (A), guanine (G), or cytosine (C) (Figure 3-1b, left). The type of base establishes the identity of each individual deoxyribonucleotide. Thus, the information in DNA is written in a four-letter alphabet.

Chemically, RNA is very similar to DNA. Like DNA, it is a long, unbranched polymer of nucleotides. And like DNA nucleotides, all RNA nucleotides contain a pentose, a phosphate group, and one of four different nitrogenous bases. Two small differences in their chemical components, however, give rise to important distinctions between the structures and functions of RNA and DNA. The first is the type of pentose present. RNA nucleotides contain ribose and thus are named ribonucleotides (see Figure 3-1a, right). Ribose has one more hydroxyl (–OH) group on the sugar ring than does deoxyribose, which defines the RNA polynucleotide as ribonucleic acid rather than deoxyribonucleic acid. The second distinction is the assortment of nitrogenous bases found in RNA. Ribonucleotides contain three of the same bases found in DNA (adenine, guanine, and cytosine), but instead of thymine, the fourth base in RNA is uracil (U) (see Figure 3-1b, right). Uracil is structurally identical to thymine except that it lacks the methyl (–CH₃) group on carbon 5 of the base. The nucleotides of DNA and RNA are represented by both three-letter and one-letter abbreviations (Table 3-1).

KEY CONVENTION

DNA and RNA are defined by the type of sugar in the polynucleotide backbone (deoxyribose or ribose), not by the presence of thymine or uracil.
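The convention can be made concrete with a small data model. The following is a minimal sketch, not a standard representation: the class name `Nucleotide`, its fields, and the helper functions are hypothetical names chosen for illustration. It records the three components of a nucleotide and classifies a unit by its sugar alone, so that a uracil base on deoxyribose (as in deoxyuridine) is still DNA-type and a thymine base on ribose is still RNA-type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleotide:
    """Hypothetical model of the components described in the text."""
    base: str                   # one-letter base code: "A", "G", "C", "T", or "U"
    sugar: str                  # "deoxyribose" or "ribose"
    has_phosphate: bool = True  # False models a nucleoside (base + sugar only)


def polymer_type(unit: Nucleotide) -> str:
    """Classify a unit as DNA-type or RNA-type from its sugar, ignoring the base."""
    if unit.sugar == "deoxyribose":
        return "DNA"
    if unit.sugar == "ribose":
        return "RNA"
    raise ValueError(f"unknown sugar: {unit.sugar!r}")


def unit_kind(unit: Nucleotide) -> str:
    """A base plus sugar without the phosphate group is a nucleoside."""
    return "nucleotide" if unit.has_phosphate else "nucleoside"


# Uracil on deoxyribose: the base is the one typical of RNA, but the sugar decides.
print(polymer_type(Nucleotide(base="U", sugar="deoxyribose")))  # DNA
# Thymine on ribose: classified as RNA-type for the same reason.
print(polymer_type(Nucleotide(base="T", sugar="ribose")))       # RNA
print(unit_kind(Nucleotide(base="A", sugar="ribose", has_phosphate=False)))  # nucleoside
```

Note that `polymer_type` never inspects the `base` field, which is the whole point of the convention.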

With just four types of nucleotides each, the number of possible DNA or RNA sequences (4ⁿ, where n is the number of nucleotides in the sequence) is enormous even for short molecules: a chain of only 10 nucleotides has 4¹⁰, or more than one million, possible sequences. Thus, the number of distinct genetic messages that can exist is effectively unlimited.
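To get a feel for how quickly 4ⁿ grows, the count can be computed directly and checked by brute force for a very short chain. This is a minimal sketch; the function name `count_sequences` and the chain lengths shown are arbitrary choices made for illustration.

```python
from itertools import product

DNA_ALPHABET = "ACGT"


def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences of a given length: alphabet_size ** length."""
    return alphabet_size ** length


# Brute-force check for a chain of 3 nucleotides: list every possible sequence.
all_trimers = ["".join(bases) for bases in product(DNA_ALPHABET, repeat=3)]
assert len(all_trimers) == count_sequences(4, 3)

# Growth of the sequence space with chain length.
for n in (1, 3, 10, 20):
    print(n, count_sequences(len(DNA_ALPHABET), n))
# 1 4
# 3 64
# 10 1048576
# 20 1099511627776
```

Explicit enumeration is practical only for very short chains; for anything longer, only the count itself can be computed.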

Proteins Are Long Polymers of Amino Acids

Proteins, like nucleic acids, are unbranched polymers. The building blocks of protein chains are amino acids (Figure 3-2a). When amino acids are joined together by a peptide bond between the amino group of one amino acid and the carboxyl group of another, a peptide is formed. Longer chains of amino acids are called polypeptides (Figure 3-2b). Polypeptides have a characteristic directionality defined by the free amino group of the amino acid at one end of the polymer (the amino terminus, or N-terminus) and the free carboxyl group at the other end (the carboxyl terminus, or C-terminus). Once incorporated into a polypeptide chain, the individual amino acids are referred to as amino acid residues (see Chapter 4). A functional protein may be formed from one polypeptide chain or from several interacting polypeptides. Proteins are abundant in all cells. They perform many functions, including catalyzing biochemical reactions, serving structural roles, receiving and transmitting chemical signals within and among cells, and transporting specific ions and molecules across cellular membranes. Most proteins found in cells and viruses are composed of just 20 different amino acids.

Figure 3-2: Chemical building blocks of proteins. (a) The structure of an amino acid. The central carbon atom (C_α) bonds to an amino group (blue), a carboxyl group (pink), a hydrogen, and a side chain (R, purple). The amino and carboxyl groups are shown in the ionized forms found in solution at physiological pH. (b) A segment of a polypeptide chain. Note that polypeptide chains have directionality, with a free amino group at one end (the amino terminus, or N-terminus) and a free carboxyl group at the other (carboxyl terminus, or C-terminus). Peptide bonds connect the amino acid residues in the polypeptide chain.

All 20 common amino acids have a similar structure: a central carbon atom, the alpha carbon atom (α carbon, or C_α), bonded to four different chemical groups. For this reason, they are called α-amino acids. The α-amino acids have a carboxyl (–COOH) group, an amino (–NH₂) group, and a hydrogen atom, all bonded to the α carbon. Each amino acid also has a unique side chain, or R group, bonded to the α carbon atom (see Figure 3-2). The R groups vary in structure, size, electrical charge, and hydrophobicity. The diverse chemical properties of R groups are what give proteins the ability to form many different three-dimensional structures and to perform many different kinds of activities in biological systems. The 20 common amino acids are represented by both three-letter and one-letter abbreviations, which are used to indicate the composition and amino acid sequence of proteins. (The functional diversity of R groups and the nomenclature for the 20 common amino acids are discussed in detail in Chapter 4.) There are also many less-common amino acids, found both in proteins and as cellular constituents not incorporated into proteins. Note that with 20 different amino acid building blocks, the number of possible protein sequences (20ⁿ, where n is the number of amino acids in the sequence) is vast.
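The one-letter and three-letter abbreviations lend themselves to a simple lookup table. The sketch below uses the standard abbreviations for the 20 common amino acids; the function names and the example peptide `MKTAY` are hypothetical and chosen only for illustration. It converts a sequence between notations, tallies its composition, and estimates the size of the sequence space, 20ⁿ, as a power of ten.

```python
import math
from collections import Counter

# Standard one-letter -> three-letter abbreviations for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str) -> str:
    """Rewrite a one-letter sequence (N-terminus first) in three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


def composition(sequence: str) -> Counter:
    """Count how many times each amino acid residue occurs."""
    return Counter(sequence.upper())


def log10_sequence_space(length: int) -> float:
    """log10 of 20**length, i.e. the exponent when 20**length is written as a power of ten."""
    return length * math.log10(len(THREE_LETTER))


peptide = "MKTAY"  # hypothetical five-residue peptide
print(to_three_letter(peptide))          # Met-Lys-Thr-Ala-Tyr
print(composition(peptide)["K"])         # 1
print(round(log10_sequence_space(100)))  # 130
```

The last line shows that a chain of 100 residues has roughly 10¹³⁰ possible sequences.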

Chemical Composition Helps Determine Nucleic Acid and Protein Structure

The fact that some of the crucial requirements for life are met by polymeric molecules makes good sense from a biosynthetic standpoint. As we have noted, a huge variety of nucleic acids and proteins can be produced by varying the sequence of nucleotide or amino acid monomers in the chains. DNA molecules are typically many millions of nucleotides long, but they form relatively uniform overall structures in which the nucleotide bases in two strands pair up along their length to produce a double helix (Figure 3-3a). RNA molecules, except for those that store the genetic information of viruses, are much shorter and more structurally diverse than DNA. A single strand of RNA can fold back on itself to form short helices that come together in a three-dimensional shape (Figure 3-3b). These differences between DNA and RNA structure stem from the role of RNA’s 2′-hydroxyl groups in altering the shape and chemical properties of the sugar–phosphate backbone (which we explore in Chapter 6).

Of all biological polymers, proteins have the greatest variety of three-dimensional structures and range of functional groups, resulting from the different types of amino acid side chains. This variety underlies the role of proteins as the primary catalysts of chemical reactions (Figure 3-4). Of course, proteins also perform many other, noncatalytic cellular functions, made possible by the chemical diversity of their amino acid building blocks.

Figure 3-3: The helical structure of DNA and RNA. In these representations, the sugar–phosphate backbone is shown as a solid bar with the bases extending away from the backbone and available for base pairing. (a) Ribbon model of a DNA double helix, consisting of two strands of DNA. Base pairs in the helix twist around the central axis. (b) Ribbon models of three RNA molecules, each consisting of a single strand of RNA: phenylalanine-tRNA from yeast, a self-cleaving RNA from the hepatitis delta virus (HDV), and a self-splicing intron from Tetrahymena. Each RNA includes short stretches of helical structure that fold into a three-dimensional shape. As discussed in the text, the differences in chemical structure between DNA and RNA are the basis for the differences in the three-dimensional structures that they form.

Figure 3-4: Examples of protein structures. Proteins can form a wide range of three-dimensional structures due to the variety of chemical properties of the 20 common amino acids. Shown here are (a) calmodulin, a Ca²⁺-binding protein; (b) Dicer, an enzyme that cleaves double-stranded RNA; and (c) hemoglobin, the oxygen carrier in red blood cells. See Section 4.3 and Figure 4-10 for an explanation of how the molecular structures of proteins are represented throughout this book.

Chemical Composition Can Be Altered by Postsynthetic Changes

Chemical modifications of nucleotides and amino acids often occur after a DNA, RNA, or protein molecule has been synthesized. Sometimes these modifications are required for the molecule to attain its biologically active structure or to bind other molecules.

The primary modification of DNA nucleotides is the addition of methyl (–CH₃) groups to the C, A, and G bases (Figure 3-5a). DNA base methylation is critical for accurate DNA replication and, in bacteria, for the protection of DNA from degradative enzymes; in human and other eukaryotic cells, it is essential for activating and silencing gene expression. RNA molecules can be modified in a greater variety of ways, including the addition of methyl groups to the nucleotide bases or to the 2′-hydroxyl group of the ribose and the substitution of less-common bases for the usual A, C, G, or U (Figure 3-5b). Such chemical changes affect the ability of RNA molecules to fold into their correct three-dimensional structure and to interact with proteins.

Figure 3-5: Chemical modification of nucleotide sugars and bases. (a) Examples of methylation modifications in DNA nucleotides. The extra methyl group on 5-methylcytidine, N⁶-methyladenosine, and N²-methylguanosine is highlighted in red. (b) Examples of modifications in RNA nucleotides. Loss of the amino group attached to the guanine ring (the exocyclic amine) gives rise to inosine; uridine can be methylated at the 2′ position to produce 2′-O-methyluridine. Modification sites are indicated in red.

Proteins are often modified by the addition of chemical groups to specific amino acid residues within a polypeptide chain. More than 300 types of amino acid modifications are known to occur in proteins; a few are particularly common. For example, addition of phosphate groups to hydroxyl groups in the side chains of serine, tyrosine, and threonine can dramatically change a protein’s shape and function. The phosphorylation and dephosphorylation of proteins is an important mechanism by which signals are transmitted within and among cells. Proteins are sometimes modified by the addition of sugars (glycosylation) or methyl groups, with functional consequences (Figure 3-6). For example, glycosylated proteins provide chemical signatures on the surfaces of cells that help distinguish “self” from “nonself.” Another common protein modification is the addition of acetyl groups to lysine side chains. Lysine acetylation—and deacetylation—plays a central role in the production of proteins from particular genes. All these chemical modifications can substantially change the behavior of proteins, as we discuss in Chapters 5, 18, and 19 when taking a closer look at the varied functions of proteins.

Figure 3-6: Chemical modification of some amino acid residues. Arginine can be methylated; the hydroxyl (–OH) group of serine is a frequent site of phosphorylation, as are those of tyrosine and threonine (not shown); lysine can be acetylated; asparagine can be glycosylated; and proline is sometimes hydroxylated. In each case, the modification alters the behavior of the protein containing the changed amino acid residue. Modification sites are indicated in red.
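The residue-specific modifications named above can be summarized as a lookup table and used to scan a sequence for residues whose side chains could, chemically, carry one of them. This is a minimal sketch under stated assumptions: the table holds only the examples given in the text, the names `MODIFIABLE_RESIDUES` and `candidate_sites` and the peptide `MKTAYSG` are hypothetical, and a listed site is only a chemical possibility. Whether a given residue is actually modified in a cell depends on factors not modeled here.

```python
# Residue -> modification pairs taken from the examples named in the text.
# This is an illustrative subset, not a complete catalog of known modifications.
MODIFIABLE_RESIDUES = {
    "R": "methylation",      # arginine
    "S": "phosphorylation",  # serine (side-chain hydroxyl)
    "T": "phosphorylation",  # threonine (side-chain hydroxyl)
    "Y": "phosphorylation",  # tyrosine (side-chain hydroxyl)
    "K": "acetylation",      # lysine
    "N": "glycosylation",    # asparagine
    "P": "hydroxylation",    # proline
}


def candidate_sites(sequence: str) -> list:
    """Return (position, residue, modification) for each residue in the table.

    Positions are 1-based, counted from the N-terminus.
    """
    return [
        (position, residue, MODIFIABLE_RESIDUES[residue])
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in MODIFIABLE_RESIDUES
    ]


for site in candidate_sites("MKTAYSG"):  # hypothetical peptide
    print(site)
# (2, 'K', 'acetylation')
# (3, 'T', 'phosphorylation')
# (5, 'Y', 'phosphorylation')
# (6, 'S', 'phosphorylation')
```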

SECTION 3.1 SUMMARY

Polymeric molecules play crucial roles in all organisms.
The nucleic acids, DNA and RNA, are polymers of nucleotides. Each nucleotide has three components: a deoxyribose (in DNA) or ribose (in RNA) pentose sugar, a phosphate group, and a nitrogenous base. The four bases in DNA are adenine, guanine, cytosine, and thymine; the four bases in RNA are adenine, guanine, cytosine, and uracil.
DNA and RNA are chemically similar, with two small differences that have significant functional consequences, including different helical geometries, three-dimensional shapes, and protein-binding abilities. The ribose of RNA has a hydroxyl (–OH) group on the 2′ carbon of the sugar ring, but the deoxyribose of DNA does not; and, instead of thymine, RNA nucleotides contain uracil, an unmethylated form of the thymine base. The nucleotides of DNA or RNA are linked into chains by phosphodiester bonds.
Proteins are polymers of amino acids. Twenty amino acid building blocks are commonly found in proteins, each consisting of a central α-carbon atom bonded to four different groups: a carboxyl group, an amino group, an R group, and a hydrogen atom. The R groups, or side chains, have chemical properties that contribute to the functional and structural diversity of proteins. The amino acid residues in proteins are linked by peptide bonds.
DNA molecules form a two-stranded double helix, whereas RNA molecules are mainly found as single polynucleotide strands that fold back on themselves to create various three-dimensional shapes. Protein structures are even more diverse, due in part to the different chemical properties of the amino acid side chains.
Postsynthetic chemical modifications of DNA, RNA, and proteins can dramatically affect the structure and biological activity of these macromolecules. In DNA, methylation of bases A, C, and G is common and leads to changes in gene expression. In RNA, modifications are more varied and include methylation of bases and/or ribose and other, more substantial alterations of bases. Protein modifications include the addition of phosphate, sugar, methyl, acetyl, and hydroxyl groups to specific amino acid side chains.