4.1 A Nucleic Acid Consists of Four Kinds of Bases Linked to a Sugar–Phosphate Backbone

The nucleic acids DNA and RNA are well suited to function as the carriers of genetic information by virtue of their covalent structures. These macromolecules are linear polymers built up from similar units connected end to end (Figure 4.1). Each monomer unit within the polymer is a nucleotide. A single nucleotide unit consists of three components: a sugar, a phosphate, and one of four bases. The sequence of bases in the polymer uniquely characterizes a nucleic acid and constitutes a form of linear information—information analogous to the letters that spell a person’s name.

Figure 4.1: Polymeric structure of nucleic acids.
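The information lies in the order of the bases. Because of that, a chain of n nucleotides built from four kinds of bases can take 4^n distinct sequences. The sketch below uses hypothetical helper names that do not come from the text. It checks that count by brute-force enumeration for short chains. The DNA alphabet is assumed here; RNA would use U in place of T.

```python
from itertools import product

BASES = "ACGT"  # the four DNA bases; RNA uses U in place of T


def count_sequences(n: int) -> int:
    """Number of distinct linear sequences of length n over four bases."""
    return len(BASES) ** n


def enumerate_sequences(n: int) -> list[str]:
    """List every length-n sequence explicitly (only practical for small n)."""
    return ["".join(letters) for letters in product(BASES, repeat=n)]


for n in range(1, 5):
    # The closed-form count and the brute-force enumeration agree.
    assert count_sequences(n) == len(enumerate_sequences(n))
    print(n, count_sequences(n))
```

Even a short trinucleotide has 64 possible sequences, and each added base multiplies the possibilities by four.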

RNA and DNA differ in the sugar component and one of the bases

The sugar in deoxyribonucleic acid (DNA) is deoxyribose. The prefix deoxy indicates that the 2′-carbon atom of the sugar lacks the oxygen atom that is linked to the 2′-carbon atom of ribose, as shown in Figure 4.2. Note that sugar carbons are numbered with primes to differentiate them from atoms in the bases. The sugars in both nucleic acids are linked to one another by phosphodiester bridges. Specifically, the 3′-hydroxyl (3′-OH) group of the sugar moiety of one nucleotide is esterified to a phosphate group, which is, in turn, joined to the 5′-hydroxyl group of the adjacent sugar. The chain of sugars linked by phosphodiester bridges is referred to as the backbone of the nucleic acid (Figure 4.3). Whereas the backbone is constant in a nucleic acid, the bases vary from one monomer to the next. Two of the bases of DNA are derivatives of purine—adenine (A) and guanine (G)—and two of pyrimidine—cytosine (C) and thymine (T), as shown in Figure 4.4.

Figure 4.2: Ribose and deoxyribose. Atoms in sugar units are numbered with primes to distinguish them from atoms in bases (see Figure 4.4).
Figure 4.3: Backbones of DNA and RNA. The backbones of these nucleic acids are formed by 3′-to-5′ phosphodiester linkages. A sugar unit is highlighted in red and a phosphate group in blue.
Figure 4.4: Purines and pyrimidines. Atoms within bases are numbered without primes. Uracil is present in RNA instead of thymine.
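The purine/pyrimidine grouping in Figure 4.4 can be written as a simple lookup. The sketch below is illustrative only; the function names are hypothetical. It classifies one-letter base codes and counts each class in a sequence.

```python
# Grouping from Figure 4.4: A and G are purines; C, T (DNA), and U (RNA)
# are pyrimidines.
PURINES = {"A": "adenine", "G": "guanine"}
PYRIMIDINES = {"C": "cytosine", "T": "thymine", "U": "uracil"}


def base_class(code: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter base code."""
    code = code.upper()
    if code in PURINES:
        return "purine"
    if code in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base code: {code!r}")


def count_classes(sequence: str) -> dict[str, int]:
    """Count purines and pyrimidines in a sequence of one-letter codes."""
    counts = {"purine": 0, "pyrimidine": 0}
    for code in sequence:
        counts[base_class(code)] += 1
    return counts


print(count_classes("ACGT"))
print(count_classes("AUGGCU"))
```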

107

Ribonucleic acid (RNA), like DNA, is a long unbranched polymer consisting of nucleotides joined by 3′-to-5′ phosphodiester linkages (Figure 4.3). The covalent structure of RNA differs from that of DNA in two respects. First, the sugar units in RNA are riboses rather than deoxyriboses. Ribose contains a 2′-hydroxyl group not present in deoxyribose. Second, one of the four major bases in RNA is uracil (U) instead of thymine (T).
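In one-letter code, only the second difference (T versus U) is visible. The sugar does not appear in the notation. The hypothetical helper below guesses whether a sequence is DNA or RNA from its letters alone. It assumes the sequence contains only the four major bases, so modified bases are out of scope. A sequence made only of A, C, and G could be either and is reported as ambiguous.

```python
DNA_BASES = frozenset("ACGT")
RNA_BASES = frozenset("ACGU")
SHARED_BASES = DNA_BASES & RNA_BASES  # A, C, and G occur in both


def nucleic_acid_type(sequence: str) -> str:
    """Guess 'DNA' or 'RNA' from the base letters alone.

    Only the T/U difference shows up in one-letter code; the
    ribose/deoxyribose difference does not.
    """
    bases = set(sequence.upper())
    if not bases:
        raise ValueError("empty sequence")
    if bases <= SHARED_BASES:
        return "ambiguous"
    if bases <= DNA_BASES:
        return "DNA"
    if bases <= RNA_BASES:
        return "RNA"
    raise ValueError(f"not a valid DNA or RNA sequence: {sequence!r}")


for seq in ["ACGT", "ACGU", "ACG"]:
    print(seq, nucleic_acid_type(seq))
```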

Note that each phosphodiester bridge has a negative charge. This negative charge repels nucleophilic species such as hydroxide ions, which are capable of hydrolytic attack on the phosphate backbone. The repulsion therefore makes the backbone resistant to hydrolysis, a property crucial for maintaining the integrity of the information stored in nucleic acids. The absence of the 2′-hydroxyl group in DNA further increases its resistance to hydrolysis. The greater stability of DNA probably explains why DNA, rather than RNA, serves as the hereditary material in all modern cells and in many viruses.

Nucleotides are the monomeric units of nucleic acids

The building blocks of nucleic acids and the precursors of these building blocks play many other roles throughout the cell—for instance, as energy currency and as molecular signals. Consequently, it is important to be familiar with the nomenclature of nucleotides and their precursors. A unit consisting of a base bonded to a sugar is referred to as a nucleoside. The four nucleoside units in RNA are called adenosine, guanosine, cytidine, and uridine, whereas those in DNA are called deoxyadenosine, deoxyguanosine, deoxycytidine, and thymidine. In each case, N-9 of a purine or N-1 of a pyrimidine is attached to C-1′ of the sugar by an N-glycosidic linkage (Figure 4.5). The base lies above the plane of the sugar when the structure is written in the standard orientation; that is, the configuration of the N-glycosidic linkage is β (Section 11.1). Note that thymidine contains deoxyribose; by convention, the prefix deoxy is not added because thymine-containing nucleosides are only rarely found in RNA.

Figure 4.5: β-Glycosidic linkage in a nucleoside.

A nucleotide is a nucleoside joined to one or more phosphoryl groups by an ester linkage. Nucleoside triphosphates, which are nucleosides joined to three phosphoryl groups, are the monomers, or building blocks, that are linked to form RNA and DNA. Within the polymer, the four nucleotide units of DNA are nucleoside monophosphates called deoxyadenylate, deoxyguanylate, deoxycytidylate, and thymidylate. Similarly, the most common nucleotide units of RNA are the nucleoside monophosphates adenylate, guanylate, cytidylate, and uridylate.
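The naming rules from the last two paragraphs can be collected in one table. The sketch below uses hypothetical function names. It returns the nucleoside and nucleotide (monophosphate) names for a base and sugar, and it gives the base atom that forms the N-glycosidic bond to C-1′. Following the text's convention, thymine units carry no "deoxy" prefix. Ribose-containing thymine units are rare, so this sketch simply rejects them.

```python
# One-letter base code -> (nucleoside name, nucleotide "-ylate" name)
NAMES = {
    "A": ("adenosine", "adenylate"),
    "G": ("guanosine", "guanylate"),
    "C": ("cytidine", "cytidylate"),
    "U": ("uridine", "uridylate"),
    "T": ("thymidine", "thymidylate"),
}
PURINES = {"A", "G"}


def glycosidic_nitrogen(base: str) -> str:
    """Base atom bonded to C-1' of the sugar: N-9 (purine) or N-1 (pyrimidine)."""
    if base not in NAMES:
        raise ValueError(f"unknown base code: {base!r}")
    return "N-9" if base in PURINES else "N-1"


def unit_names(base: str, deoxy: bool) -> tuple[str, str]:
    """Return (nucleoside name, nucleotide name) for a base and sugar."""
    if base not in NAMES:
        raise ValueError(f"unknown base code: {base!r}")
    nucleoside, nucleotide = NAMES[base]
    if base == "T":
        if not deoxy:
            raise ValueError("ribose-containing thymine units are outside this sketch")
        # By convention, thymine units are named without the 'deoxy' prefix.
        return nucleoside, nucleotide
    if deoxy:
        return "deoxy" + nucleoside, "deoxy" + nucleotide
    return nucleoside, nucleotide


for b in "AGCT":
    print(b, glycosidic_nitrogen(b), *unit_names(b, deoxy=True))
```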

108

This nomenclature does not specify the number of phosphoryl groups or the sugar carbon to which they are attached. A more precise nomenclature is therefore also commonly used. A compound formed by the attachment of a phosphoryl group to C-5′ of a nucleoside sugar (the most common site of phosphate esterification) is called a nucleoside 5′-phosphate or a 5′-nucleotide. In this naming system, both the number of phosphoryl groups and their site of attachment are designated. Consider, for example, adenosine 5′-triphosphate (ATP; Figure 4.6). This nucleotide is tremendously important because, in addition to being a building block for RNA, it is the most commonly used energy currency. The energy released from cleavage of the triphosphate group is used to power many cellular processes (Chapter 15). Another nucleotide is deoxyguanosine 3′-monophosphate (3′-dGMP; Figure 4.6). This nucleotide differs from ATP in that it contains guanine rather than adenine, deoxyribose rather than ribose (indicated by the prefix “d”), and one rather than three phosphoryl groups. In addition, the phosphoryl group is esterified to the hydroxyl group in the 3′ rather than the 5′ position.

Figure 4.6: Nucleotides adenosine 5′-triphosphate (5′-ATP) and deoxyguanosine 3′-monophosphate (3′-dGMP).
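The systematic names follow a fixed pattern: the nucleoside name, the attachment position, and the number of phosphoryl groups. The sketch below uses a hypothetical function, `nucleotide_name`, to build both the full name and the abbreviation. It covers A, G, C, and U. Thymine is left out because of the special thymidine naming convention. The sketch also rejects a 2′ phosphate on deoxyribose, since that sugar has no 2′-hydroxyl group to esterify.

```python
PRIME = "\u2032"  # the prime symbol used for sugar carbons
NUCLEOSIDES = {"A": "adenosine", "G": "guanosine", "C": "cytidine", "U": "uridine"}
# number of phosphoryl groups -> (word stem, letter used in abbreviations)
PHOSPHATES = {1: ("mono", "M"), 2: ("di", "D"), 3: ("tri", "T")}


def nucleotide_name(base: str, deoxy: bool, n_phosphates: int, position: int) -> tuple[str, str]:
    """Return (systematic name, abbreviation), e.g. adenosine 5'-triphosphate, 5'-ATP."""
    if base not in NUCLEOSIDES:
        raise ValueError(f"base not covered by this sketch: {base!r}")
    if n_phosphates not in PHOSPHATES:
        raise ValueError("expected 1, 2, or 3 phosphoryl groups")
    if position not in (2, 3, 5):
        raise ValueError("phosphate is esterified at C-2', C-3', or C-5'")
    if deoxy and position == 2:
        raise ValueError("deoxyribose has no 2'-hydroxyl group")
    word, letter = PHOSPHATES[n_phosphates]
    prefix = "deoxy" if deoxy else ""
    full = f"{prefix}{NUCLEOSIDES[base]} {position}{PRIME}-{word}phosphate"
    abbrev = f"{position}{PRIME}-{'d' if deoxy else ''}{base}{letter}P"
    return full, abbrev


print(*nucleotide_name("A", deoxy=False, n_phosphates=3, position=5))
print(*nucleotide_name("G", deoxy=True, n_phosphates=1, position=3))
```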

DNA molecules are very long and have directionality

Scientific communication frequently requires the sequence of a nucleic acid, in some cases thousands of nucleotides long, to be written out, as in Section 1.4. Rather than drawing cumbersome chemical structures, scientists use abbreviations. The abbreviated notations pApCpG and ACG denote a DNA trinucleotide consisting of deoxyadenylate, deoxycytidylate, and deoxyguanylate linked by phosphodiester bridges, where “p” denotes a phosphoryl group (Figure 4.7). The 5′ end often carries a phosphoryl group attached to its 5′-OH group. Note that, like a polypeptide (Section 2.2), a DNA chain has directionality, commonly called polarity. One end of the chain has a free 5′-OH group (or a 5′-OH group attached to a phosphoryl group) and the other end has a free 3′-OH group; neither end is linked to another nucleotide. By convention, the base sequence is written in the 5′-to-3′ direction. Thus, ACG indicates that the unlinked 5′-OH group is on deoxyadenylate, whereas the unlinked 3′-OH group is on deoxyguanylate. Because of this polarity, ACG and GCA correspond to different compounds.

Figure 4.7: Structure of a DNA strand. The strand has a 5′ end, which is usually attached to a phosphoryl group, and a 3′ end, which is usually a free hydroxyl group.
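Polarity can be enforced by always storing a strand in the conventional 5′-to-3′ order. The sketch below uses a hypothetical `Strand` class, not a standard library type. Reading the same letters in the opposite direction gives the same molecule only if the direction label is flipped too. The class also expands the short notation into the "p" notation and counts the phosphodiester bridges, which number one fewer than the nucleotides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    """A nucleic acid strand stored in the conventional 5'->3' order."""

    sequence: str

    @classmethod
    def from_written(cls, text: str, direction: str = "5to3") -> "Strand":
        """Build a strand from letters written 5'->3' ('5to3') or 3'->5' ('3to5')."""
        text = text.upper()
        if not text:
            raise ValueError("empty sequence")
        if direction == "5to3":
            return cls(text)
        if direction == "3to5":
            return cls(text[::-1])  # flip to the 5'->3' convention
        raise ValueError("direction must be '5to3' or '3to5'")

    @property
    def five_prime_base(self) -> str:
        return self.sequence[0]

    @property
    def three_prime_base(self) -> str:
        return self.sequence[-1]

    @property
    def phosphodiester_bridges(self) -> int:
        # A linear chain of n nucleotides has n - 1 internal linkages.
        return len(self.sequence) - 1

    def p_notation(self, five_prime_phosphate: bool = True) -> str:
        """Expand e.g. 'ACG' to 'pApCpG' (or 'ApCpG' without a 5' phosphate)."""
        text = "".join("p" + base for base in self.sequence)
        return text if five_prime_phosphate else text[1:]


acg = Strand.from_written("ACG")
print(acg.p_notation(), acg.five_prime_base, acg.three_prime_base)
print(acg == Strand.from_written("GCA"))          # different compounds
print(acg == Strand.from_written("GCA", "3to5"))  # same molecule, read backward
```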

109

A striking characteristic of naturally occurring DNA molecules is their length. A DNA molecule must comprise many nucleotides to carry the genetic information necessary for even the simplest organisms. For example, the DNA of a virus such as polyoma, which can cause cancer in certain organisms, consists of two paired strands of DNA, each 5100 nucleotides in length. The E. coli genome is a single DNA molecule consisting of two strands of 4.6 million nucleotides each (Figure 4.8).

Figure 4.8: Electron micrograph of part of the E. coli genome.
[Dr. Gopal Murti/Science Photo Library/Photo Researchers.]

The DNA molecules of higher organisms can be much larger. The human genome comprises approximately 3 billion nucleotides in each strand of DNA, divided among 24 distinct molecules of DNA called chromosomes (22 autosomal chromosomes plus the X and Y sex chromosomes) of different sizes. One of the largest known DNA molecules is found in the Indian muntjac, an Asiatic deer; its genome is nearly as large as the human genome but is distributed on only 3 chromosomes (Figure 4.9). The largest of these chromosomes has two strands of more than 1 billion nucleotides each. If such a DNA molecule could be fully extended, it would stretch more than 1 foot in length. Some plants contain even larger DNA molecules.

Figure 4.9: The Indian muntjac and its chromosomes. Cells from a female Indian muntjac (right) contain three pairs of very large chromosomes (stained orange). The cell shown is a hybrid containing a pair of human chromosomes (stained green) for comparison.
[(Left) Hugh Lansdown/Shutterstock. (Right) J.–Y. Lee, M. Koi, E. J. Stanbridge, M. Oshimura, A. T. Kumamoto, and A. P. Feinberg. Nat. Genet. 7:30, 1994.]
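The "more than 1 foot" figure is simple arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair, the standard value for B-form DNA; this section does not give that number. It also treats each paired-strand molecule as one base pair per nucleotide of a single strand. With these assumptions, it estimates fully extended lengths for the molecules mentioned above.

```python
RISE_NM_PER_BP = 0.34    # assumed B-DNA rise per base pair (not stated in this section)
METERS_PER_FOOT = 0.3048


def extended_length_m(base_pairs: int) -> float:
    """Approximate length in meters of a fully extended double-stranded DNA."""
    return base_pairs * RISE_NM_PER_BP * 1e-9


# Base-pair counts taken from the text above.
examples = {
    "polyoma virus": 5_100,
    "E. coli genome": 4_600_000,
    "largest Indian muntjac chromosome": 1_000_000_000,
    "human genome (all 24 chromosomes end to end)": 3_000_000_000,
}

for name, bp in examples.items():
    length = extended_length_m(bp)
    print(f"{name}: {length:.3g} m ({length / METERS_PER_FOOT:.3g} ft)")
```

Under this assumption, the largest muntjac chromosome comes out at roughly 0.34 m, or a little over 1 foot, which agrees with the text. The E. coli genome comes out at about 1.6 mm.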