Eukaryotic organisms differ dramatically in the amount of DNA per cell, a quantity termed an organism’s C value (Table 11.3). Each cell of a fruit fly, for example, contains 35 times the amount of DNA found in a cell of the bacterium E. coli. In general, eukaryotic cells contain more DNA than prokaryotic cells do, but variability in the C values of different eukaryotes is huge. Human cells contain more than 10 times the amount of DNA found in Drosophila cells, whereas some salamander cells contain 20 times as much DNA as that in human cells. Clearly, these differences in C value cannot be explained simply by differences in organismal complexity. So, what is all of the extra DNA in eukaryotic cells doing? This question has been termed the C-value paradox. We do not yet have a complete answer to the C-value paradox, but eukaryotic DNA sequences reveal a complexity that is absent from prokaryotic DNA.
Organism | Approximate Genome Size (bp) |
λ (bacteriophage) | 50,000 |
Escherichia coli (bacterium) | 4,640,000 |
Saccharomyces cerevisiae (yeast) | 12,000,000 |
Arabidopsis thaliana (plant) | 125,000,000 |
Drosophila melanogaster (insect) | 170,000,000 |
Homo sapiens (human) | 3,200,000,000 |
Zea mays (corn) | 4,500,000,000 |
Amphiuma (salamander) | 765,000,000,000 |
The first clue that eukaryotic DNA contains several types of sequences not present in prokaryotic DNA came from studies in which double-stranded DNA was separated and then allowed to reassociate. When double-stranded DNA in solution is heated, the hydrogen bonds that hold the two strands together are weakened and, with enough heat, the two nucleotide strands separate completely, a process called denaturation or melting. The temperature at which DNA denatures, called the melting temperature (Tm), depends on the base sequence of the particular sample of DNA: G–C base pairs have three hydrogen bonds, whereas A–T base pairs only have two; so the separation of G–C pairs requires more heat (energy) than does the separation of A–T pairs.
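The link between base composition and duplex stability can be illustrated with a minimal sketch that counts hydrogen bonds, using only the rule stated above (three per G–C pair, two per A–T pair). The function names and the two example sequences are hypothetical, chosen for illustration. The sketch does not compute an actual Tm, which also depends on factors such as salt concentration and strand length.

```python
# Hydrogen bonds contributed by each base when paired with its complement
# (rule from the text: G-C pairs have three, A-T pairs have two).
BONDS_PER_BASE = {"G": 3, "C": 3, "A": 2, "T": 2}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in a duplex of `strand` and its perfect complement."""
    return sum(BONDS_PER_BASE[base] for base in strand.upper())


def gc_fraction(strand: str) -> float:
    """Fraction of bases in `strand` that are G or C."""
    strand = strand.upper()
    return (strand.count("G") + strand.count("C")) / len(strand)


# Two hypothetical 10-base sequences of equal length.
at_rich = "ATATATATAT"
gc_rich = "GCGCGCGCGC"

for name, seq in [("AT-rich", at_rich), ("GC-rich", gc_rich)]:
    print(name, hydrogen_bonds(seq), "bonds; GC fraction", gc_fraction(seq))
```

For strands of equal length, the GC-rich duplex is held together by more hydrogen bonds, which is consistent with its needing more heat to separate.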
The denaturation of DNA by heating is reversible; if single-stranded DNA is slowly cooled, single strands will collide and hydrogen bonds will form again between complementary bases, producing double-stranded DNA. This reaction is called renaturation or reannealing.
Two single-stranded molecules of DNA from different sources, such as different organisms, will anneal if they are complementary, a process termed hybridization. For hybridization to take place, the two strands do not have to be complementary at all their bases, only at enough bases to hold the two strands together. The extent of hybridization can be used to measure the similarity of nucleic acids from two different sources and to assess evolutionary relationships. The rate at which hybridization takes place also provides information about the sequence complexity of DNA.
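The idea that two strands need to be complementary at only some of their bases can be sketched as a position-by-position comparison. This is a deliberately simplified illustration: it assumes two strands of equal length, both written 5'→3', aligned antiparallel with no gaps. The function names and sequences are hypothetical, and a real hybridization experiment measures annealing in solution rather than counting matches.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand that pairs perfectly with `strand`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def paired_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions at which strand_b can base-pair with strand_a.

    Assumes equal-length strands aligned antiparallel without gaps.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch only compares strands of equal length")
    perfect_partner = reverse_complement(strand_a)
    matches = sum(1 for x, y in zip(perfect_partner, strand_b.upper()) if x == y)
    return matches / len(strand_a)


source_1 = "ATGCGTAC"                       # hypothetical strand from one organism
perfect = reverse_complement(source_1)      # fully complementary partner
diverged = perfect[:-1] + "A"               # partner with one mismatched base

print(paired_fraction(source_1, perfect))   # every position pairs
print(paired_fraction(source_1, diverged))  # one position fails to pair
```

A higher paired fraction corresponds to greater sequence similarity between the two sources, which is the quantity that hybridization studies use as a proxy for evolutionary relatedness.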
Eukaryotic DNA consists of at least three types of sequences: unique-sequence DNA, moderately repetitive DNA, and highly repetitive DNA. Unique-sequence DNA consists of sequences that are present only once or, at most, a few times in the genome. This DNA includes sequences that encode proteins, as well as a great deal of DNA whose function is unknown. Genes that are present in a single copy constitute from roughly 25% to 50% of the protein-encoding genes in most multicellular eukaryotes. Other genes within unique-sequence DNA are present in several similar, but not identical, copies and together are referred to as a gene family. Most gene families arose through duplication of an existing gene and include just a few member genes, but some, such as those that encode immunoglobulin proteins in vertebrates, contain hundreds of members. The genes that encode β-like globins are another example of a gene family. In humans, there are seven β-globin genes, clustered together on chromosome 11. The polypeptides encoded by these genes join with α-globin polypeptides to form hemoglobin molecules, which transport oxygen in the blood.
Other sequences exist in many copies and are called repetitive DNA. Some eukaryotic organisms have large amounts of repetitive DNA; for example, almost half of the human genome consists of repetitive DNA. A major class of repetitive DNA is called moderately repetitive DNA, which typically consists of sequences from 150 to 300 bp in length (although they may be longer) that are repeated many thousands of times. Some of these sequences perform important functions for the cell; for example, the genes for ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) make up a part of the moderately repetitive DNA. Moderately repetitive DNA itself comprises two types of repeats. Tandem repeat sequences appear one after another and tend to be clustered at particular locations on the chromosomes. Interspersed repeat sequences are scattered throughout the genome. An example of an interspersed repeat is the Alu sequence, an approximately 300-bp sequence that is present more than a million times and comprises 11% of the human genome, although it has no obvious cellular function. Short repeats, such as the Alu sequences, are called SINEs (short interspersed elements). Longer interspersed repeats consisting of several thousand base pairs are called LINEs (long interspersed elements). One class of LINE, called LINE1, comprises about 17% of the human genome. Most interspersed repeats are the remnants of transposable elements, sequences that can multiply and move (see Chapter 18).
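The Alu figures quoted above can be checked against one another with simple arithmetic. The sketch below takes the human genome size from the table (3,200,000,000 bp) and assumes, for simplicity, that every Alu copy is a full 300 bp; real copies vary in length, so the result is only an estimate. The function names are hypothetical.

```python
HUMAN_GENOME_BP = 3_200_000_000  # approximate size from the genome-size table


def bases_covered(genome_fraction: float, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Number of base pairs occupied by a class of sequence."""
    return genome_fraction * genome_bp


def implied_copies(genome_fraction: float, element_bp: int,
                   genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Copy number implied if every copy has length `element_bp` (an assumption)."""
    return bases_covered(genome_fraction, genome_bp) / element_bp


alu_copies = implied_copies(0.11, 300)  # Alu: 11% of the genome, about 300 bp each
line1_bp = bases_covered(0.17)          # LINE1: about 17% of the genome

print(f"Alu: about {alu_copies:,.0f} copies")
print(f"LINE1: about {line1_bp / 1e6:,.0f} million bp")
```

Under these assumptions, 11% of the genome at 300 bp per copy works out to roughly 1.17 million copies, in line with the statement that the Alu sequence is present more than a million times.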
The other major class of repetitive DNA is highly repetitive DNA. These short sequences, often less than 10 bp in length, are present in hundreds of thousands to millions of copies that are repeated in tandem and clustered in certain regions of the chromosome, especially at centromeres and telomeres. Highly repetitive DNA is sometimes called satellite DNA, because its percentages of the four bases differ from those of other DNA sequences and, therefore, it separates as a satellite fraction when centrifuged at high speeds in a density gradient. Highly repetitive DNA is rarely transcribed into RNA. Although these sequences may contribute to centromere and telomere function, most highly repetitive DNA has no known function.
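What "repeated in tandem" means at the sequence level can be shown with a short sketch that finds the longest run of back-to-back copies of a motif. The function name and the toy sequence are hypothetical. The six-base motif TTAGGG is used only as a familiar example of a short tandem repeat and is not taken from the text above.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of uninterrupted, back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequence: a run of five copies, an interruption, then a run of two copies.
toy = "GGC" + "TTAGGG" * 5 + "ACGT" + "TTAGGG" * 2

print(longest_tandem_run(toy, "TTAGGG"))  # longest uninterrupted run of the motif
print(longest_tandem_run(toy, "CCCTAA"))  # motif absent from this strand
```

In highly repetitive DNA, such runs extend to hundreds of thousands or millions of copies, far beyond what this toy sequence shows.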
DNA renaturation reactions and, more recently, direct sequencing of eukaryotic genomes also tell us a lot about how genetic information is organized within chromosomes. We now know that the density of genes varies greatly among and within chromosomes. For example, human chromosome 19 has a high density of genes, with about 26 genes per million base pairs. Chromosome 13, on the other hand, has only about 6.5 genes per million base pairs. Gene density can also vary within different regions of the same chromosome: some parts of the long arm of chromosome 13 have only 3 genes per million base pairs, whereas other parts have almost 30 genes per million base pairs. And the short arm of chromosome 13 contains almost no genes, consisting entirely of heterochromatin.
The functional role of DNA sequences that do not encode proteins, including repetitive DNA, has recently been addressed by the Encyclopedia of DNA Elements (ENCODE) project (see Chapter 20). The purpose of ENCODE was to identify all nucleotides within the human genome that have some function. The project concluded that much of the genome is transcribed and at least 80% of the sequences are functional. Many of the functional sequences appear to help control gene expression.
Eukaryotic DNA comprises three major classes: unique-sequence DNA, moderately repetitive DNA, and highly repetitive DNA. Unique-sequence DNA consists of sequences that exist in one or a few copies; moderately repetitive DNA consists of sequences that may be several hundred base pairs in length and are present in thousands to hundreds of thousands of copies. Highly repetitive DNA consists of very short sequences that are repeated in tandem and are present in hundreds of thousands to millions of copies. The density of genes varies greatly among and even within chromosomes.
CONCEPT CHECK 7
Most of the genes that encode proteins are found in