Types of DNA Sequences in Eukaryotes

Eukaryotic DNA consists of at least three sequence types: unique-sequence DNA, moderately repetitive DNA, and highly repetitive DNA. Unique-sequence DNA consists of sequences that are present only once or, at most, a few times in the genome. This DNA includes sequences that encode proteins as well as a great deal of DNA whose function is unknown. Genes that are present in a single copy constitute roughly 25% to 50% of the protein-encoding genes in most multicellular eukaryotes. Other genes within unique-sequence DNA are present in several similar, but not identical, copies and together are referred to as a gene family. Most gene families arose through duplication of an existing gene and include just a few member genes, but some, such as those that encode immunoglobulin proteins in vertebrates, contain hundreds of members. The genes that encode β-like globins are another example of a gene family. In humans, there are seven β-like globin genes, clustered together on chromosome 11. The polypeptides encoded by these genes join with α-globin polypeptides to form hemoglobin molecules, which transport oxygen in the blood.

Other sequences, called repetitive DNA, exist in many copies. Some eukaryotic organisms have large amounts of repetitive DNA; for example, almost half the human genome consists of repetitive DNA. A major class of repetitive DNA is moderately repetitive DNA, which typically consists of sequences from 150 to 300 bp in length (although they may be longer) that are repeated many thousands of times. Some of these sequences perform important functions for the cell; for example, the genes for ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) make up a part of the moderately repetitive DNA. However, the function of much moderately repetitive DNA is unknown, and indeed, it may have no function.

Moderately repetitive DNA itself encompasses two types of repeats. Tandem repeat sequences appear one after another and tend to be clustered at particular locations on the chromosomes. Interspersed repeat sequences are scattered throughout the genome. An example of an interspersed repeat is the Alu sequence, an approximately 300-bp sequence that is present more than a million times and constitutes 11% of the human genome, although it has no obvious cellular function. Short interspersed repeats, such as the Alu sequences, are called SINEs (short interspersed elements). Longer interspersed repeats consisting of several thousand base pairs are called LINEs (long interspersed elements). One class of LINE, called LINE1, constitutes about 17% of the human genome. Most interspersed repeats are the remnants of transposable elements, sequences that can multiply and move (see Chapter 13).
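
The Alu figures quoted above can be checked with simple arithmetic: the share of a genome occupied by a repeat family is roughly its copy number times its length, divided by the genome size. The sketch below is illustrative only. The haploid human genome size of about 3.2 billion bp and the copy number of 1.2 million are assumptions that do not come from the text, which says only "more than a million"; the function name is likewise hypothetical.

```python
def genome_fraction(copies: int, repeat_bp: int, genome_bp: int) -> float:
    """Fraction of a genome occupied by a repeat family.

    Assumes every copy is full length, which overestimates the share
    if many copies are truncated.
    """
    return copies * repeat_bp / genome_bp


# Assumption: approximate haploid human genome size (not stated in the text).
HUMAN_GENOME_BP = 3_200_000_000
# From the text: an Alu sequence is approximately 300 bp long.
ALU_BP = 300
# Assumption: one possible value for "more than a million" copies.
ALU_COPIES = 1_200_000

if __name__ == "__main__":
    share = genome_fraction(ALU_COPIES, ALU_BP, HUMAN_GENOME_BP)
    print(f"Estimated Alu share of the genome: {share:.1%}")
```

Under these assumptions the estimate comes out at roughly 11%, in line with the figure given for Alu sequences.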

The other major class of repetitive DNA is highly repetitive DNA. These short sequences, often less than 10 bp in length, are present in hundreds of thousands to millions of copies that are repeated in tandem and clustered in certain regions of the chromosome, especially at centromeres and telomeres. Highly repetitive DNA is rarely transcribed into RNA, and most highly repetitive DNA has no known function.
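
To make "repeated in tandem" concrete, the following minimal sketch locates runs in which a short motif occurs back to back in a DNA string. The motif `ACG`, the toy sequence, and the function name `tandem_runs` are all made up for illustration; they are not real genomic data, and this is not how repeats are identified in practice on whole genomes.

```python
import re


def tandem_runs(sequence: str, motif: str, min_copies: int = 2):
    """Return (start, copies) for each run of `motif` repeated back to back.

    A run counts only if it contains at least `min_copies` adjacent copies,
    so isolated single occurrences of the motif are ignored.
    """
    # Builds a pattern such as (?:ACG){2,} for motif "ACG" and min_copies 2.
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(motif))
        for match in pattern.finditer(sequence)
    ]


if __name__ == "__main__":
    # Toy sequence: two tandem clusters of ACG plus one isolated copy.
    toy = "TT" + "ACG" * 5 + "GGTACGTT" + "ACG" * 3 + "A"
    for start, copies in tandem_runs(toy, "ACG"):
        print(f"tandem run at position {start}: {copies} copies")
```

In this toy sequence the isolated copy of the motif is skipped and only the two clusters are reported, which mirrors the distinction between sequences repeated in tandem and single scattered occurrences.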