The genomes of several hundred different eukaryotic species have been completely sequenced, including a number of fungi and protists, several insects, almost 30 species of plants, and numerous vertebrates. Sequenced eukaryotes include papayas, corn, rice, sorghum, grapevines, silkworms, several fruit flies, aphids, mosquitoes, anemones, mice, rats, dogs, cows, horses, orangutans, chimpanzees, and humans. Even the genomes of some extinct organisms have now been sequenced, including the genomes of the woolly mammoth and Neanderthal people. Hundreds of additional eukaryotic genomes are in the process of being sequenced. It is important to note, however, that even though the genomes of these organisms have been “completely sequenced,” many of the final assembled sequences contain gaps, and regions of heterochromatin (highly condensed DNA that contains few genes) may not have been sequenced at all. Thus, the sizes of eukaryotic genomes are often estimates, and the number of base pairs given as the genome size of a particular species may vary. Predicting the number of genes that are present in a eukaryotic genome is also difficult, and estimates may vary depending on the assumptions made and the particular gene-
GENOME SIZE AND NUMBER OF GENES The genomes of eukaryotic organisms (Table 15.2) are larger than those of prokaryotes, and in general, multicellular eukaryotes have more DNA than do simple, single-
Species | Genome size (millions of base pairs) | Number of predicted genes |
---|---|---|
Saccharomyces cerevisiae (yeast) | 12 | 6,144 |
Physcomitrella patens (moss) | 480 | 38,354 |
Arabidopsis thaliana (plant) | 125 | 25,706 |
Zea mays (corn) | 2,400 | 32,000 |
Caenorhabditis elegans (nematode) | 103 | 20,598 |
Drosophila melanogaster (fruit fly) | 170 | 13,525 |
Anopheles gambiae (mosquito) | 278 | 14,707 |
Danio rerio (zebrafish) | 1,465 | 22,409 |
Takifugu rubripes (tiger pufferfish) | 329 | 22,089 |
Xenopus tropicalis (clawed frog) | 1,510 | 18,429 |
Anolis carolinensis (anole lizard) | 1,780 | 17,792 |
Mus musculus (mouse) | 2,627 | 26,762 |
Pan troglodytes (chimpanzee) | 2,733 | 22,524 |
Homo sapiens (human) | 3,223 | 20,000 |
Source: Data from Ensembl website: http:/ |
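The values in Table 15.2 can be compared more directly by converting them to gene density, the number of predicted genes per million base pairs. The following Python sketch does that arithmetic; the names `TABLE_15_2`, `genes_per_mb`, and `rank_by_gene_density` are illustrative choices, not part of any established tool, and the results are only as reliable as the estimates in the table.

```python
# Genome size (millions of base pairs) and number of predicted genes,
# copied from Table 15.2. Both values are estimates.
TABLE_15_2 = {
    "Saccharomyces cerevisiae": (12, 6144),
    "Physcomitrella patens": (480, 38354),
    "Arabidopsis thaliana": (125, 25706),
    "Zea mays": (2400, 32000),
    "Caenorhabditis elegans": (103, 20598),
    "Drosophila melanogaster": (170, 13525),
    "Anopheles gambiae": (278, 14707),
    "Danio rerio": (1465, 22409),
    "Takifugu rubripes": (329, 22089),
    "Xenopus tropicalis": (1510, 18429),
    "Anolis carolinensis": (1780, 17792),
    "Mus musculus": (2627, 26762),
    "Pan troglodytes": (2733, 22524),
    "Homo sapiens": (3223, 20000),
}


def genes_per_mb(size_mb, gene_count):
    """Return predicted genes per million base pairs."""
    return gene_count / size_mb


def rank_by_gene_density(table):
    """Return (species, genes per Mb) pairs, densest genome first."""
    densities = [
        (species, genes_per_mb(size_mb, gene_count))
        for species, (size_mb, gene_count) in table.items()
    ]
    return sorted(densities, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for species, density in rank_by_gene_density(TABLE_15_2):
        print(f"{species:26s} {density:7.1f} genes per Mb")
```

With these figures, yeast has the densest genome in the table (512 predicted genes per million base pairs) and humans the sparsest (about 6). Corn has more predicted genes than humans in a smaller genome, which shows that neither genome size nor gene number tracks organismal complexity in a simple way.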
In general, eukaryotic genomes also contain more genes than do the genomes of prokaryotes (although some large bacteria have more genes than single-
Eukaryotic genomes contain multiple copies of many genes, indicating that gene duplication has been an important process in genome evolution. Many genes in eukaryotes are interrupted by introns. In the more complex eukaryotes, both the number and the length of the introns are greater.
NONCODING DNA Most eukaryotic organisms contain vast amounts of DNA that do not encode proteins. For example, only about 1.5% of the human genome consists of DNA that directly specifies the amino acids of proteins. The function of the remaining DNA sequences, called noncoding DNA, has long been in question. Some research has suggested that much of the genome is “junk DNA” with no function. For example, Marcelo Nóbrega and his colleagues genetically engineered mice that were missing a large chromosomal region with no protein-
Other research, however, has suggested that gene deserts may contain sequences that have a functional role. For example, genome-
In 2003, the Encyclopedia of DNA Elements (ENCODE) project was undertaken to determine whether noncoding DNA had any function. Researchers cataloged all nucleotides within the genome that provide some function, including sequences that encode proteins and RNA molecules and those that serve as control sites for gene expression. This 10-
TRANSPOSABLE ELEMENTS A substantial part of the genomes of most multicellular organisms consists of moderately and highly repetitive sequences (see Chapter 8), and the percentage of repetitive sequences is usually higher in those species with larger genomes (Table 15.3). Most of these repetitive sequences appear to have arisen through transposition. In the human genome, about 45% of the DNA is derived from transposable elements, many of which are defective and no longer able to move. In corn, 85% of the genome is derived from transposable elements.
Organism | Percentage of genome derived from transposable elements |
---|---|
Arabidopsis thaliana (plant) | 10.5 |
Zea mays (corn) | 85.0 |
Caenorhabditis elegans (nematode) | 6.5 |
Drosophila melanogaster (fruit fly) | 3.1 |
Takifugu rubripes (tiger pufferfish) | 2.7 |
Homo sapiens (human) | 44.4 |
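The percentages in Table 15.3 can be turned into rough absolute amounts of DNA by multiplying them by the genome sizes in Table 15.2. The Python sketch below does this for the six species that appear in both tables. It assumes that the two tables describe comparable genome assemblies, which the text does not state, so the products are approximations only. The names `GENOME_SIZE_MB`, `TE_PERCENT`, and `te_derived_mb` are illustrative.

```python
# Genome sizes in millions of base pairs (Table 15.2).
GENOME_SIZE_MB = {
    "Arabidopsis thaliana": 125,
    "Zea mays": 2400,
    "Caenorhabditis elegans": 103,
    "Drosophila melanogaster": 170,
    "Takifugu rubripes": 329,
    "Homo sapiens": 3223,
}

# Percentage of each genome derived from transposable elements (Table 15.3).
TE_PERCENT = {
    "Arabidopsis thaliana": 10.5,
    "Zea mays": 85.0,
    "Caenorhabditis elegans": 6.5,
    "Drosophila melanogaster": 3.1,
    "Takifugu rubripes": 2.7,
    "Homo sapiens": 44.4,
}


def te_derived_mb(species):
    """Estimate the millions of base pairs derived from transposable elements.

    Assumption: the size and percentage refer to comparable assemblies.
    """
    return GENOME_SIZE_MB[species] * TE_PERCENT[species] / 100


def remaining_mb(species):
    """Estimate the millions of base pairs not derived from transposable elements."""
    return GENOME_SIZE_MB[species] - te_derived_mb(species)


if __name__ == "__main__":
    for species in GENOME_SIZE_MB:
        print(
            f"{species:26s} "
            f"TE-derived: {te_derived_mb(species):7.1f} Mb, "
            f"other: {remaining_mb(species):7.1f} Mb"
        )
```

By this estimate, roughly 2,040 million base pairs of the corn genome and roughly 1,430 million base pairs of the human genome are derived from transposable elements. The trend with genome size is not strict: the tiger pufferfish has a larger genome than the nematode but a smaller percentage of transposable-element sequence.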
PROTEIN DIVERSITY In spite of only a modest increase in gene number, vertebrates have considerably more protein diversity than do invertebrates. One way to measure protein diversity is by counting the number of protein domains, which are characteristic parts of proteins that are often associated with a function. Vertebrate genomes do not encode many more protein domains than do invertebrate genomes; for example, there are 1,262 domains in humans, compared with 1,035 in fruit flies. However, the existing domains in humans are assembled into more combinations, leading to many more types of proteins.
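A short calculation shows why combining domains can generate far more protein types than adding new domains. The Python sketch below counts the ordered arrangements that are mathematically possible when a protein is built from a fixed number of domains, allowing a domain to be used more than once. This is a purely combinatorial upper bound under that simplifying assumption, not a count of proteins that actually exist, and the function names are hypothetical.

```python
# Domain counts quoted in the text.
HUMAN_DOMAINS = 1262
FRUIT_FLY_DOMAINS = 1035


def possible_architectures(domain_count, domains_per_protein):
    """Count ordered arrangements of domains, with reuse allowed.

    Assumption: any domain may occupy any position. Real proteins use
    only a small fraction of these arrangements.
    """
    return domain_count ** domains_per_protein


def domain_ratio(count_a, count_b):
    """Return how many times larger the first domain set is than the second."""
    return count_a / count_b


if __name__ == "__main__":
    ratio = domain_ratio(HUMAN_DOMAINS, FRUIT_FLY_DOMAINS)
    print(f"Human-to-fruit-fly domain ratio: {ratio:.2f}")
    for k in (1, 2, 3):
        human = possible_architectures(HUMAN_DOMAINS, k)
        fly = possible_architectures(FRUIT_FLY_DOMAINS, k)
        print(f"{k} domain(s) per protein: human {human:,}, fruit fly {fly:,}")
```

Humans have only about 1.2 times as many domains as fruit flies, yet even two-domain proteins allow more than a million arrangements in either species. The number of domains therefore places almost no limit on protein diversity; what matters is how many of the possible combinations a genome actually uses.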
Genome size varies greatly among eukaryotic species. For multicellular eukaryotic organisms, there is no clear relation between organismal complexity and amount of DNA or gene number. A substantial part of the genome in eukaryotic organisms consists of repetitive DNA, much of which is derived from transposable elements.