Eukaryotic Genomes

The genomes of several hundred different eukaryotic species have been completely sequenced, including a number of fungi and protists, several insects, almost 30 species of plants, and numerous vertebrates. Sequenced eukaryotes include papayas, corn, rice, sorghum, grapevines, silkworms, several fruit flies, aphids, mosquitoes, anemones, mice, rats, dogs, cows, horses, orangutans, chimpanzees, and humans. Even the genomes of some extinct organisms have now been sequenced, including the genomes of the woolly mammoth and Neanderthal people. Hundreds of additional eukaryotic genomes are in the process of being sequenced. It is important to note, however, that even though the genomes of these organisms have been “completely sequenced,” many of the final assembled sequences contain gaps, and regions of heterochromatin (highly condensed DNA that contains few genes) may not have been sequenced at all. Thus, the sizes of eukaryotic genomes are often estimates, and the number of base pairs given as the genome size of a particular species may vary. Predicting the number of genes that are present in a eukaryotic genome is also difficult, and estimates may vary depending on the assumptions made and the particular gene-finding software used.

GENOME SIZE AND NUMBER OF GENES The genomes of eukaryotic organisms (Table 15.2) are larger than those of prokaryotes, and in general, multicellular eukaryotes have more DNA than do simple, single-celled eukaryotes such as yeast. However, there is no close relation between genome size and complexity among the multicellular eukaryotes. For example, the mosquito (Anopheles gambiae) and fruit fly (Drosophila melanogaster) are both insects with similar structural complexity, yet the mosquito has 60% more DNA than the fruit fly.

TABLE 15.2 Characteristics of representative eukaryotic genomes that have been completely sequenced
Species                               Genome size (millions of base pairs)  Number of predicted genes
Saccharomyces cerevisiae (yeast)      12                                    6,144
Physcomitrella patens (moss)          480                                   38,354
Arabidopsis thaliana (plant)          125                                   25,706
Zea mays (corn)                       2,400                                 32,000
Caenorhabditis elegans (nematode)     103                                   20,598
Drosophila melanogaster (fruit fly)   170                                   13,525
Anopheles gambiae (mosquito)          278                                   14,707
Danio rerio (zebrafish)               1,465                                 22,409
Takifugu rubripes (tiger pufferfish)  329                                   22,089
Xenopus tropicalis (clawed frog)      1,510                                 18,429
Anolis carolinensis (anole lizard)    1,780                                 17,792
Mus musculus (mouse)                  2,627                                 26,762
Pan troglodytes (chimpanzee)          2,733                                 22,524
Homo sapiens (human)                  3,223                                 20,000

Source: Data from Ensembl website: http://useast.ensembl.org/index.html and plants.ensembl.org/index.html.

In general, eukaryotic genomes also contain more genes than do the genomes of prokaryotes (although some large bacteria have more genes than single-celled yeast), and the genomes of multicellular eukaryotes have more genes than do the genomes of single-celled eukaryotes. In contrast to prokaryotes, eukaryotes show no close correlation between genome size and number of genes. Nor is the number of genes among multicellular eukaryotes obviously related to phenotypic complexity: according to Table 15.2, humans have only about one and a half times as many genes as fruit flies, slightly fewer than the nematode C. elegans, and fewer than the plant A. thaliana. C. elegans has more genes than D. melanogaster, but is less complex. The pufferfish has only about one-tenth the amount of DNA present in humans and mice, but about as many genes.
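The comparisons above can be checked directly against Table 15.2. The following is a minimal Python sketch, not part of the original analysis: the function names `size_ratio` and `genes_per_mb` are illustrative, and "genes per million base pairs" is only a crude measure of gene density, because both genome sizes and gene counts are estimates.

```python
# Genome size (millions of base pairs) and predicted gene number, copied from Table 15.2.
GENOMES = {
    "Saccharomyces cerevisiae": (12, 6144),
    "Physcomitrella patens": (480, 38354),
    "Arabidopsis thaliana": (125, 25706),
    "Zea mays": (2400, 32000),
    "Caenorhabditis elegans": (103, 20598),
    "Drosophila melanogaster": (170, 13525),
    "Anopheles gambiae": (278, 14707),
    "Danio rerio": (1465, 22409),
    "Takifugu rubripes": (329, 22089),
    "Xenopus tropicalis": (1510, 18429),
    "Anolis carolinensis": (1780, 17792),
    "Mus musculus": (2627, 26762),
    "Pan troglodytes": (2733, 22524),
    "Homo sapiens": (3223, 20000),
}


def size_ratio(species_a, species_b):
    """Genome size of species_a divided by genome size of species_b."""
    return GENOMES[species_a][0] / GENOMES[species_b][0]


def genes_per_mb(species):
    """Predicted genes per million base pairs (a crude gene-density measure)."""
    size_mb, genes = GENOMES[species]
    return genes / size_mb


# Mosquito versus fruit fly: similar structural complexity, different genome sizes.
extra = size_ratio("Anopheles gambiae", "Drosophila melanogaster") - 1
print(f"Mosquito has {extra:.0%} more DNA than the fruit fly")

# Gene density, listed from the most gene-dense genome to the least.
for species in sorted(GENOMES, key=genes_per_mb, reverse=True):
    print(f"{species:26s} {genes_per_mb(species):8.1f} genes per Mb")
```

Sorting by gene density rather than by genome size makes the lack of a simple size-to-gene-number relation easy to see: the largest genomes in the table fall at the bottom of the list.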

Eukaryotic genomes contain multiple copies of many genes, indicating that gene duplication has been an important process in genome evolution. Many genes in eukaryotes are interrupted by introns. In the more complex eukaryotes, both the number and the length of the introns are greater.

NONCODING DNA Most eukaryotic organisms contain vast amounts of DNA that do not encode proteins. For example, only about 1.5% of the human genome consists of DNA that directly specifies the amino acids of proteins. The function of the remaining DNA sequences, called noncoding DNA, has long been in question. Some research has suggested that much of the genome is “junk DNA” with no function. For example, Marcelo Nóbrega and his colleagues genetically engineered mice that were missing a large chromosomal region with no protein-encoding genes (called a gene desert). In one experiment, they created mice that were missing a 1,500,000-base-pair gene desert from mouse chromosome 3; in another, they created mice missing an 845,000-base-pair gene desert from chromosome 19. Remarkably, these mice appeared healthy and were indistinguishable from control mice. The researchers concluded that large regions of the mammalian genome can be deleted without major phenotypic effects and may, in fact, be superfluous.

Other research, however, has suggested that gene deserts may contain sequences that have a functional role. For example, genome-wide association studies demonstrated that DNA sequences contained within a gene desert on human chromosome 9 are associated with coronary artery disease, and subsequent studies have demonstrated the presence of 33 enhancers in this gene desert.

In 2003, the Encyclopedia of DNA Elements (ENCODE) project was undertaken to determine whether noncoding DNA had any function. Researchers set out to catalog all nucleotides within the genome that provide some function, including sequences that encode proteins and RNA molecules and those that serve as control sites for gene expression. This roughly decade-long project was carried out by a team of over 400 scientists from around the world. In a series of papers published in 2012, the ENCODE team concluded that at least 80% of the human genome is involved in some type of gene function. Many of the functional sequences consisted of sites where proteins bind and influence the expression of genes. The ENCODE study suggests that there is little nonfunctional DNA in the human genome, but other researchers have questioned this conclusion. TRY PROBLEM 21
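To get a feel for the proportions quoted in this section, the percentages can be converted into approximate amounts of DNA. The sketch below is illustrative only: it assumes the genome sizes listed in Table 15.2 for humans and mice, and the helper name `share_in_mb` is hypothetical.

```python
HUMAN_GENOME_MB = 3223  # Homo sapiens, millions of base pairs (Table 15.2)
MOUSE_GENOME_MB = 2627  # Mus musculus, millions of base pairs (Table 15.2)


def share_in_mb(genome_mb, fraction):
    """Convert a fraction of a genome into millions of base pairs."""
    return genome_mb * fraction


# About 1.5% of the human genome directly specifies amino acids.
coding_mb = share_in_mb(HUMAN_GENOME_MB, 0.015)

# ENCODE's estimate: at least 80% is involved in some type of gene function.
functional_mb = share_in_mb(HUMAN_GENOME_MB, 0.80)

# Gene deserts deleted in two separate mouse experiments (base pairs).
DESERT_DELETIONS_BP = {"chromosome 3": 1_500_000, "chromosome 19": 845_000}
deleted_fraction = {
    region: bp / (MOUSE_GENOME_MB * 1_000_000)
    for region, bp in DESERT_DELETIONS_BP.items()
}

print(f"Protein-coding DNA: about {coding_mb:.0f} million base pairs")
print(f"ENCODE functional estimate: at least {functional_mb:.0f} million base pairs")
for region, fraction in deleted_fraction.items():
    print(f"Gene desert on {region}: {fraction:.3%} of the mouse genome")
```

Each deleted gene desert, although more than 800,000 base pairs long, amounts to well under one-tenth of 1% of the mouse genome, which is worth keeping in mind when generalizing from those experiments to noncoding DNA as a whole.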

TRANSPOSABLE ELEMENTS A substantial part of the genomes of most multicellular organisms consists of moderately and highly repetitive sequences (see Chapter 8), and the percentage of repetitive sequences is usually higher in those species with larger genomes (Table 15.3). Most of these repetitive sequences appear to have arisen through transposition. In the human genome, about 45% of the DNA is derived from transposable elements, many of which are defective and no longer able to move. In corn, 85% of the genome is derived from transposable elements.

TABLE 15.3 Percentage of genome consisting of interspersed repeats derived from transposable elements
Organism                              Percentage of genome
Arabidopsis thaliana (plant)          10.5
Zea mays (corn)                       85.0
Caenorhabditis elegans (nematode)     6.5
Drosophila melanogaster (fruit fly)   3.1
Takifugu rubripes (tiger pufferfish)  2.7
Homo sapiens (human)                  44.4
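
Combining Table 15.3 with the genome sizes in Table 15.2 gives a rough estimate of how much DNA in each genome is derived from transposable elements. This is a sketch under the assumption that the two tables refer to comparable genome assemblies; the function names are illustrative, and the results should be read as approximations.

```python
# Genome size in millions of base pairs (Table 15.2).
GENOME_SIZE_MB = {
    "Arabidopsis thaliana": 125,
    "Zea mays": 2400,
    "Caenorhabditis elegans": 103,
    "Drosophila melanogaster": 170,
    "Takifugu rubripes": 329,
    "Homo sapiens": 3223,
}

# Percentage of genome derived from transposable elements (Table 15.3).
TE_PERCENT = {
    "Arabidopsis thaliana": 10.5,
    "Zea mays": 85.0,
    "Caenorhabditis elegans": 6.5,
    "Drosophila melanogaster": 3.1,
    "Takifugu rubripes": 2.7,
    "Homo sapiens": 44.4,
}


def te_derived_mb(species):
    """Approximate amount of transposable-element-derived DNA, in millions of base pairs."""
    return GENOME_SIZE_MB[species] * TE_PERCENT[species] / 100


def remainder_mb(species):
    """Approximate amount of DNA not derived from transposable elements."""
    return GENOME_SIZE_MB[species] - te_derived_mb(species)


# List species from the smallest genome to the largest.
for species in sorted(GENOME_SIZE_MB, key=GENOME_SIZE_MB.get):
    print(
        f"{species:24s} TE-derived: {te_derived_mb(species):8.1f} Mb"
        f"   remainder: {remainder_mb(species):8.1f} Mb"
    )
```

Listing the species by genome size shows how much of the difference between large and small genomes is accounted for by sequences derived from transposable elements.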

PROTEIN DIVERSITY In spite of only a modest increase in gene number, vertebrates have considerably more protein diversity than do invertebrates. One way to measure protein diversity is by counting the number of protein domains, which are characteristic parts of proteins that are often associated with a function. Vertebrate genomes do not encode many more protein domains than do invertebrate genomes; for example, there are 1,262 domains in humans, compared with 1,035 in fruit flies. However, the existing domains in humans are assembled into more combinations, leading to many more types of proteins.
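
The distinction between the number of domains and the number of domain combinations can be made concrete with a toy example. In the sketch below, each protein is represented as an ordered tuple of domain labels. The labels A, B, and C and both "proteomes" are hypothetical and chosen only for illustration; they are not data from any real organism.

```python
def domain_stats(proteome):
    """Return (number of distinct domains, number of distinct domain combinations)."""
    domains = {domain for protein in proteome for domain in protein}
    combinations = {tuple(protein) for protein in proteome}
    return len(domains), len(combinations)


# Hypothetical proteome in which domains are mostly used one at a time.
toy_simple = [("A",), ("B",), ("C",), ("A", "B")]

# Hypothetical proteome built from the same three domains,
# but assembled into many more ordered combinations.
toy_combinatorial = [
    ("A",), ("B",), ("C",),
    ("A", "B"), ("B", "A"), ("A", "C"),
    ("A", "B", "C"), ("C", "C", "B"),
]

for name, proteome in [("simple", toy_simple), ("combinatorial", toy_combinatorial)]:
    n_domains, n_combinations = domain_stats(proteome)
    print(f"{name}: {n_domains} domains, {n_combinations} domain combinations")
```

Both toy proteomes use the same three domains, yet the second contains twice as many distinct proteins, which mirrors the pattern described above: similar domain counts, but more combinations and therefore more types of proteins.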

CONCEPTS

Genome size varies greatly among eukaryotic species. For multicellular eukaryotic organisms, there is no clear relation between organismal complexity and amount of DNA or gene number. A substantial part of the genome in eukaryotic organisms consists of repetitive DNA, much of which is derived from transposable elements.