Genome sequences of model organisms provide important information

Most of our information about eukaryotic genomes has come from model organisms that have been studied extensively: the yeast Saccharomyces cerevisiae, the nematode (roundworm) Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the thale cress plant (Arabidopsis thaliana). Model organisms have been chosen because they are relatively easy to grow and study in a laboratory, their genetics are well studied, and they exhibit characteristics that represent a larger group of organisms.

YEAST: THE BASIC EUKARYOTIC MODEL Yeasts are single-celled eukaryotes. Like most eukaryotes, they have membrane-enclosed organelles, such as the nucleus and endoplasmic reticulum, and a life cycle that alternates between haploid and diploid generations (see Figure 11.14). So it is not surprising that the single-celled yeast has a larger genome, with more protein-coding genes, than a single-celled bacterium (see Table 17.2). Gene inactivation studies similar to those carried out for M. genitalium (see Figure 17.6) indicate that fewer than 20 percent of the yeast’s genes are essential to survival. The most striking difference between the yeast genome and that of E. coli is the number of genes for targeting proteins to organelles (Table 17.3). Both of these single-celled organisms appear to use about the same number of genes to perform the basic functions of cell survival. It is the compartmentalization of the eukaryotic yeast cell into organelles that requires it to have many more genes. This finding is direct, quantitative confirmation of something we have known for a century: the eukaryotic cell is structurally more complex than the prokaryotic cell.

Table 17.3 Comparison of the Genomes of E. coli and S. cerevisiae

                                           E. coli       Yeast
Genome length (base pairs)               4,640,000  12,157,000
Number of protein-coding genes               4,288       6,275
Proteins with roles in:
   Metabolism                                  650         650
   Energy production/storage                   240         175
   Membrane transport                          280         250
   DNA replication/repair/recombination        115         175
   Transcription                                55         400
   Translation                                 182         350
   Protein targeting/secretion                  35         430
   Cell structure                              180         250
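
The comparison in Table 17.3 can be made explicit by dividing each yeast count by the corresponding E. coli count. The short Python sketch below does only that. The counts are copied from the table; the names GENES_BY_ROLE and fold_changes are illustrative and not part of any library.

```python
# Gene counts by functional role, copied from Table 17.3.
# Each value is (E. coli count, yeast count).
GENES_BY_ROLE = {
    "Metabolism": (650, 650),
    "Energy production/storage": (240, 175),
    "Membrane transport": (280, 250),
    "DNA replication/repair/recombination": (115, 175),
    "Transcription": (55, 400),
    "Translation": (182, 350),
    "Protein targeting/secretion": (35, 430),
    "Cell structure": (180, 250),
}


def fold_changes(table):
    """Return {role: yeast count / E. coli count} for every role."""
    return {role: yeast / ecoli for role, (ecoli, yeast) in table.items()}


if __name__ == "__main__":
    ratios = fold_changes(GENES_BY_ROLE)
    # List roles from the largest yeast/E. coli ratio to the smallest.
    for role, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{role:40s} {ratio:5.1f}x")
```

With these figures, protein targeting/secretion has the largest ratio (about 12-fold), while the ratio for metabolism is exactly 1.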

THE NEMATODE: UNDERSTANDING EUKARYOTIC DEVELOPMENT A simple organism in which to study multicellularity is Caenorhabditis elegans, a 1-mm-long nematode (roundworm) that normally lives in the soil. It can also live in the laboratory, where it has become a favorite model organism of developmental biologists (see Key Concept 19.2). The nematode has a transparent body that develops over 3 days from a fertilized egg to an adult worm made up of nearly 1,000 cells. In spite of its small number of cells, the nematode has a nervous system, digests food, reproduces sexually, and ages. So it is not surprising that an intense effort was made to sequence the genome of this model organism.

The C. elegans genome (100 million bp) is about 8 times larger than that of the yeast Saccharomyces cerevisiae and has 3.3 times as many protein-coding genes (see Table 17.2). Gene inactivation studies have shown that the worm can survive in laboratory cultures with only 10 percent of these genes. So the minimal genome of the worm (roughly 2,000 genes) is about twice the size of that of the yeast, which in turn is about four times the size of the minimal genome for Mycoplasma. What do these extra genes do? All cells must have genes for survival, growth, and division. In addition, the cells of multicellular organisms must have genes for holding cells together to form tissues, for cell differentiation, and for intercellular communication. Looking at Table 17.4, you will recognize functions that we discussed in earlier chapters, including gene regulation (see Chapter 16) and cell communication (see Chapter 7).
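
The arithmetic behind the worm's genome size and minimal gene set can be checked with a few lines of Python. This is a rough sketch, assuming the yeast figures from Table 17.3 and the multipliers quoted in the text; the worm gene count listed in Table 17.2 may differ slightly from the product computed here. The constant and function names are illustrative.

```python
YEAST_GENOME_BP = 12_157_000     # Table 17.3
YEAST_GENES = 6_275              # Table 17.3
WORM_GENOME_BP = 100_000_000     # quoted in the text
WORM_GENE_MULTIPLIER = 3.3       # "3.3 times as many protein-coding genes"
WORM_ESSENTIAL_FRACTION = 0.10   # "only 10 percent of these genes"


def worm_estimates():
    """Return (genome size ratio, worm gene count, worm minimal gene set)."""
    genome_ratio = WORM_GENOME_BP / YEAST_GENOME_BP
    worm_genes = WORM_GENE_MULTIPLIER * YEAST_GENES
    worm_minimal = WORM_ESSENTIAL_FRACTION * worm_genes
    return genome_ratio, worm_genes, worm_minimal


if __name__ == "__main__":
    genome_ratio, worm_genes, worm_minimal = worm_estimates()
    print(f"Worm/yeast genome size ratio: {genome_ratio:.1f}")
    print(f"Estimated worm protein-coding genes: {worm_genes:,.0f}")
    print(f"Estimated worm minimal gene set: {worm_minimal:,.0f}")
```

With these inputs the worm has roughly 20,700 protein-coding genes, of which about 2,070 would be needed for survival in laboratory culture.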

Table 17.4 C. elegans Genes Essential to Multicellularity

Function                    Protein/domain                                                    Number of genes
Transcription control       Zinc finger; homeobox                                                         540
RNA processing              RNA binding domains                                                           100
Nerve impulse transmission  Gated ion channels                                                             80
Tissue formation            Collagens                                                                     170
Cell interactions           Extracellular domains; glycotransferases                                      330
Cell–cell signaling         G protein–linked receptors; protein kinases; protein phosphatases           1,290
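
To see how the genes in Table 17.4 are distributed across functions, the counts can be totaled and converted to shares. The Python sketch below assumes that the rows do not overlap (that is, no gene is counted under two functions), which the table does not state. The names MULTICELLULARITY_GENES and function_shares are illustrative.

```python
# Gene counts copied from Table 17.4.
MULTICELLULARITY_GENES = {
    "Transcription control": 540,
    "RNA processing": 100,
    "Nerve impulse transmission": 80,
    "Tissue formation": 170,
    "Cell interactions": 330,
    "Cell-cell signaling": 1290,
}


def function_shares(counts):
    """Return (total genes, {function: fraction of the total})."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = function_shares(MULTICELLULARITY_GENES)
    print(f"Total genes listed: {total:,}")
    for name, share in shares.items():
        print(f"{name:28s} {share:6.1%}")
```

Under that assumption the table lists 2,510 genes, and cell-cell signaling accounts for about half of them.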

DROSOPHILA MELANOGASTER: RELATING GENETICS TO GENOMICS The fruit fly Drosophila melanogaster is a famous model organism. Studies of fruit flies resulted in the formulation of many basic principles of genetics (see Key Concept 12.4). More than 2,500 mutations of D. melanogaster have been described, and this fact alone was a good reason for sequencing the fruit fly’s DNA. The fruit fly is a much larger organism than C. elegans, both in size (it has ten times more cells) and complexity, and it undergoes complicated developmental transformations from egg to larva to pupa to adult. Figure 17.7 summarizes the functions of the Drosophila genes that have been characterized so far; this distribution is typical of complex eukaryotes.

Figure 17.7 Functions of the Eukaryotic Genome The distribution of gene functions in Drosophila melanogaster shows a pattern that is typical of many complex organisms.

ARABIDOPSIS: STUDYING THE GENOMES OF PLANTS About 250,000 species of flowering plants dominate the land and fresh water. But in the context of the history of life, the flowering plants are fairly young, having evolved only about 200 million years ago. The genomes of some plants are huge; for example, the genome of corn is about 3 billion bp, and that of wheat is 17 billion bp. So although we are naturally most interested in the genomes of plants we use as food and fiber, it is not surprising that scientists first chose to sequence a simpler flowering plant.

Arabidopsis thaliana, thale cress, is a member of the mustard family and has long been a favorite model organism of plant biologists. It is small (hundreds could grow and reproduce in the space occupied by this page), it is easy to manipulate, and it has a relatively small genome of 125 million bp. The Arabidopsis genome has more than 27,000 protein-coding genes (see Table 17.2), but remarkably, many of these genes are duplicates and probably originated by chromosomal rearrangements. When these duplicate genes are subtracted from the total, about 15,000 unique genes are left—similar to the number of genes found in fruit flies. Indeed, many of the genes found in fruit flies have homologs (related genes) in Arabidopsis and other plants, suggesting that plants and animals have a common ancestor.

Arabidopsis does, of course, have some genes that distinguish it as a plant (Table 17.5). From what you know about plants, you can guess what these are: genes involved in photosynthesis, in the transport of water into the root and throughout the plant, in the assembly of the cell wall, in the uptake and metabolism of inorganic substances from the environment, and in the synthesis of specific molecules used for defense against microbes and herbivores (organisms that eat plants). The plant-specific genes in Arabidopsis are also found in the genomes of other plants, including rice (Oryza sativa), the first major crop plant whose sequence was determined. Rice is the world’s most important crop; it is a staple in the diet of 3 billion people. Despite its larger genome, rice has a set of genes remarkably similar to those of Arabidopsis. More recently the genome of the poplar tree Populus trichocarpa was sequenced. This rapidly growing tree is widely used for manufacturing paper and is a potential source of fixed carbon for making fuel. A comparison of four plant genomes shows many genes in common, comprising the basic minimal plant genome (Figure 17.8).

Figure 17.8 Plant Genomes Four plant genomes share a common set of approximately 12,000 genes that appear to comprise the minimal plant genome.

Table 17.5 Arabidopsis Genes Unique to Plants

Function                  Number of genes
Cell wall and growth                   42
Water channels                        300
Photosynthesis                        139
Defense and metabolism                 94