14.5 The Comparative Genomics of Humans with Other Species

Fundamentally, much of the science of genomics entails a comparative approach. For instance, most of what we know about the function of human proteins is based on the function of those proteins as analyzed in model species. And many of the questions that may be addressed through genomics are comparative. For example, we often want to know, as in the case of Nicholas Volker, how an individual with a trait or disease differs genetically from those without it.

Comparative genomics also has the potential to reveal how species diverge. Species evolve and traits change through changes in DNA sequence. The genome thus contains a record of the evolutionary history of a species. Comparisons among species’ genomes can reveal events unique to particular lineages that may contribute to differences in physiology, behavior, or anatomy. Such events could include, for example, the gain and loss of individual genes or groups of genes. Here, we will explore the key principles underlying comparative genomics and look at a few examples of how comparisons reveal what is similar and different among humans and other species. In the next section we will examine how differences are identified among individual humans.

Phylogenetic inference

The first step in comparing species’ genomes is to decide which species to compare. In order for comparisons to be informative, it is crucial to understand the evolutionary relationships among the species to be compared. The evolutionary history of a group is called a phylogeny. Phylogenies are useful because they allow us to infer how species’ genomes have changed over time.

The second step in comparing genomes is the identification of the most closely related genes, called homologs. Genes that are homologs can be recognized by similarities in their DNA sequences and in the amino acid sequences of the proteins they encode. It is important to distinguish here two classes of homologous genes. Some homologs are genes at the same genetic locus in different species. These genes would have been inherited from a common ancestor and are referred to as orthologs. However, many homologous genes belong to families that have expanded (and contracted) in number in the course of evolution. These homologous genes are at different genetic loci in the same organism. They arose when genes within a genome were duplicated. Genes that are related by gene-duplication events in a genome are called paralogs. The history of gene families can be quite revealing about the evolutionary history of a group.
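
The two definitions above can be restated as a small classification rule. The Python sketch below is illustrative only: the `Gene` record, the `locus` labels, and the gene names are hypothetical, and in practice homologs are classified from sequence similarity and gene histories rather than from hand-assigned labels.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str     # hypothetical gene identifier
    species: str  # species in which the gene is found
    locus: str    # hypothetical label for a genetic locus


def classify_homologs(a: Gene, b: Gene) -> str:
    """Classify two genes that are already known to be homologs."""
    # Same locus, different species: inherited from a common ancestor.
    if a.species != b.species and a.locus == b.locus:
        return "orthologs"
    # Different loci, same species: related by a duplication in that genome.
    if a.species == b.species and a.locus != b.locus:
        return "paralogs"
    # Anything else is not covered by the two simple definitions used here.
    return "outside the two simple definitions"


human_x1 = Gene("X1", "human", "locus-1")
mouse_x1 = Gene("X1", "mouse", "locus-1")
human_x2 = Gene("X2", "human", "locus-2")

print(classify_homologs(human_x1, mouse_x1))  # orthologs
print(classify_homologs(human_x1, human_x2))  # paralogs
```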

For example, suppose we would like to know how the mammalian genome has evolved over the history of the group. We would like to know whether mammals as a group might have acquired some unique genes, whether mammals with different lifestyles might possess different sets of genes, and what the fate was of genes that existed in mammalian ancestors.

Fortunately, we now have a large and expanding set of mammal genome sequences to compare that includes representatives of the three main branches of mammals—monotremes (for example, platypus), marsupials (for example, wallaby, opossum), and eutherian mammals (for example, human, dog, cat, mouse). The relationships between these groups, some members within these groups, and other amniote vertebrates (amniotes are mostly land-dwelling vertebrates that have a terrestrially adapted egg) are shown in Figure 14-15.

Figure 14-15: Phylogeny of living mammals and other amniotes
Figure 14-15: The phylogenetic tree depicts the evolutionary relationships among the three major groups of mammals (monotremes, marsupials, and eutherians) and other amniotes, including birds and various reptiles. By mapping the presence or absence of genes in particular groups onto known phylogenies, one can infer the direction of evolutionary change (gain or loss) in particular lineages.

To illustrate the importance of understanding phylogenies and how to use them, we consider the platypus genome. Monotremes differ from other mammals in that they lay eggs. Inspection of the platypus genome revealed that it contains one functional egg-yolk gene, called vitellogenin. Analyses of marsupial and eutherian genomes revealed no such functional yolk genes. The presence of vitellogenin in the platypus and its absence from other mammals could be explained in two ways: (1) vitellogenin is a novel invention of the platypus, or (2) vitellogenin existed in a common ancestor of monotremes, marsupials, and eutherians but was subsequently lost from marsupials and eutherians. The direction of evolutionary change is opposite in these two alternatives.

A simple pair-wise comparison between the platypus and another mammal does not distinguish between these alternatives. To do that, first we have to infer whether vitellogenin was likely to be present in the last common ancestor of the platypus, marsupials, and eutherians. We make this phylogenetic inference by examining whether vitellogenin is found in taxa outside of this entire group of mammals, what is referred to as an evolutionary outgroup. Indeed, three homologous vitellogenin genes exist in the chicken. Next, we consider the relationship of the chicken to mammals. Chickens belong to another major branch of the amniotes. Looking at the evolutionary tree in Figure 14-15, we can explain the presence of vitellogenins in chickens and the platypus as the result of two independent acquisitions (in the platypus lineage and the chicken lineage, respectively) or as the result of just one acquisition in a common ancestor of the platypus and chicken (which, based on the tree, would be a common ancestor of all amniotes) followed by the loss of vitellogenin genes in marsupials and eutherians.

How do we decide between these alternatives? When studying infrequent events such as the invention of a gene, evolutionary biologists prefer to rely on the principle of parsimony, that is, to favor the simplest explanation involving the smallest number of evolutionary changes. Therefore, the preferred explanation for the pattern of vitellogenin evolution in mammals is that this egg-yolk protein and corresponding gene were present in some egg-laying amniote ancestor and were retained in the egg-laying platypus and lost from non-egg-laying mammals.
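
The parsimony argument can be made concrete by counting changes on a tree. The Python sketch below is a minimal illustration, not a phylogenetics tool: the four-species tree is a simplified stand-in for Figure 14-15 (chicken as the outgroup, with opossum and human chosen here to represent marsupials and eutherians), every gain or loss is assumed to count as one change, and the names `TREE`, `HAS_GENE`, and `min_changes` are hypothetical.

```python
# Simplified tree as nested tuples; leaves are species names.
# Assumed topology: chicken is the outgroup, and platypus is the sister
# lineage to the marsupial + eutherian group.
TREE = ("chicken", ("platypus", ("opossum", "human")))

# Presence (True) or absence (False) of a functional vitellogenin gene.
HAS_GENE = {"chicken": True, "platypus": True, "opossum": False, "human": False}


def min_changes(node, state):
    """Fewest gains/losses below `node`, given that `node` itself has `state`."""
    if isinstance(node, str):
        # A leaf must match its observed state; a mismatch is impossible.
        return 0 if HAS_GENE[node] == state else float("inf")
    total = 0
    for child in node:
        # Try both states for the child; a state switch costs one change.
        total += min(
            min_changes(child, child_state) + (child_state != state)
            for child_state in (True, False)
        )
    return total


print(min_changes(TREE, True))   # gene present in the common ancestor: 1
print(min_changes(TREE, False))  # gene absent in the common ancestor: 2
```

On this tree, an ancestor that already had the gene needs only one change (a single loss in the lineage leading to marsupials and eutherians), whereas an ancestor that lacked it needs two independent gains, so parsimony favors the first scenario.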

As it turns out, there is one additional and very compelling piece of evidence that supports this inference. While inspection of eutherian genomes does not reveal any intact, functional vitellogenin genes, there are traces of vitellogenin gene sequences detectable in the human and dog genomes at positions that correspond to (are syntenic to) those of the vitellogenin genes of the platypus and chicken (Figure 14-16). These sequences are molecular relics of our egg-laying ancestors. As our mammalian ancestors shifted away from yolky eggs, natural selection on the vitellogenin gene sequences was relaxed, such that they have been nearly eroded away by mutations over tens of millions of years. Our genome contains numerous relics of genes that once functioned in our ancestors, and as we will see again in this section, the identities of those pseudogenes reflect how human biology has diverged from that of our ancestors.

Figure 14-16: The human genome carries relics of our egg-laying ancestors
Figure 14-16: Strings of genes along chicken chromosome 8 and human chromosome 1 and in the platypus are in the same relative order (boxes). Whereas the chicken genome has three genes that encode egg-yolk proteins, the egg-laying platypus has one functional gene and two pseudogenes, and humans have fragmented, very short remnants of the yolk genes.

Of course, evolution is also about the acquisition of new traits. For example, milk production is a shared trait among all mammals. A family of genes encoding the casein milk proteins is unique to mammals and tightly clustered in their genomes, including that of the platypus. Just this brief glance at a few mammalian genomes informs us that, yes indeed, some mammals have genes that others do not, some genes are shared by all mammals, and the presence or absence of certain genes correlates with mammals’ lifestyle. This last observation is a pervasive finding in comparative genomics.

KEY CONCEPT

Determining which genomic elements have been gained or lost during evolution requires knowledge of the phylogeny of the species being compared. The presence or absence of genes often correlates with organism lifestyles.

Let’s look at a few more examples that illuminate the evolutionary history of our genome and how we are different from, and similar to, other mammals.

Of mice and humans

The sequence of the mouse genome has been particularly informative for understanding the human genome because of the mouse’s long-standing role as a model genetic species, the vast knowledge of its classical genetics, and the mouse’s evolutionary relationship to humans. The mouse and human lineages diverged approximately 75 million years ago, which is sufficient time for mutations to cause their genomes to differ, on average, at about one of every two nucleotides. Thus, sequences that remain common to the mouse and human genomes despite this divergence are likely to indicate common functions.

Homologs are identified because they have similar DNA sequences. Analysis of the mouse genome indicates that the number of protein-coding genes that it contains is similar to that of the human genome. Further inspection of the mouse genes reveals that at least 99 percent of all mouse genes have some homolog in the human genome and that at least 99 percent of all human genes have some homolog in the mouse genome. Thus, the kinds of proteins encoded in each genome are essentially the same. Furthermore, about 80 percent of all mouse and human genes are clearly identifiable orthologs.

The similarities between the genomes extend well beyond the inventory of protein-coding genes to overall genome organization. More than 90 percent of the mouse and human genomes can be partitioned into corresponding regions of conserved synteny, where the order of genes within variously sized blocks is the same as their order in the most recent common ancestor of the two species. This synteny is very helpful in relating the maps of the two genomes. For example, human chromosome 17 is orthologous to a single mouse chromosome (chromosome 11). Although there have been extensive intrachromosomal rearrangements in the human chromosome, there are 23 segments of colinear sequences more than 100 kb in size (Figure 14-17).
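
The idea of colinear blocks can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: both species are given as lists of the same single-copy genes, the letter names are invented, and the function name `syntenic_blocks` is hypothetical. Real synteny analysis works on aligned sequence and tolerates gaps, duplications, and missing genes.

```python
def syntenic_blocks(order_a, order_b):
    """Split `order_a` into runs of genes that are also adjacent in `order_b`.

    Each block is (genes, orientation), where orientation is "direct",
    "inverted", or "single" (one gene, orientation unknown from order alone).
    Assumes both lists are non-empty and contain the same genes exactly once.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    labels = {1: "direct", -1: "inverted", 0: "single"}
    blocks = []
    current, direction = [order_a[0]], 0
    for prev, gene in zip(order_a, order_a[1:]):
        step = pos_b[gene] - pos_b[prev]
        if step in (1, -1) and direction in (0, step):
            # Still adjacent in species B and running the same way: extend.
            current.append(gene)
            direction = step
        else:
            # Adjacency or direction broke: close the block, start a new one.
            blocks.append((current, labels[direction]))
            current, direction = [gene], 0
    blocks.append((current, labels[direction]))
    return blocks


species_a = ["A", "B", "C", "D", "E", "F", "G"]
species_b = ["A", "B", "E", "D", "C", "F", "G"]  # C-D-E segment inverted

for genes, orientation in syntenic_blocks(species_a, species_b):
    print(orientation, genes)
```

For these invented gene orders, the sketch reports a direct block (A, B), an inverted block (C, D, E), and a second direct block (F, G), which mirrors the distinction between direct and inverted blocks drawn in Figure 14-17.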

KEY CONCEPT

The mouse and human genomes contain similar sets of genes, often arranged in similar order.

Figure 14-17: The mouse and human genomes have large syntenic blocks of genes in common
Figure 14-17: Synteny between human chromosome 17 and mouse chromosome 11. Large conserved syntenic blocks 100 kb or greater in size are shown in human chromosome 17, mouse chromosome 11, and the inferred chromosome of their last common ancestor (reconstructed by analysis of other mammalian genomes). Direct blocks of synteny are shown in light purple; inverted blocks are shown in green. Chromosome sizes are indicated in megabases (Mb).
[Data from M. C. Zody et al., “DNA Sequence of Human Chromosome 17 and Analysis of Rearrangement in the Human Lineage,” Nature 440, 2006, 1045–1049, Fig. 2.]

There are some detectable differences between the inventories of mouse and human genes. In one family of genes involved in color vision, the opsins, humans possess one additional paralog. The presence of this opsin has equipped humans with so-called trichromatic vision, so that we can perceive colors across the entire spectrum of visible light (violet, blue, green, red), whereas mice cannot. But again, the presence of this additional paralog in humans and its absence in mice does not alone tell us whether it was gained in the human lineage or lost in the mouse lineage. Analysis of other primate and mammalian genomes has revealed that Old World primates such as chimpanzees, gorillas, and the colobus monkey possess this gene but that all nonprimate mammals lack it. We can safely infer from this phylogenetic distribution of the additional opsin gene that it evolved in an ancestor of Old World primates, a group that includes humans.

On the other hand, the mouse genome contains more functional copies of some genes that reflect the mouse’s lifestyle. Mice have about 1400 genes involved in olfaction; this is the largest single functional category of genes in the mouse genome. Dogs, too, have a large number of olfactory genes. This certainly makes sense for the species’ lifestyles. Mice and dogs rely heavily on their sense of smell, and they encounter different odors from those encountered by humans. The set of human olfactory genes, compared to that of mice and dogs, is strikingly inferior. We have a lot of olfactory genes, but a very large fraction of them are pseudogenes that bear inactivating mutations. For example, in just one class of olfactory genes called V1r genes, mice have about 160 functional genes, but just 5 out of the 200 or so V1r genes in the human genome are functional.

Still, these differences in gene content are relatively modest in light of the vast differences in anatomy and behavior. The overall similarity in the mouse and human genomes corresponds to the picture we get from examining the genetic toolkit controlling development in different taxa (see Chapter 13)—that great differences can evolve from genomes containing similar sets of genes. This same theme is illustrated by comparing our genome with that of our closest living relative, the chimpanzee.

Comparative genomics of chimpanzees and humans

Chimpanzees and humans last had a common ancestor about 5 to 6 million years ago. Since that time, genetic differences have accumulated by mutations that have occurred in each lineage. Genome sequencing has revealed that there are about 35 million single-nucleotide differences between chimpanzees and humans, corresponding to about a 1.06 percent degree of divergence. In addition, about 5 million insertions and deletions, ranging in length from just a single nucleotide to more than 15 kb, contribute a total of about 90 Mb of divergent DNA sequence (about 3 percent of the overall genome). Most of these insertions or deletions lie outside of coding regions.
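
As a rough consistency check on the figures just quoted, one can back-calculate the total amount of sequence that each percentage implies. The Python sketch below is only arithmetic on the rounded numbers in the text; the function and variable names are illustrative, and the implied totals are not measurements.

```python
def implied_total(count: float, fraction: float) -> float:
    """Total number of bases implied if `count` bases make up `fraction` of it."""
    return count / fraction


snv_differences = 35e6   # single-nucleotide differences (from the text)
snv_divergence = 0.0106  # 1.06 percent divergence (from the text)
indel_bases = 90e6       # 90 Mb of insertions and deletions (from the text)
indel_fraction = 0.03    # about 3 percent of the genome (from the text)

total_from_snvs = implied_total(snv_differences, snv_divergence)
total_from_indels = implied_total(indel_bases, indel_fraction)

print(f"Implied by single-nucleotide figures: {total_from_snvs / 1e9:.1f} Gb")
print(f"Implied by insertion/deletion figures: {total_from_indels / 1e9:.1f} Gb")
print(f"Indel bases per single-nucleotide difference: {indel_bases / snv_differences:.1f}")
```

Both back-calculations land near 3 Gb (about 3.3 Gb and 3.0 Gb), as expected for rounded figures describing the same genomes, and insertions and deletions account for roughly 2.6 times as many divergent bases as single-nucleotide differences do.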

Overall, the proteins encoded by the human and chimpanzee genomes are extremely similar. Twenty-nine percent of all orthologous proteins are identical in sequence. Most proteins that differ do so by only about two amino acid replacements. There are some detectable differences between chimpanzees and humans in the sets of functional genes. About 80 genes that were functional in their common ancestor are no longer functional in humans, owing to their deletion or to the accumulation of mutations. Some of these changes may contribute to differences in physiology.

In addition to changes in particular genes, duplications of chromosome segments in a single lineage have contributed to genome divergence. More than 170 genes in the human genome and more than 90 genes in the chimpanzee genome are present in large duplicated segments. These duplications are responsible for a greater amount of the total genome divergence than all single-nucleotide mutations combined. However, whether they contribute to major phenotypic differences is not yet clear.

Of course, all genetic differences between species originate as variations within species. The sequencing of the human genome and the advent of faster and less expensive high-throughput sequencing methods have opened the door to the detailed analysis of human genetic variation.