20.6 The Origin of New Genes and Protein Functions

Evolution consists of more than the substitution of one allele for another at loci of defined function. A large fraction of protein-coding and RNA-encoding genes belong to gene families, groups of genes that are related in sequence and typically in biochemical function as well. For example, there are over 1000 genes encoding structurally related olfactory receptors in a mouse and three structurally related opsin genes that encode proteins necessary for color vision in humans. Within families such as these, new functions have evolved that have made possible new capabilities. These new functions may be expansions of existing capabilities. In the examples above, new receptors appeared in mice with the ability to detect new chemicals in the environment, or in the case of humans and their Old World primate relatives, new opsin proteins appeared that can detect wavelengths of light that other mammals cannot. In other cases, the evolution of new gene families may lead to entirely novel functions that open up new ways of living, such as the acquisition of antifreeze proteins in polar fish. Here, we will ask, Where does the DNA for new genes come from? What are the fates of new genes? And how do new protein functions evolve?

Expanding gene number

There are several genetic mechanisms that can expand the number of genes or parts of genes. One large-scale process for the expansion of gene number is the formation of polyploids, individuals with more than two chromosome sets. Polyploids result from the duplication of the entire genome. Much more common in plants than in animals (see Chapter 17), the formation of polyploids has played a major role in the evolution of plant species. Consider the frequency distribution of haploid chromosome numbers among the dicotyledonous plant species shown in Figure 20-18. Above a chromosome number of about 12, even numbers are much more common than odd numbers—a consequence of frequent polyploidy.

Figure 20-18: Even chromosome numbers are more common than odd numbers
Figure 20-18: Frequency distribution of haploid chromosome numbers in dicotyledonous plants.
[Data from Verne Grant, The Origin of Adaptations. Columbia University Press, 1963.]
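The parity argument can be illustrated with a minimal sketch. It assumes the simplest case, in which a polyploid arises by doubling a single ancestral genome; the ancestral haploid numbers are hypothetical and are not taken from Figure 20-18. Under that assumption every derived chromosome number is even, whatever the parity of the ancestor. Hybrid polyploids that combine two different genomes, and later gains, losses, or fusions of chromosomes, are ignored here, which is one reason real data show an excess of even numbers rather than only even numbers.

```python
def double_genome(haploid_number: int) -> int:
    """Whole-genome duplication of a single genome doubles the chromosome number."""
    return 2 * haploid_number


# Hypothetical ancestral haploid numbers, odd and even alike (illustration only).
ancestral = list(range(5, 13))

# Haploid numbers after one round of genome doubling.
derived = [double_genome(n) for n in ancestral]

# Doubling any whole number gives an even number.
all_even = all(n % 2 == 0 for n in derived)

print(ancestral)
print(derived)
print(all_even)
```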

A second mechanism that can increase gene number is gene duplication. Misreplication of DNA during meiosis can cause segments of DNA to be duplicated. The duplicated segments can range in length from just one or two nucleotides up to substantial segments of chromosomes containing scores or even hundreds of genes. Detailed analyses of human-genome variation have revealed that individual humans commonly carry small duplications that result in variation in gene-copy number.

A third mechanism that can generate gene duplications is transposition. Sometimes, when a transposable element is transposed to another part of the genome, it may carry along additional host genetic material and insert a copy of some part of the genome into another location (see Chapter 15).

A fourth mechanism that can expand gene number is retrotransposition. Many animal genomes harbor retroviral-like genetic elements (see Chapter 15) that encode reverse-transcriptase activity. Retrotransposons themselves make up approximately 40 percent of the human genome. Occasionally, host genome mRNA transcripts are reverse transcribed into cDNA and inserted back in the genome, producing an intronless gene duplicate.
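Why a retrotransposed copy lacks introns can be sketched in a few lines. The toy gene, its exon coordinates, and the function name below are hypothetical, and the example works on a DNA-sense string throughout, skipping the RNA intermediate and the reverse-transcription chemistry. It shows only that a copy made from a spliced transcript contains the exons joined directly together.

```python
def splice_exons(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive) of a gene sequence."""
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical two-exon gene: exons in uppercase, the intron in lowercase.
gene = "ATGGCC" + "gtaagtttttag" + "GATTAA"
exons = [(0, 6), (18, 24)]

# A copy made from the spliced transcript carries the exons only.
retrocopy = splice_exons(gene, exons)

print(gene)
print(retrocopy)
```

The retrocopy is shorter than its parent gene by exactly the length of the intron, which is the signature used to recognize such duplicates in genome sequences.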

The fate of duplicated genes

It was once thought that, because the ancestral function is still provided by the original gene, duplicate genes are essentially spare genetic elements that are free to evolve new functions (a process termed neofunctionalization), and that this would be their common fate. However, detailed analysis of genomes, together with population-genetic considerations, has led to a better understanding of the alternative fates of new gene duplicates, with the evolution of new function being just one pathway.

For simplicity’s sake, let’s consider a duplication event that results in the duplication of the entire coding and regulatory region of a gene (Figure 20-19a). Many different outcomes can unfold from such a duplication. The simplest result is that the allele bearing the duplicate is lost from the population before it rises to any significant frequency, as is the fate of many new mutations (see Chapter 18). But let’s consider next the more interesting scenarios: suppose the duplication survives and new mutations begin to occur within the duplicate gene pair. Keeping in mind that the original and duplicated genes are initially exact copies and therefore redundant, once new mutations arise, there are several possible fates:

  1. An inactivating mutation may occur in the coding region of either duplicate. The inactivated paralog is called a pseudogene and will generally be invisible to natural selection. Thus, it will accumulate more mutations and evolve by random genetic drift, while natural selection will maintain the functional paralog (Figure 20-19b).

  2. Mutations may occur that alter the regulation of one duplicate or the activity of one encoded protein. These alleles may then become subject to positive selection and acquire a new function (neofunctionalization) (Figure 20-19c).

  3. In cases where the ancestral gene has more than one function and more than one regulatory element, as for most toolkit genes, a third possible outcome is that initial mutations inactivate or alter one regulatory element in each duplicate. The original gene function is now divided between the duplicates, which complement each other. In order to preserve the ancestral function, natural selection will maintain the integrity of both gene-coding regions. Loci that follow this path, in which duplication and mutation produce complementary paralogs, are said to be subfunctionalized (Figure 20-19d).
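The simplest fate mentioned before this list, loss of the new duplicate before it reaches any appreciable frequency, can be sketched with a small simulation of random genetic drift. This is an illustration under stated assumptions, not a model from the text: the duplicate is selectively neutral, the population has a constant size of 100 chromosome copies, generations do not overlap, and each generation is formed by random sampling from the previous one. The function name, population size, and number of replicates are hypothetical choices.

```python
import random


def follow_new_duplicate(n_copies: int, rng: random.Random,
                         max_generations: int = 10_000) -> str:
    """Track one new neutral duplicate until it is lost or fixed.

    n_copies is the number of chromosome copies in the population
    (2N for a diploid population of N individuals).
    """
    count = 1  # the duplication arises on a single chromosome
    generation = 0
    while 0 < count < n_copies and generation < max_generations:
        freq = count / n_copies
        # Each copy in the next generation carries the duplicate
        # with probability equal to its current frequency.
        count = sum(rng.random() < freq for _ in range(n_copies))
        generation += 1
    if count == 0:
        return "lost"
    if count == n_copies:
        return "fixed"
    return "segregating"


rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [follow_new_duplicate(100, rng) for _ in range(200)]
lost_fraction = outcomes.count("lost") / len(outcomes)

print(f"fraction of new duplicates lost: {lost_fraction:.2f}")
```

Under these assumptions, a new neutral variant is expected to reach fixation with probability equal to its starting frequency (here 1 in 100), so the great majority of replicates end in loss. The three fates in the list apply only to the minority of duplicates that escape this outcome.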

Figure 20-19: The alternative fates of duplicated genes
Figure 20-19: (a) The duplication of a gene. The orange, yellow, and pink boxes denote cis-regulatory elements; the beige box denotes the coding region. After duplication, several alternative fates of the duplicates are possible: (b) any inactivating mutation in a coding region will render that duplicate into a pseudogene, and purifying selection will then operate on the remaining paralog; (c) mutations may arise that alter the function of a protein and may be favored by positive selection (neofunctionalization); (d) mutations may affect a subfunction of either duplicate, and so long as the two paralogs together provide the ancestral functions, different subfunctions may be retained, resulting in the evolution of two complementary loci (subfunctionalization).

Some of these alternative fates of gene duplicates are illustrated in the history of the evolution of human globin genes. The evolution of our lineage, from fish ancestors to terrestrial amniotes that laid eggs to placental mammals, has required a series of innovations in tissue oxygenation. These include the evolution of additional globin genes with novel patterns of regulation and the evolution of hemoglobin proteins with distinct oxygen-binding properties.

Adult hemoglobin is a tetramer consisting of two α polypeptide chains and two β chains, each with its bound heme molecule. The gene encoding the adult α chain is on chromosome 16, and the gene encoding the β chain is on chromosome 11. The two chains are about 49 percent identical in their amino acid sequences; this similarity reflects their common origin from an ancestral globin gene deep in evolutionary time. The α chain gene resides in a cluster of five related genes (α and ζ) on chromosome 16, while the β chain gene resides in a cluster of six related genes (ε, β, δ, and γ) on chromosome 11 (Figure 20-20). Each cluster contains a pseudogene, Ψα and Ψβ, respectively, that has accumulated random, inactivating mutations.

Figure 20-20: Some duplicates of the hemoglobin genes evolved into nonfunctional pseudogenes (Ψα and Ψβ)
Figure 20-20: Chromosomal distribution of the genes for the α family of globins on chromosome 16 and the β family of globins on chromosome 11 in humans. Gene structure is shown by black bars (exons) and colored bars (introns).
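A figure such as the 49 percent identity quoted for the α and β chains comes from comparing two aligned sequences position by position. The sketch below shows one common way to compute it, under the assumption that gapped columns are excluded from the comparison; other conventions for the denominator exist. The two sequences are invented toy strings, not real globin sequences, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment: 9 ungapped columns, 7 of them identical.
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGH-KV"
identity = percent_identity(toy_a, toy_b)

print(f"{identity:.1f} percent identical")
```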

Each cluster contains genes that have evolved distinct expression profiles, a distinct function, or both. Of greatest interest are the two γ genes. These genes are expressed during the last seven months of fetal development to produce fetal hemoglobin (also known as hemoglobin F), which is composed of two α chains and two γ chains. Fetal hemoglobin has a greater affinity for oxygen than does adult hemoglobin, which allows the fetus to extract oxygen from the mother’s circulation via the placenta. At birth, up to 95 percent of hemoglobin is the fetal type; expression of the adult β form then replaces γ, and a small amount of δ globin is also produced. The order of appearance of globin chains during development is orchestrated by a complex set of cis-acting regulatory sequences and, remarkably, follows the order of genes on each chromosome.

The γ genes are restricted to placental mammals. Their distinct developmental regulation and protein products mean that these duplicates have evolved differences in function that have contributed to the evolution of the placental lifestyle. Interestingly, regulatory variants of these genes are known that cause expression of fetal hemoglobin to persist into childhood and adulthood. These naturally occurring variants appear to moderate the severity of sickle-cell anemia by suppressing the levels of sickle hemoglobin (HbS) produced. One widespread strategy for the treatment of sickle-cell anemia is to administer drugs that stimulate the reactivation of fetal-hemoglobin expression.