Protein-Coding Genes May Be Solitary or Belong to a Gene Family

The nucleotide sequences within chromosomal DNA can be classified on the basis of their structure and function, as shown in Table 8-1. Here we examine the properties of each class, beginning with protein-coding genes, which comprise two groups.

TABLE 8-1 (table image not reproduced)
*The values under “Fraction of the Human Genome (%)” sum to more than 100% because mobile DNA elements are counted twice: once under the different classes of human mobile DNA elements, and again as part of the intergenic regions and protein-coding genes, where they are located in introns and in the 3′ untranslated regions of terminal exons.
**Complete transcription units including exons and introns.
† Total length of all exons. Protein-coding regions total 1.2 percent of the genome.
‡ Length of each repeat in a tandemly repeated sequence.
SOURCE: Data from International Human Genome Sequencing Consortium, 2001, Nature 409:860 and 2004, Nature 431:931.

In multicellular organisms, roughly 25–50 percent of the protein-coding genes are represented only once in the haploid genome and thus are termed solitary genes. A well-studied example of a solitary protein-coding gene is the chicken lysozyme gene. Lysozyme, an enzyme that cleaves the polysaccharides in bacterial cell walls, is an abundant component of chicken egg-white protein and is also found in human tears. Its activity helps to keep the egg and the surface of the eye sterile. The 15-kb DNA sequence encoding chicken lysozyme constitutes a simple transcription unit containing four exons and three introns. The flanking regions, extending about 20 kb upstream and downstream from the transcription unit, do not encode any detectable mRNAs, and are thus examples of intergenic regions.

Duplicated genes constitute the second group of protein-coding genes. These genes have close but nonidentical sequences and are often located within 5–50 kb of one another. A set of duplicated genes that encodes proteins with similar but nonidentical amino acid sequences is called a gene family; the encoded, closely related, homologous proteins constitute a protein family. A few protein families, such as protein kinases, vertebrate immunoglobulins, and olfactory receptors, include hundreds of members. Most protein families, however, include from just a few to 30 or so members; common examples are cytoskeletal proteins, the myosin heavy chain, and the α-like and β-like globins in vertebrates.
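The distinction between solitary genes and gene families can be illustrated with a small Python sketch. It assumes that each gene has already been assigned a family label, which in practice comes from sequence comparison. The annotation table is a toy: it combines the human β-like globin genes with the chicken lysozyme gene purely for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_family(gene_to_family):
    """Collect gene names under their (pre-assigned) family label."""
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)
    return dict(families)


def split_solitary_and_families(gene_to_family):
    """Separate genes that occur once from genes that belong to a family.

    A label with a single member is treated as a solitary gene; a label with
    two or more members is treated as a gene family.
    """
    grouped = group_by_family(gene_to_family)
    solitary = sorted(genes[0] for genes in grouped.values() if len(genes) == 1)
    families = {
        name: sorted(genes) for name, genes in grouped.items() if len(genes) > 1
    }
    return solitary, families


# Toy annotation (assumption): family labels are supplied by hand, and genes
# from two different organisms are mixed only to show both outcomes.
annotations = {
    "HBB": "beta-like globin",
    "HBD": "beta-like globin",
    "HBG1": "beta-like globin",
    "HBG2": "beta-like globin",
    "HBE1": "beta-like globin",
    "lysozyme": "lysozyme",
}

solitary_genes, gene_families = split_solitary_and_families(annotations)
```

Here the five β-like globin genes are grouped into one family, whereas the lysozyme gene, having no other member under its label, is reported as solitary.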

The genes encoding the β-like globins are a good example of a gene family. As shown in Figure 8-4a, the β-like globin gene family contains five functional genes, designated HBB (encoding the most abundant adult β-globin), HBD (a minor adult β-globin), HBG1 and HBG2 (fetal β-globins), and HBE1 (embryonic β-globin). Two identical β-like globin polypeptides combine with two identical α-like globin polypeptides (encoded by another gene family expressed during embryonic, fetal, and adult stages of development) and four heme prosthetic groups to form a hemoglobin molecule (see Figures 3-14 and 12-20). All the hemoglobins formed from the different α-like and β-like globins carry oxygen in the blood, but they exhibit somewhat different properties that are suited to their specific functions in human physiology. For example, hemoglobins containing either the HBG1- or HBG2-encoded polypeptides are expressed only during fetal life. Because these fetal hemoglobins have a higher affinity for oxygen than adult hemoglobins, they can effectively extract oxygen from the maternal circulation in the placenta. The lower oxygen affinity of adult hemoglobins, which are expressed after birth, permits better release of oxygen to the tissues, especially muscles, which have a high demand for oxygen during exercise. The embryonic hemoglobin assembled from polypeptides encoded by the HBE1 gene and the embryonic α-like globin gene HBZ has an even higher affinity for oxygen than the fetal and adult hemoglobins.

FIGURE 8-4 Comparison of gene density in higher and lower eukaryotes. (a) In this diagram of the β-globin gene cluster on human chromosome 11, the green boxes represent exons of β-globin–related genes. Exons spliced together to form one mRNA are connected by caret-like spikes. The human β-globin gene cluster contains two pseudogenes (white); these regions are related to the functional β-globin genes but are not transcribed. Each red arrow indicates the location of an Alu sequence, a roughly 300-bp noncoding repeated sequence that is abundant in the human genome. See F. S. Collins and S. M. Weissman, 1984, Prog. Nucl. Acid Res. Mol. Biol. 31:315. (b) In this diagram of yeast DNA from chromosome III, the green boxes indicate open reading frames. Most of these potential protein-coding sequences are functional genes without introns. Note the much higher proportion of noncoding to coding sequences in the human DNA than in the yeast DNA. See S. G. Oliver et al., 1992, Nature 357:28.

The different β-like globin genes arose by duplication of an ancestral gene, most likely as the result of unequal crossing over during meiotic recombination in a developing germ cell (egg or sperm) (see Figure 8-2b). Over evolutionary time, the two resulting copies of the gene accumulated random mutations, producing sequence drift. Beneficial mutations that conferred some refinement in the basic oxygen-carrying function of hemoglobin were retained by natural selection. Repeated gene duplications and subsequent sequence drift and selection are thought to have generated the contemporary β-like globin genes observed in humans and other mammals today.

Two regions in the human β-like globin gene cluster contain nonfunctional sequences, called pseudogenes, that are similar to the functional β-like globin genes (see Figure 8-4a). Sequence analysis shows that these pseudogenes have the same apparent exon–intron structure as the functional β-like globin genes, suggesting that they arose by duplication of the same ancestral gene. However, there was little selective pressure to maintain the function of these genes. Consequently, sequence drift during evolution generated sequences that either terminate translation or block mRNA processing, rendering these regions nonfunctional. Because such pseudogenes are not deleterious, they remain in the genome and mark the location of a gene duplication that occurred in one of our ancestors.
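One of the two kinds of inactivating change described above, a mutation that terminates translation early, can be detected by scanning a coding sequence codon by codon. The Python sketch below is a minimal illustration, assuming the input is a DNA coding sequence that begins in frame at the start codon. The function names and the two short sequences are invented for the example and are not real globin sequences. It does not address the other kind of change, a block to mRNA processing.

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(coding_seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    seq = coding_seq.upper()
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(coding_seq):
    """True if a stop codon appears before the final complete codon."""
    stop_index = first_in_frame_stop(coding_seq)
    if stop_index is None:
        return False
    last_codon_index = len(coding_seq) // 3 - 1
    return stop_index < last_codon_index


# Invented toy sequences (assumption): five codons each.
intact_copy = "ATGGCTGAAAAATAA"   # ATG GCT GAA AAA TAA
drifted_copy = "ATGGCTTAAAAATAA"  # ATG GCT TAA AAA TAA (single G-to-T change)

intact_is_truncated = has_premature_stop(intact_copy)
drifted_is_truncated = has_premature_stop(drifted_copy)
```

In the drifted copy a single base substitution converts the third codon into a stop codon, so translation would end after two amino acids even though the rest of the sequence is unchanged.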

Duplications of segments of a chromosome (called segmental duplications) occurred fairly often during the evolution of multicellular plants and animals. As a result, a large fraction of the genes in these organisms today have been duplicated, allowing the process of sequence drift to generate gene families and pseudogenes. The extent of sequence divergence between the duplicated copies, together with characterization of the homologous genomic sequences in related organisms, allows us to estimate the time in evolutionary history when the duplication occurred. For example, the human fetal globin genes (HBG1 and HBG2) evolved following the duplication of a 5.5-kb region in the β-globin locus that included the single HBG globin gene in the common ancestor of catarrhine primates (Old World monkeys, apes, and humans) and platyrrhine primates (New World monkeys) about 50 million years ago.
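The reasoning behind dating a duplication from sequence divergence can be sketched as a simple calculation. The Python example below assumes a constant substitution rate (a "molecular clock") and two copies that are already aligned to the same length. Because each copy accumulates changes independently after the duplication, the divergence between them grows at twice the per-lineage rate, so the elapsed time is roughly t = d / (2 × rate). The sequences and the rate are hypothetical values chosen for illustration, not measurements, and the sketch ignores repeated changes at the same site, which real analyses correct for.

```python
def fraction_differing(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def estimate_duplication_age(seq_a, seq_b, rate_per_site_per_year):
    """Years since duplication under a constant-rate assumption.

    Both copies drift independently, so the divergence d between them
    accumulates at 2 * rate per site per year: t = d / (2 * rate).
    """
    d = fraction_differing(seq_a, seq_b)
    return d / (2.0 * rate_per_site_per_year)


# Invented toy sequences and a hypothetical rate (assumptions, not data).
copy_1 = "ACGTACGTACGTACGTACGT"
copy_2 = "ACGTACGTACTTACGTACGT"  # differs from copy_1 at one of 20 positions

divergence = fraction_differing(copy_1, copy_2)
age_years = estimate_duplication_age(copy_1, copy_2, rate_per_site_per_year=1e-9)
```

With these toy inputs the copies differ at 5 percent of sites, which at the assumed rate corresponds to about 25 million years. In practice, comparison with the homologous region in related organisms is used to check such an estimate against the known branching order of the species.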

Although members of gene families that arose relatively recently in evolution, such as the genes of the human β-globin family, are often found near one another on the same chromosome, members of gene families may also be found on different chromosomes in the same organism. This is the case for the human α-like globin genes, which were separated from the β-globin genes by an ancient chromosomal translocation. Both the α- and β-globin genes evolved from a single ancestral globin gene that was duplicated (see Figure 8-2b) to generate the predecessors of the contemporary α- and β-globin genes in mammals. Both the primordial α- and β-globin genes then underwent further duplications to generate the different genes of the α- and β-globin gene clusters found in mammals today.

Several different gene families encode the various proteins that make up the cytoskeleton. These proteins are present in varying amounts in almost all cells. In vertebrates, the major cytoskeletal proteins are the actins, tubulins, and intermediate filament proteins such as the keratins, discussed in Chapters 17, 18, and 20. We examine the origin of one such family, the tubulin family, in Section 8.4. Although the physiological rationale for the cytoskeletal protein families is not as obvious as it is for the globins, the different members of a family probably have similar but subtly different functions suited to the particular type of cell in which they are expressed.