The Number of Protein-Coding Genes in an Organism’s Genome Is Not Directly Related to Its Biological Complexity

The combination of genomic sequencing and gene-finding computer algorithms has yielded the complete inventory of protein-coding genes for a variety of organisms. Figure 8-22 shows the total number of protein-coding genes in several eukaryotic genomes that have been completely sequenced. The functions of about half the proteins encoded in these genomes are known or have been predicted on the basis of sequence comparisons. One of the surprising features of this comparison is that the number of protein-coding genes in different organisms does not appear to be proportional to our intuitive sense of their biological complexity. For example, the roundworm C. elegans apparently has more genes than the fruit fly Drosophila, which has a much more complex body plan and more complex behavior. And humans have only about 5 percent more protein-coding genes than C. elegans. When it first became apparent that humans have only slightly more protein-coding genes than the simple roundworm, it was difficult to understand how such a small increase in the number of proteins could generate such a staggering difference in complexity.

FIGURE 8-22 Comparison of the number and types of proteins encoded in the genomes of different eukaryotes. For each organism, the area of the entire pie chart represents the total number of protein-coding genes, all shown at roughly the same scale. In most cases, the functions of the proteins encoded by about half the genes are still unknown (light blue). The functions of the remainder are known or have been predicted by sequence similarity to genes of known function.
[Data from ENCODE Project Consortium, 2012, Nature 489:57; J. D. Hollister, 2014, Chromosome Res. 22:103; L. W. Hillier et al., 2005, Genome Res. 15:1651; FlyBase: FB2015_02 Release Notes, http://flybase.org/static_pages/docs/release_notes.html; Saccharomyces Genome Data Base 2015, http://www.yeastgenome.org/genomesnapshot.]


Clearly, simple quantitative differences in the number of protein-coding genes in the genomes of different organisms are inadequate for explaining differences in biological complexity. However, several phenomena can generate more complexity in the expressed proteins of higher eukaryotes than is predicted from their genomes. First, alternative splicing of a pre-mRNA can yield multiple functional mRNAs corresponding to a particular gene (see Chapter 10). In humans, the mean number of alternatively spliced mRNAs expressed per gene is about 6. Second, variations in the post-translational modification of many proteins may produce functional differences. Finally, increased biological complexity results from increased numbers of cells built of the same kinds of proteins. Larger numbers of cells can interact in more complex combinations, as we can see by comparing the cerebral cortices of mouse and human. Similar cells are present in the mouse and in the human cerebral cortex, but in humans more of them make more complex connections. Evolution of the increasing biological complexity of multicellular organisms probably required increasingly complex regulation of cell replication and temporal and spatial regulation of gene expression in the cells that make up the organisms, leading to increasing complexity of embryological development.
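The effect of alternative splicing on the number of distinct mRNAs can be made concrete with a small calculation. The sketch below is illustrative arithmetic only: the gene count is a round placeholder chosen for the example, not a value read from Figure 8-22, and the function name is hypothetical. The only figure taken from the text is the mean of about 6 alternatively spliced mRNAs per human gene.

```python
# Illustrative arithmetic only. ASSUMED_GENE_COUNT is a round placeholder,
# not a value taken from Figure 8-22.
ASSUMED_GENE_COUNT = 20_000

# From the text: in humans, about 6 alternatively spliced mRNAs per gene.
MEAN_MRNAS_PER_GENE = 6


def estimated_mrna_count(gene_count, mean_mrnas_per_gene):
    """Rough number of distinct mRNAs if every gene yielded the mean number of spliced forms."""
    return gene_count * mean_mrnas_per_gene


if __name__ == "__main__":
    total = estimated_mrna_count(ASSUMED_GENE_COUNT, MEAN_MRNAS_PER_GENE)
    print(f"{ASSUMED_GENE_COUNT} genes x {MEAN_MRNAS_PER_GENE} mRNAs per gene = {total} mRNAs")
```

Under these assumptions, the repertoire of expressed mRNAs is several times larger than the gene count alone would suggest, which is the point the paragraph above makes about complexity not being predictable from gene number.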

The specific functions of many genes and proteins identified by analysis of genomic sequences still have not been determined. As researchers unravel the functions of individual proteins in different organisms and further detail their interactions with other proteins, the resulting advances will become immediately applicable to all homologous proteins in other organisms. When the function of every protein is known, a more sophisticated understanding of the molecular basis of complex biological systems will no doubt emerge.