The information required for an organism to function, the “blueprint” for its existence, is contained in the organism’s genome, which, as we noted earlier, is the sum total of all the information encoded by its genes. The presence of genetic information, and the processes by which organisms “decode” and use it to build the proteins that underlie a body’s structure and function, involve fundamental principles that we will discuss and expand on throughout the book, especially in Chapters 10–14.
Early in the chapter we noted the importance of self-replicating nucleic acids in the origin of life. Nucleic acid molecules contain long sequences of four subunits called nucleotides. The sequence of these nucleotides in deoxyribonucleic acid, or DNA, allows the organism to assemble proteins. Each gene is a specific segment of DNA whose sequence carries the information for building, or controlling the building of, one or more proteins (FIGURE 1.9). Proteins, in turn, are the molecules that govern the chemical reactions within cells and form much of an organism’s structure. For these reasons, in biology we often say that genes “encode” proteins.
By analogy with a book, the nucleotides of DNA are like the letters of an alphabet. The sentences in the book are genes that encode proteins, which means that the genes provide instructions for making the proteins at a particular time or place. If you were to write out your own genome using four letters to represent the four DNA nucleotides, you would write more than 3 billion letters. Using the size type you are reading now, your genome would fill more than 1,000 books the size of this one.
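The arithmetic behind this analogy can be checked with a short calculation. The sketch below takes the two round figures from the paragraph above (3 billion letters and 1,000 books) as lower bounds. It adds one assumption that is not in the text: a computer storing the genome would use the smallest whole number of bits that can distinguish four letters. The function names are illustrative only.

```python
# Round figures quoted in the text, treated here as lower bounds.
GENOME_LETTERS = 3_000_000_000  # "more than 3 billion letters"
BOOKS = 1_000                   # "more than 1,000 books"
ALPHABET = "ACGT"               # one letter per DNA nucleotide


def letters_per_book(total_letters: int, books: int) -> int:
    """Letters each book must hold if the genome is split evenly."""
    return total_letters // books


def bits_per_letter(alphabet_size: int) -> int:
    """Smallest whole number of bits that can distinguish every letter."""
    return (alphabet_size - 1).bit_length()


def storage_bytes(total_letters: int, alphabet_size: int) -> int:
    """Bytes needed to store the letters at the minimum bits per letter."""
    total_bits = total_letters * bits_per_letter(alphabet_size)
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    print(letters_per_book(GENOME_LETTERS, BOOKS))       # letters per book
    print(bits_per_letter(len(ALPHABET)))                # bits per nucleotide
    print(storage_bytes(GENOME_LETTERS, len(ALPHABET)))  # bytes for the genome
```

Under these assumptions, each of the 1,000 books would hold about 3 million letters, and the whole sequence would fit in roughly 750 million bytes, because a four-letter alphabet needs only two bits per letter.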
All the cells of a given multicellular organism contain the same genome, yet different cells have different functions and form different proteins. For example, oxygen-carrying hemoglobin occurs in red blood cells, gut cells produce digestive proteins, and so on. Therefore, different types of cells in an organism must express, or use, different parts of the genome. How any given cell controls which genes it expresses (and which genes it suppresses, or doesn’t use) is a major focus of current biological research.
The genome of an organism contains thousands of genes. If mutations alter the nucleotide sequence of a gene, the protein that the gene encodes is often altered as well. Mutations may occur spontaneously, as happens when mistakes take place during replication of DNA. Mutations can also be caused by certain chemicals (such as those in cigarette smoke) and radiation (including UV radiation from the sun). Most mutations either are harmful or have no effect. Occasionally a mutation improves the functioning of the organism under the environmental conditions the individual encounters. Mutations are the raw material of evolution.
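To make the idea of a changed nucleotide sequence concrete, the following sketch treats a stretch of DNA as a string of four letters and replaces one letter with another. It is a toy model only. The sequence is made up, the function names are hypothetical, and the code covers only single-letter substitutions, one of several kinds of mutation. Whether such a change alters the encoded protein depends on how the sequence is decoded, which later chapters cover.

```python
import random

NUCLEOTIDES = "ACGT"  # four letters standing for the four DNA nucleotides


def point_mutation(sequence: str, position: int, new_base: str) -> str:
    """Return a copy of the sequence with one nucleotide replaced."""
    if new_base not in NUCLEOTIDES:
        raise ValueError(f"unknown nucleotide: {new_base!r}")
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    return sequence[:position] + new_base + sequence[position + 1:]


def random_point_mutation(sequence: str, rng: random.Random) -> str:
    """Replace one randomly chosen nucleotide with a different one."""
    position = rng.randrange(len(sequence))
    choices = [base for base in NUCLEOTIDES if base != sequence[position]]
    return point_mutation(sequence, position, rng.choice(choices))


def changed_positions(original: str, mutated: str) -> list[int]:
    """List the positions at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(original, mutated)) if a != b]


if __name__ == "__main__":
    original = "ATGGCCTTA"  # made-up sequence, not a real gene
    mutated = point_mutation(original, 3, "T")
    print(original, mutated, changed_positions(original, mutated))
```

Comparing the original and mutated strings position by position is the simplest way to locate a substitution, and it is the same idea that underlies the comparison of sequences between individuals or species.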
Scientists determined the first complete DNA sequence of an organism’s genome in 1976. This first sequence belonged to a virus, and viral genomes are very small compared with those of most cellular organisms. It was another two decades before the first bacterial genome was sequenced, in 1995. The first animal genome to be sequenced was a relatively small one—that of a roundworm—and was determined in 1998. A massive effort to sequence the complete human genome began in 1990 and finished 13 years later.
Since then, scientists have used the methods developed in these pioneering projects, as well as new DNA sequencing technologies that appear each year, to sequence genomes of hundreds of species. As methods have improved, the cost and time for sequencing a complete genome have dropped dramatically. The day is rapidly approaching when the sequencing of genomes from individual organisms will be commonplace for many biological applications.
What are we learning from genome sequencing? One surprise came when some genomes turned out to contain far fewer genes than expected. For example, there are only about 21,000 different genes that encode proteins in a human genome, but most biologists had expected many times that number. Gene sequence information is a boon to many areas of biology, making it possible to study the genetic basis of everything from physical structures to inherited diseases. Biologists can also compare genomes from many species to learn how and why one species differs from another. Such comparative genomic studies allow biologists to trace the evolution of genes through time and to document how particular changes in gene sequences result in changes in structure and function.
The vast amount of information being collected from genome studies has led to rapid development of the field of bioinformatics, the study of biological information. In this emerging field, biologists and computer scientists work together closely to develop new computational tools to organize, process, and study databases used in comparing genomes.
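As a small illustration of the kind of computation involved, the sketch below compares short sequences letter by letter and reports how similar they are. It rests on strong simplifying assumptions: the sequences are made up, they are equal in length, and they are already lined up position by position. Real comparative tools must also handle insertions, deletions, and sequences millions of letters long. The species labels and function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def most_similar_pair(sequences: dict[str, str]) -> tuple[str, str, float]:
    """Return the two names whose sequences share the highest identity."""
    best = max(
        combinations(sorted(sequences), 2),
        key=lambda pair: percent_identity(sequences[pair[0]], sequences[pair[1]]),
    )
    return best[0], best[1], percent_identity(sequences[best[0]], sequences[best[1]])


if __name__ == "__main__":
    # Made-up sequences for three hypothetical species.
    samples = {
        "species_a": "ATGGCCTTAG",
        "species_b": "ATGGCATTAG",
        "species_c": "ATGACATTCG",
    }
    print(most_similar_pair(samples))
```

In this toy data set, the first two sequences differ at a single position and so form the most similar pair. Applied to many genes across many species, comparisons of this kind are one way to infer which species are most closely related.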