13.2 Genome Annotation

A goal of biology is to identify all the macromolecules in biological systems and understand their individual functions and the ways in which they interact. This research has practical applications: Increased understanding of the molecular and cellular basis of disease, for example, can lead to improved diagnosis and treatment.

The value of genome sequencing in identifying macromolecules is that the genome sequence contains, in coded form, the nucleotide sequence of all RNA molecules transcribed from the DNA as well as the amino acid sequence of all proteins. There is a catch, however. A genome sequence is merely an extremely long list of A’s, T’s, G’s, and C’s that represent the order in which nucleotides occur along the DNA in one strand of the double helix. (Because of complementarity, knowing the sequence of one strand specifies the other.) The catch is that in multicellular organisms, not all the DNA is transcribed into RNA, and not all the RNA that is transcribed is translated into protein. Therefore, genome sequencing is just the first step in understanding the function of any particular DNA sequence. Following genome sequencing, the next step is to identify the locations and functions of the various types of sequence present in the genome.
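The point about complementarity can be made concrete with a short sketch. The Python below is an illustration only: the function name and the example sequence are hypothetical, and it assumes the common convention (not stated above) that each strand is written in the 5'-to-3' direction, so the partner strand is the reverse complement.

```python
# Watson-Crick base pairing: A pairs with T, and G pairs with C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction.

    Assumes `strand` contains only the letters A, T, G, and C.
    """
    # Swap each base for its pairing partner, then reverse the result,
    # because the two strands of the double helix run in opposite directions.
    return strand.upper().translate(COMPLEMENT)[::-1]


one_strand = "ATGGCC"  # made-up example sequence
other_strand = reverse_complement(one_strand)
print(other_strand)
```

Applying the function twice returns the original sequence, which is why a genome sequence needs to list only one strand.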