Organization of Genes Differs in Prokaryotic and Eukaryotic DNA
Having outlined the process of transcription, we now briefly consider the large-scale arrangement of information in DNA and how this arrangement dictates the requirements for RNA synthesis so that information transfer goes smoothly. In recent years, sequencing of entire genomes from multiple organisms has revealed not only large variations in the number of protein-coding genes, but also differences in their organization in bacteria and in eukaryotes.
The most common arrangement of protein-coding genes in bacteria has a powerful and appealing logic: genes encoding proteins that function together—for example, the enzymes required to synthesize the amino acid tryptophan—are most often found in a contiguous array in the DNA. Such an arrangement of genes in a functional group is called an operon because it operates as a unit from a single promoter. Transcription of an operon produces a continuous strand of mRNA that carries the message for a related series of proteins (Figure 5-13a). Each section of the mRNA represents the unit (or gene) that encodes one of the proteins in the series. This arrangement results in the coordinate expression of all the genes in the operon. Every time an RNA polymerase molecule initiates transcription at the promoter of the operon, all the genes of the operon are transcribed and translated. In prokaryotic DNA the genes are closely packed with very few noncoding gaps, and the DNA is transcribed directly into mRNA. Because DNA is not sequestered in a nucleus in prokaryotes, ribosomes have immediate access to the translation start sites in the mRNA as they emerge from the surface of the RNA polymerase. Consequently, translation of the mRNA begins even while the 3′ end of the mRNA is still being synthesized at the active site of the RNA polymerase.
FIGURE 5-13 Gene organization in prokaryotes and in eukaryotes. (a) The tryptophan (trp) operon is a continuous segment of the E. coli chromosome containing five genes (blue) that encode the enzymes necessary for the stepwise synthesis of tryptophan. The entire operon is transcribed from one promoter into one long continuous trp mRNA (red). Translation of this mRNA begins at five different start sites, yielding five proteins (green). The order of the genes in the bacterial genome parallels the sequential function of the encoded proteins in the tryptophan synthesis pathway. (b) The five genes encoding the enzymes required for tryptophan synthesis in baker’s yeast (Saccharomyces cerevisiae) are carried on four different chromosomes. Each gene is transcribed from its own promoter to yield a primary transcript that is processed into a functional mRNA encoding a single protein. The lengths of the various chromosomes are given in kilobases (10³ bases).
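The idea that one operon transcript carries several independently translated coding regions can be illustrated with a small Python sketch. It is a toy model, not a description of how ribosomes actually choose start sites: the function name `find_cistrons` and the three short coding regions are hypothetical, and the code simply assumes that each coding region begins at the next AUG after the previous stop codon.

```python
# Toy model of a polycistronic mRNA: one transcript, several start sites.
# All sequences below are invented for illustration.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_cistrons(mrna):
    """Return (start, end) index pairs for successive coding regions.

    Simplifying assumption: a coding region starts at the next AUG found
    after the previous stop codon and runs, codon by codon, to the first
    in-frame stop codon. Other start-site signals are not modeled.
    """
    cistrons = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] == "AUG":
            j = i
            # Read in frame until a stop codon or the end of the mRNA.
            while j <= len(mrna) - 3 and mrna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j <= len(mrna) - 3:  # an in-frame stop codon was found
                cistrons.append((i, j + 3))
                i = j + 3  # resume scanning after this coding region
                continue
        i += 1
    return cistrons


# One continuous mRNA carrying three hypothetical coding regions,
# separated by short untranslated spacers.
coding_regions = ["AUGGCUUAA", "AUGAAAUGA", "AUGUUUUAG"]
mrna = "GG" + coding_regions[0] + "CCG" + coding_regions[1] + "AC" + coding_regions[2]

found = [mrna[start:end] for start, end in find_cistrons(mrna)]
print(found)
```

In this sketch a single transcript yields three separate coding regions, mirroring the way the single trp mRNA is translated from five start sites. A eukaryotic mRNA of the kind described next would correspond to a transcript with only one such region.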
This economical clustering of genes devoted to a single metabolic function is rarely found in eukaryotes, even simple ones such as yeasts, which can be metabolically similar to bacteria. Rather, eukaryotic genes encoding proteins that function together are most often physically separated in the DNA; indeed, such genes are usually located on different chromosomes. Each gene is transcribed from its own promoter, producing one mRNA, which is generally translated to yield a single polypeptide (Figure 5-13b).
Early research on the structure of eukaryotic genes involved studies of viruses that infect animals. When researchers analyzed the regions of a viral DNA molecule that encode viral mRNAs, they were surprised to observe that the sequence of a single viral mRNA was encoded in several regions of the viral DNA separated by DNA sequences that are not present in the mRNA. Later, the development of gene cloning and DNA sequencing (see Chapter 6) allowed researchers to compare the genomic DNA sequences of multicellular organisms with the sequences of their mRNAs. This research revealed that most cellular mRNAs are also encoded in several separate regions of genomic DNA, called exons, separated by sequences of DNA called introns. Further studies showed that a gene is first transcribed into a long primary transcript that includes both exon sequences and the intron sequences that separate them. Subsequently, the introns are removed and the exons are spliced together (see Chapter 10). Although introns are common in multicellular eukaryotes, they are extremely rare in bacteria and archaea and uncommon in many unicellular eukaryotes, such as baker’s yeast.
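The relationship between a primary transcript and the spliced mRNA can be sketched in a few lines of Python. This is only a string-level illustration under stated assumptions: the gene, its segment sequences, and the helper name `splice` are hypothetical, and the actual splicing mechanism (see Chapter 10) is not modeled. Exons are written in uppercase and introns in lowercase purely to make them easy to tell apart.

```python
def splice(primary_transcript, exon_coords):
    """Join the exon segments of a primary transcript, dropping introns.

    exon_coords is a list of (start, end) half-open index pairs,
    listed in 5'-to-3' order.
    """
    return "".join(primary_transcript[start:end] for start, end in exon_coords)


# Hypothetical toy gene: three exons separated by two introns.
segments = [
    ("exon", "AUGGCU"),
    ("intron", "guaaguccag"),
    ("exon", "GAAUUC"),
    ("intron", "gucuuuag"),
    ("exon", "UAA"),
]

# The primary transcript contains both exon and intron sequences.
primary_transcript = "".join(seq for _, seq in segments)

# Record where each exon lies within the primary transcript.
exon_coords = []
position = 0
for kind, seq in segments:
    if kind == "exon":
        exon_coords.append((position, position + len(seq)))
    position += len(seq)

mature_mrna = splice(primary_transcript, exon_coords)
print(primary_transcript)
print(mature_mrna)
```

The mature mRNA here is shorter than the primary transcript and contains only the exon sequences in their original order, which is the pattern researchers observed when they compared genomic DNA sequences with mRNA sequences.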