What is a gene? As noted in Chapter 3, the definition of a gene often changes as we explore different aspects of heredity. A gene was defined in Chapter 3 as an inherited factor that determines a characteristic. This definition may have seemed vague because it says only what a gene does rather than what a gene is. Nevertheless, this definition was appropriate at the time because our focus was on how genes influence the inheritance of traits. We did not have to consider the physical nature of the gene in learning the rules of inheritance.
Knowing something about the chemical structure of DNA and the process of transcription now enables us to be more precise about what a gene is. Chapter 10 described how genetic information is encoded in the base sequence of DNA: a gene consists of a set of DNA nucleotides. But how many nucleotides constitute a gene, and how is the information in these nucleotides organized? In 1902, Archibald Garrod suggested, correctly, that genes encode proteins. Proteins are made of amino acids, so a gene contains the nucleotides that specify the amino acids of a protein. Therefore, for many years the working definition of a gene was a set of nucleotides that specifies the amino acid sequence of a protein. As geneticists learned more about the structure of genes, however, it became clear that this concept of a gene was an oversimplification.
Early work on gene structure was carried out largely through the examination of mutations in bacteria and viruses. This research led Francis Crick in 1958 to propose that genes and proteins are colinear—that there is a direct correspondence between the nucleotide sequence of DNA and the amino acid sequence of a protein (Figure 14.1). The concept of colinearity suggests that the number of nucleotides in a gene should be proportional to the number of amino acids in the protein encoded by that gene. In a general sense, this concept is true for genes found in bacterial cells and many viruses, although these genes are slightly longer than would be expected if colinearity were strictly applied, because the mRNAs encoded by the genes contain sequences at their ends that do not specify amino acids. At first, eukaryotic genes and proteins also were generally assumed to be colinear, but there were hints that eukaryotic gene structure is fundamentally different. Eukaryotic cells contain far more DNA than is required to encode proteins (see Chapter 11). Furthermore, many large RNA molecules observed in the nucleus were absent from the cytoplasm, suggesting that nuclear RNAs undergo some type of change before they are exported to the cytoplasm.
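The proportionality that colinearity implies can be made concrete with a little arithmetic. The sketch below assumes that three nucleotides specify each amino acid and that one stop codon ends the coding region; neither detail is stated in this section. The function names and the lengths used for the untranslated ends are hypothetical.

```python
# Assumption (not stated in this section): one codon of three nucleotides
# specifies each amino acid, and one stop codon ends the coding region.
NUCLEOTIDES_PER_CODON = 3


def coding_length(num_amino_acids, include_stop=True):
    """Nucleotides needed to encode a protein under strict colinearity."""
    codons = num_amino_acids + (1 if include_stop else 0)
    return codons * NUCLEOTIDES_PER_CODON


def mrna_length(num_amino_acids, utr5, utr3):
    """Coding length plus the noncoding sequences at the two ends of the mRNA."""
    return utr5 + coding_length(num_amino_acids) + utr3


# A hypothetical 100-amino-acid protein with made-up end lengths of 30 and 50.
strict = coding_length(100)
with_ends = mrna_length(100, utr5=30, utr3=50)
print(strict, with_ends)
```

The second value is larger than the first, which illustrates why bacterial and viral genes are slightly longer than strict colinearity would predict.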
Most geneticists were nevertheless surprised by the announcement in the 1970s that not all genes are continuous. Researchers observed four coding sequences in a gene from a eukaryotic virus that were interrupted by nucleotides that did not specify amino acids. This discovery was made when the viral DNA was hybridized with the mRNA transcribed from it and the hybridized structure was examined using an electron microscope (Figure 14.2). The DNA was clearly much longer than the mRNA because regions of DNA looped out from the hybridized molecules. These regions contained nucleotides in the DNA that were absent from the coding nucleotides in the mRNA. Many other examples of interrupted genes were subsequently discovered; it quickly became apparent that most eukaryotic genes consist of stretches of coding and noncoding nucleotides.
When a continuous sequence of nucleotides in DNA encodes a continuous sequence of amino acids in a protein, the two are said to be colinear. In eukaryotes, not all genes are colinear with the proteins that they encode.
CONCEPT CHECK 1
What evidence indicated that eukaryotic genes are not colinear with their proteins?
Many eukaryotic genes contain coding regions called exons and noncoding regions called intervening sequences or introns. For example, the gene encoding the protein ovalbumin has eight exons and seven introns; the gene for cytochrome b has five exons and four introns (Figure 14.3). The average human gene contains from eight to nine introns. All the introns and the exons are initially transcribed into RNA but, after transcription, the introns are removed by splicing and the exons are joined to yield the mature RNA.
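The removal of introns and joining of exons can be pictured as a simple string operation. The following is only a toy sketch, not a model of the splicing chemistry: the sequence, the exon coordinates, and the function names are invented for illustration, and lowercase letters are used here merely to make the intron segments easy to see.

```python
def splice(pre_rna, exons):
    """Join exon segments in their original order.

    exons is a list of (start, end) positions, 0-based and end-exclusive,
    sorted by start position.
    """
    return "".join(pre_rna[start:end] for start, end in exons)


def introns_between(exons):
    """Return the (start, end) positions of the segments lying between exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


# Hypothetical primary transcript: three exons (uppercase), two introns (lowercase).
pre_rna = "AAAA" + "gggg" + "CCCC" + "uuuuuu" + "AACC"
exons = [(0, 4), (8, 12), (18, 22)]

mature = splice(pre_rna, exons)
print(mature)                  # exons only, in the same order as in the DNA
print(introns_between(exons))  # one fewer intron than exons
```

In this toy transcript, three exons are separated by two introns, the same pattern seen in the ovalbumin gene with its eight exons and seven introns.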
Introns are common in eukaryotic genes but are rare in bacterial genes. For a number of years after their discovery, introns were thought to be entirely absent from prokaryotic genomes, but they have now been observed in archaea, bacteriophages, and even some eubacteria. Introns are present in mitochondrial and chloroplast genes as well as the nuclear genes of eukaryotes. In eukaryotic genomes, the size and number of introns appear to be directly related to increasing organismal complexity: yeast genes contain only a few short introns; Drosophila introns are longer and more numerous; and most vertebrate genes are interrupted by long introns. All classes of eukaryotic genes—those that encode rRNA, tRNA, and proteins—may contain introns. The number and size of introns vary widely: some eukaryotic genes have no introns, whereas others may have more than 60; intron length varies from fewer than 200 nucleotides to more than 50,000. Introns tend to be longer than exons, and most eukaryotic genes contain more noncoding nucleotides than coding nucleotides. Finally, most introns do not encode proteins: an intron of one gene is not usually an exon for a different gene.
Geneticists have long debated the evolutionary origin of introns. One idea, called the intron late hypothesis, proposes that introns were absent from ancient organisms but were later acquired by eukaryotes. Another idea, termed the intron early hypothesis, suggests that early ancestors to bacteria, archaea, and eukaryotes possessed introns that were later lost by prokaryotes and simple eukaryotes. Evidence suggests that introns have been lost and gained through evolutionary time. Many researchers now assume that the earliest eukaryotes possessed introns, because divergent eukaryotes have introns in the same positions in their genes, suggesting that these introns were present in the ancestors to all eukaryotes.
There are four major types of introns, differentiated by how the intron is removed (Table 14.1). Group I introns, found in some genes of eubacteria, bacteriophages, and eukaryotes, are self-splicing: they can catalyze their own removal. Group II introns are present in some genes of mitochondria, chloroplasts, archaea, and a few eubacteria; they also are self-splicing, but their mechanism of splicing differs from that of the group I introns. Nuclear pre-mRNA introns are the best studied; they include introns located in the protein-encoding genes of the eukaryotic nucleus. The splicing mechanism by which these introns are removed is similar to that of the group II introns, but nuclear introns are not self-splicing; their removal requires snRNAs (discussed later in this chapter) and a number of proteins. Transfer RNA introns, found in tRNA genes of eubacteria, archaea, and eukaryotes, utilize yet another splicing mechanism that relies on enzymes to cut and reseal the RNA. In addition to these major groups, there are several other types of introns.
| Type of Intron | Location | Splicing Mechanism |
|---|---|---|
| Group I | Genes of eubacteria, bacteriophages, and eukaryotes | Self-splicing |
| Group II | Genes of eubacteria, archaea, and eukaryotic organelles | Self-splicing |
| Nuclear pre-mRNA | Protein-encoding genes in the nucleus of eukaryotes | Spliceosomal |
| tRNA | tRNA genes of eubacteria, archaea, and eukaryotes | Enzymatic |

Note: There are also several types of minor introns, including group III introns, twintrons, and archaeal introns.
We’ll take a detailed look at the chemistry and mechanics of RNA splicing later in this chapter. For now, we should keep in mind two general characteristics of the splicing process: (1) the splicing of all pre-mRNA introns takes place in the nucleus; and (2) the order of exons in DNA is usually maintained in the spliced RNA: the coding sequences of a gene may be split up, but they are not usually jumbled up.
Many eukaryotic genes contain exons and introns. Both are transcribed into RNA, but introns are later removed by RNA processing. The number and size of introns vary from gene to gene; they are common in many eukaryotic genes but uncommon in bacterial genes.
CONCEPT CHECK 2
What are the four major types of introns?
How does the presence of introns affect our concept of a gene? To define a gene as a sequence of nucleotides that encodes amino acids in a protein no longer seems appropriate because this definition excludes introns, which do not specify amino acids. This definition also excludes nucleotides that encode the 5′ and 3′ ends of an mRNA molecule, which are required for translation but do not encode amino acids. Defining a gene in these terms also excludes sequences that encode rRNA, tRNA, and other RNAs that do not encode proteins. Given our current understanding of DNA structure and function, we need a more precise definition of a gene.
Many geneticists have broadened the concept of a gene to include all sequences in DNA that are transcribed into a single RNA molecule. Defined this way, a gene includes all exons, introns, and those sequences at the beginning and end of the RNA that are not translated into a protein. This definition also includes DNA sequences that encode rRNAs, tRNAs, and other types of nonmessenger RNA. Some geneticists have expanded the definition of a gene even further, to include the entire transcription unit: the promoter, the RNA coding sequence, and the terminator. However, new evidence now calls into question even this definition. Recent research suggests that much of the genome is transcribed into RNA, although it is unclear what, if anything, much of this RNA does. What is certain is that transcription is more complex than formerly thought, so defining a gene as a sequence that is transcribed into an RNA molecule is less straightforward than it once seemed. The more we learn about the nature of genetic information, the more elusive the definition of a gene seems to become.
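One way to compare the successive definitions is to ask how much DNA each one would count as "the gene." The sketch below is a hypothetical illustration only: the class, its field names, and all coordinates are invented, and the promoter, transcribed region, and terminator are treated as simple adjacent segments, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ToyGene:
    """Invented layout of one gene; positions are 0-based and end-exclusive."""
    promoter: tuple       # (start, end)
    transcribed: tuple    # region copied into a single RNA molecule
    terminator: tuple     # (start, end)
    coding_segments: list # parts of exons that specify amino acids

    def coding_only(self):
        """Older definition: only nucleotides that specify amino acids."""
        return sum(end - start for start, end in self.coding_segments)

    def transcribed_region(self):
        """Broader definition: everything transcribed into one RNA."""
        start, end = self.transcribed
        return end - start

    def transcription_unit(self):
        """Broadest definition here: promoter through terminator."""
        return self.terminator[1] - self.promoter[0]


# All numbers below are made up for illustration.
toy = ToyGene(
    promoter=(0, 50),
    transcribed=(50, 1050),
    terminator=(1050, 1100),
    coding_segments=[(100, 300), (500, 700), (900, 1000)],
)
print(toy.coding_only(), toy.transcribed_region(), toy.transcription_unit())
```

Each broader definition takes in more nucleotides than the one before it: the introns and untranslated ends are added first, and then the promoter and terminator.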
The discovery of introns forced a reevaluation of the definition of the gene. Today, a gene is often defined as a DNA sequence that encodes an RNA molecule or the entire DNA sequence required to transcribe and encode an RNA molecule.