Chapter Introduction

15: Transcription: DNA-Dependent Synthesis of RNA


  • 15.1 RNA Polymerases and Transcription Basics

  • 15.2 Transcription in Bacteria

  • 15.3 Transcription in Eukaryotes

MOMENT OF DISCOVERY

Robert Tjian

In the early 1980s, it was clear that specialized proteins must exist for accurate and regulated mRNA synthesis from particular genes in mammalian cells. However, nobody had been able to identify such “transcription factors” or determine how this process of transcriptional activation works. The breakthrough in my laboratory came when we found out that human cell extracts contained a factor that can discriminate between two templates and somehow program the enzyme that reads DNA to choose the right promoter DNA and ignore all others. But how?

We decided to use a short piece of the active promoter DNA sequence as “bait” to fish out proteins that selectively bind this site. The challenge was to purify this activity away from the other 3,000 DNA-binding proteins present in human cell extracts! After months of struggling with this problem, I vividly recall walking into the lab and my coworkers, Jim Kadonaga and Kathy Jones, saying, “We think we know which protein it is!” They had cleverly treated the human cell extract with a huge excess of sheared calf-thymus DNA to remove most nonspecific DNA-binding proteins, enriching the treated extract for the protein we wanted. Sequence-specific DNA affinity resin was then used to bind the transcription factor in the treated extracts, leading to the purification of a single protein. We called this protein specificity protein 1 (Sp1), the first of many sequence-specific transcription factors that were to prove critical for human gene regulation.

I’ll never forget the feeling of profound excitement at having shared the discovery of such a fundamental protein in biology and, at the same time, having devised with my lab members a new means to isolate hundreds more of these key gene-regulatory proteins.

—Robert Tjian, on discovering the first specific eukaryotic transcription factor


Information encoded in the DNA of cells and viruses provides the instructions for making the RNA and protein molecules that carry out the activities essential for life. The first step in expression of this information is transcription, the enzymatic production of an exact complementary strand of RNA from a DNA template. Transcription thus involves the transfer of genetic information from DNA to RNA. For protein-coding regions of DNA, transcription begins the gene expression pathway leading to the production of protein through translation of a messenger RNA (translation of mRNA into protein is discussed in Chapter 18). For non-protein-coding regions of DNA, transcription produces RNA molecules that, in many cases, are components of RNA-protein complexes, or ribonucleoproteins. Some of these are enzymes, but the majority play nonenzymatic roles in controlling gene expression on many levels. Increasing evidence shows that a much greater proportion of an organism’s transcribed DNA is non-protein-coding than protein-coding. The functions of many such transcripts are just beginning to be defined.
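The idea of an exact complementary RNA strand can be illustrated with a minimal Python sketch. It assumes the standard base-pairing rules (template A pairs with U in RNA, T with A, G with C, C with G) and that the template is read 3′→5′ while the RNA is written 5′→3′. The function name `transcribe` and the example sequence are hypothetical, chosen only for illustration; the sketch ignores promoters, terminators, and everything else that governs real transcription.

```python
# Base-pairing rules: DNA template base -> complementary RNA base.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') complementary to a DNA template
    strand that is supplied in the 3'->5' direction.

    Hypothetical helper for illustration only.
    """
    return "".join(PAIRING[base] for base in template_3to5.upper())


# Made-up template sequence, read 3'->5'.
rna = transcribe("TACGGT")
print(rna)  # AUGCCA
```

Because the template is supplied 3′→5′, no reversal step is needed: each template base is simply replaced by its pairing partner, and the result reads 5′→3′.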

All RNA molecules, except for the RNA genomes of certain viruses, are derived from information stored in DNA. Transcription produces three major kinds of RNA, and many other types of RNA are generated in smaller amounts. As described in Chapter 6, messenger RNAs (mRNAs) encode the amino acid sequence of one or more polypeptides specified by a gene or set of genes. Transfer RNAs (tRNAs) read the information encoded in the mRNA and provide the appropriate amino acid to a growing polypeptide chain during protein synthesis. Ribosomal RNAs (rRNAs) are constituents of ribosomes, the intricate cellular machines that synthesize proteins. Other specialized RNAs have regulatory or catalytic functions or are precursor forms of the three main classes of RNA (Highlight 15-1).

HIGHLIGHT 15-1 A CLOSER LOOK: The ABCs of RNA: Complexity of the Transcriptome

When complete eukaryotic genome sequences became available, molecular biologists were excited to discover the extent of the transcriptome, the entire set of RNA transcripts produced in a cell. Initially, researchers focused on characterizing the transcription products of known genes. These included mRNAs and known stable noncoding RNAs (ncRNAs) such as rRNAs, tRNAs, small nuclear RNAs (snRNAs) involved in pre-mRNA splicing, and small nucleolar RNAs (snoRNAs), which guide chemical modifications in the ribosome.

Unexpected levels of complexity emerged, however, beginning with the discovery of naturally occurring interfering RNAs, such as small interfering RNAs (siRNAs) and microRNAs (miRNAs), which have roles in the regulation of translation (as we describe in Chapter 22). Using a combination of microarrays and RNA-Seq (see Chapter 8), researchers could detect RNA transcripts without being biased by prior expectations. These techniques revealed extensive transcription that had previously gone unnoticed, and showed that the transcription landscape in higher eukaryotes is much more complex than expected. Surprisingly, a large fraction of transcripts originate from intergenic regions (regions between the coding sequences of genes) that had been thought to be silent, or from sequences that run in the opposite direction (antisense) to genes. Transcription that does not map to protein-coding genes or to known ncRNA genes also occurs in yeast.

In a parallel set of experiments, arrays of synthetic DNA oligonucleotides representing all nonrepetitive sequences in human chromosomes 21 and 22 were used to map the binding sites for three human transcription factors—Sp1, cMyc, and p53—that activate the transcription of many protein-coding genes involved in cell growth and differentiation. The experiments revealed far more transcription factor–binding sites than would be predicted from the number of protein-coding genes in these chromosomes. Of these binding sites, more than one-third lie within or immediately 3′ to well-characterized genes and seem to correlate with the transcription of ncRNAs. These findings have changed our thinking about transcription: much more of it goes on than previously suspected. Just what is all that RNA doing?

Some possible roles of previously undetected transcripts have emerged. For example, long noncoding RNAs (lncRNAs) are produced from regions that are either intergenic or antisense to genes. The functional significance of lncRNAs is not known, although several studies suggest roles for these transcripts in gene regulation. Shorter transcripts, particularly those that originate near gene promoters, fall into two somewhat arbitrary categories: molecules 20 to 200 nucleotides long are called small RNAs (sRNAs), and molecules of 200 to 1,000 nucleotides are called long RNAs (lRNAs). These categories include numerous subfamilies of transcripts defined by their abundance, longevity, and genomic origin. Although the sources of these transcripts are not yet fully worked out, at least some sRNAs may result from aborted or prematurely terminated transcription.
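The length-based categories for these shorter transcripts can be written out as a small Python sketch. The text gives 200 nucleotides as the boundary of both categories, so the sketch assumes that a transcript of exactly 200 nucleotides counts as a long RNA; that choice, like the function name `classify_by_length`, is an assumption made here for illustration. The sketch does not cover lncRNAs, which the text defines by genomic origin rather than by length.

```python
def classify_by_length(length_nt: int) -> str:
    """Assign a promoter-associated transcript to a length category.

    Assumption: the shared 200-nt boundary is assigned to lRNA.
    Hypothetical helper for illustration only.
    """
    if 20 <= length_nt < 200:
        return "sRNA"   # small RNA: 20 to 200 nucleotides
    if 200 <= length_nt <= 1000:
        return "lRNA"   # long RNA: 200 to 1,000 nucleotides
    return "unclassified"  # outside both ranges given in the text


for n in (50, 200, 1000, 5000):
    print(n, classify_by_length(n))
```

Since the text itself calls the categories somewhat arbitrary, the cutoffs are best read as a bookkeeping convention rather than a biological boundary.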

Whether pervasive transcription is simply a consequence of low-level background transcription or has functional significance is a subject of active research. In animals, it is unclear whether transcripts that are initiated near a particular promoter are functional. One possibility is that promoter-associated transcripts help maintain chromatin in an open state that is more accessible to the transcription and regulatory machinery. In addition, promoter-associated transcription could keep a pool of RNA polymerase available for rapid deployment to make mRNAs. This will clearly be an expanding area of research, and more surprises are likely in store.

Unlike DNA replication, which involves copying the entire chromosome, transcription is selective. Only particular genes or groups of genes are transcribed at any given time, and some portions of the DNA genome may be transcribed rarely or not at all. The cell directs the transcription machinery to express genetic information as it is needed. Specific regulatory sequences mark the beginning and end of the DNA segments to be transcribed and designate which strand of the double-stranded DNA is to be used as the template for RNA synthesis. The regulation of transcription is an important and exciting aspect of gene expression. We discuss regulation of gene expression in this chapter, and in further detail in Chapters 19 through 22.

As one of the central cellular processes studied by molecular biologists, transcription—enzymatic RNA synthesis directed by a DNA template—has been worked out in some detail, yet some fascinating puzzles remain. We begin by examining the enzymes responsible for transcription, and then address the mechanics of transcription in bacteria and in eukaryotic cells.