A Diverse Set of Proteins with Conserved RNA-Binding Domains Associate with Pre-mRNAs

As noted earlier, neither nascent RNA transcripts of protein-coding genes nor the intermediates of mRNA processing, collectively referred to as pre-mRNA, exist as free RNA molecules in the nuclei of eukaryotic cells. From the time nascent transcripts first emerge from RNA polymerase II until mature mRNAs are transported into the cytoplasm, the RNA molecules are associated with an abundant set of nuclear proteins. These proteins are the major protein components of heterogeneous ribonucleoprotein particles (hnRNPs), which contain heterogeneous nuclear RNA (hnRNA), a collective term referring to pre-mRNA and other nuclear RNAs of various sizes. These hnRNP proteins contribute to further steps in RNA processing, including splicing, polyadenylation, and export through nuclear pore complexes to the cytoplasm.

Researchers identified hnRNP proteins by first exposing cultured cells to high-dose UV irradiation, which causes covalent cross-links to form between RNA bases and closely associated proteins. Chromatography of nuclear extracts from treated cells on an oligo-dT cellulose column, which binds RNAs with a poly(A) tail, was used to recover the proteins that had become cross-linked to nuclear polyadenylated RNA. Subsequent treatment of cell extracts from nonirradiated cells with monoclonal antibodies specific for the major proteins identified by this cross-linking technique revealed a complex set of abundant hnRNP proteins ranging in size from 30 to 120 kDa.

Like transcription factors, most hnRNP proteins have a modular structure. They contain one or more RNA-binding domains and at least one other domain that interacts with other proteins. Several different RNA-binding motifs have been identified by constructing hnRNP proteins that lack specific amino acid sequences and testing whether the truncated proteins can still bind RNA.

Functions of hnRNP Proteins

The association of pre-mRNAs with hnRNP proteins prevents the pre-mRNAs from forming short secondary structures by base pairing of complementary regions, thereby making the pre-mRNAs accessible for interaction with other RNA molecules or proteins. Pre-mRNAs associated with hnRNP proteins present a more uniform substrate for subsequent processing steps than would free, unbound pre-mRNAs, each of which would form a unique secondary structure due to its specific sequence.

Binding studies with purified hnRNP proteins indicate that different hnRNP proteins associate with different regions of a newly made pre-mRNA molecule. For example, the hnRNP proteins A1, C, and D bind preferentially to the pyrimidine-rich sequences at the 3′ ends of introns (see Figure 10-7 below). Some hnRNP proteins interact with the RNA sequences that specify RNA splicing or cleavage/polyadenylation and contribute to the structure recognized by RNA-processing factors. Finally, cell-fusion experiments have shown that some hnRNP proteins remain localized in the nucleus, whereas others cycle in and out of the cytoplasm, suggesting that they function in the export of mRNA from the nucleus to the cytoplasm (Figure 10-4).

FIGURE 10-4 Human hnRNP A1 protein can cycle in and out of the nucleus, but human hnRNP C protein cannot. Cultured HeLa cells and Xenopus cells were fused by treatment with polyethylene glycol, producing heterokaryons containing nuclei from each cell type. These hybrid cells were treated with cycloheximide immediately after fusion to prevent protein synthesis. After 2 hours, the cells were fixed and stained with fluorescent-labeled antibodies specific for human hnRNP C and A1 proteins. These antibodies do not bind to the homologous Xenopus proteins. (a) A fixed preparation viewed by phase-contrast microscopy includes unfused HeLa cells (arrowhead) and Xenopus cells (dotted arrow), as well as fused heterokaryons (solid arrow). In the heterokaryon in this micrograph, the round HeLa-cell nucleus is to the right of the oval-shaped Xenopus nucleus. (b, c) When the same preparation was viewed by fluorescence microscopy, the stained hnRNP C protein appeared green and the stained hnRNP A1 protein appeared red. Note that the unfused Xenopus cell on the left is unstained, confirming that the antibodies are specific for the human proteins. In the heterokaryon, hnRNP C protein appears only in the HeLa-cell nucleus (b), whereas the A1 protein appears in both the HeLa-cell nucleus and the Xenopus nucleus (c). Since protein synthesis was blocked after cell fusion, some of the human hnRNP A1 protein must have left the HeLa-cell nucleus, moved through the cytoplasm, and entered the Xenopus nucleus in the heterokaryon.
[Reprinted by permission of Nature Publishing Group, from: Piñol-Roma S., and Dreyfuss, G., “Shuttling of pre-mRNA binding proteins between nucleus and cytoplasm,” Nature, 1992, 355(6362):730–2; permission conveyed through the Copyright Clearance Center, Inc.]

Conserved RNA-Binding Motifs

The RNA recognition motif (RRM), also called the RNP motif and the RNA-binding domain (RBD), is the most common RNA-binding domain in hnRNP proteins. This approximately 80-residue domain, which occurs in many other RNA-binding proteins as well, contains two highly conserved sequences (RNP1 and RNP2) that are found across organisms ranging from yeast to humans, indicating that, like many DNA-binding domains, it arose early in eukaryotic evolution.

Structural analyses have shown that the RRM domain consists of a four-stranded β sheet flanked on one side by two α helices. To interact with the negatively charged RNA phosphates, the β sheet forms a positively charged surface. The conserved RNP1 and RNP2 sequences lie side by side on the two central β strands, and their side chains make multiple contacts with a single-stranded region of RNA that lies across the surface of the β sheet (Figure 10-5).

FIGURE 10-5 Structure of the RRM domain and its interaction with RNA. (a) Ribbon diagram of the RRM domain found in hnRNP proteins, showing the two α helices (green) and four β strands (red) that characterize this motif. The conserved RNP1 and RNP2 regions are located in the two central β strands. (b, c) Ribbon diagram and surface representation of the two RRM domains in Drosophila Sex-lethal (Sxl) protein (b) and the polypyrimidine tract-binding protein (PTB) (c). In both (b) and (c), positively charged regions are shown in shades of blue; negatively charged regions, in shades of red; RNA is yellow. The two RRMs in Sxl are oriented like the two parts of an open pair of castanets, with the β sheets of the RRMs facing toward each other. The pre-mRNA is bound to the surfaces of the positively charged β sheets, making most of its contacts with the RNP1 and RNP2 regions of each RRM. PTB has a strikingly different orientation of RRM domains, illustrating that RRMs are oriented in different relative positions in different hnRNPs. The p(Y)-tract is a polypyrimidine tract. In PTB, the two RRMs associate through their α helices so that the positively charged β sheets face away from each other, upward for RRM3 and downward for RRM4. The structure of CUCUCU single-stranded RNA bound to each of the two RRMs was determined, explaining how PTB can bind to two tracts of six pyrimidines in a single RNA if they are separated by a loop of 15 or more nucleotides. This ability of PTB to form a small loop in a pre-mRNA probably contributes to its ability to function as a splicing repressor at exons where the upstream 3′ splice site or the downstream 5′ splice site is flanked by two polypyrimidine tracts. See K. Nagai et al., 1995, Trends Biochem. Sci. 20:235.
[Part (b) data from N. Handa et al., 1999, Nature 398:579, PDB ID 1b7f. Part (c) data from F. C. Oberstrass et al., 2005, Science 309:2054, PDB IDs 2adb and 2adc.]

The 45-residue KH motif is found in the hnRNP K protein and several other RNA-binding proteins. The three-dimensional structure of representative KH domains is similar to that of the RRM domain but smaller, consisting of a three-stranded β sheet supported from one side by two α helices. Nonetheless, the KH domain interacts with RNA quite differently from the way the RRM domain does. RNA binds to the KH domain by interacting with a hydrophobic surface formed by the α helices and one β strand. The RGG box, another RNA-binding motif found in hnRNP proteins, contains five Arg-Gly-Gly (RGG) repeats with several interspersed aromatic amino acids. A recent structural analysis indicates that in one example of RNA binding, an RGG-containing peptide binds in the major groove of a G-rich RNA duplex region (see Figure 5-4b). KH domains and RGG repeats are often interspersed in two or more sets in a single RNA-binding protein.
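The description of the RGG box (five Arg-Gly-Gly repeats with interspersed aromatic residues) lends itself to a simple scan of a protein sequence written in one-letter code. The Python sketch below is a rough illustration, not an established motif-finding method: the function name, the `max_gap` spacing threshold of 10 residues, and the toy sequence are all assumptions introduced here, and real domain annotation relies on curated profiles rather than a literal string search.

```python
import re

AROMATIC = set("FWY")  # phenylalanine, tryptophan, tyrosine

def rgg_box_candidates(protein, min_repeats=5, max_gap=10):
    """Group nearby RGG tripeptides and report clusters with enough repeats.

    max_gap (an assumed threshold) is the largest number of residues allowed
    between the end of one RGG and the start of the next within a cluster.
    """
    protein = protein.upper()
    starts = [m.start() for m in re.finditer("RGG", protein)]
    clusters, current = [], []
    for s in starts:
        if current and s - (current[-1] + 3) > max_gap:
            clusters.append(current)
            current = []
        current.append(s)
    if current:
        clusters.append(current)

    hits = []
    for cluster in clusters:
        if len(cluster) >= min_repeats:
            start, end = cluster[0], cluster[-1] + 3  # half-open span
            aromatics = sum(1 for aa in protein[start:end] if aa in AROMATIC)
            hits.append({"start": start, "end": end,
                         "repeats": len(cluster), "aromatics": aromatics})
    return hits

# Hypothetical toy sequence: five clustered RGG repeats, then one distant repeat
toy_protein = "MSK" + "RGGF" + "RGGY" + "RGGNF" + "RGG" + "RGGS" + "A" * 20 + "RGG"
print(rgg_box_candidates(toy_protein))
```

In the toy sequence, the isolated RGG near the C-terminus is ignored because it lies more than `max_gap` residues from the cluster; loosening or tightening that threshold changes what counts as a single box.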