Long Noncoding RNAs Direct Epigenetic Repression in Metazoans
Repressive complexes have been discovered that are composed of multiple repressing proteins bound to RNAs many kilobases in length that do not contain long open reading frames and are consequently called long noncoding RNAs or lncRNAs. In some cases, these lncRNA-protein complexes repress genes on the same chromosome from which the RNA is transcribed, as in the case of X-chromosome inactivation in female mammals. In other cases, these repressive RNA-protein complexes act in trans, repressing genes on chromosomes other than those from which the lncRNA is transcribed.
X-Chromosome Inactivation in Mammals The phenomenon of X-chromosome inactivation in female mammals (see Chapter 8) is one of the most intensely studied examples of epigenetic repression mediated by a lncRNA. X inactivation is controlled by a roughly 100-kb domain on the X chromosome called the X-inactivation center. Remarkably, this region encodes several lncRNAs required for the random inactivation of one entire X chromosome early in the development of female mammals. The functions of these lncRNAs are only partially understood. The most intensively studied are transcribed from the complementary DNA strands near the middle of the X-inactivation center: the 40-kb TSIX lncRNA and the XIST RNA, which is spliced and polyadenylated into an RNA of about 17 kb that is not exported to the cytoplasm (Figure 9-50a).
FIGURE 9-50 The Xist long noncoding RNA encoded in the X-inactivation center coats the inactive X chromosome in cells of mammalian females, repressing transcription of most genes on the inactive X. (a) The region of the human X-inactivation center encoding the noncoding RNAs Xist (transcribed from the inactive X) and Tsix (transcribed from the active X). Numbers are base pairs from the left end of the X chromosome. (b) A cultured fibroblast from a human female was analyzed by in situ hybridization with a probe complementary to Xist RNA labeled with a red fluorescent dye (left) and a chromosome paint set of probes for the X chromosome labeled with a green fluorescent dye (center); an overlay of the two fluorescent micrographs is also shown (right). The condensed inactive X chromosome is associated with Xist RNA. (c) Model for the spreading of the Xist lncRNA-protein complex on the inactive X chromosome during early differentiation of female embryonic stem cells. See E. Heard and A.-V. Gendrel, 2014, Annu. Rev. Cell Dev. Biol. 30:561. (d) Proteins associated with Xist lncRNA. Question marks indicate that it is not yet known how PRC2 complexes associate with HDAC3 and the RNA-binding protein SHARP. See C. A. McHugh et al., 2015, Nature 521:232.
[Part (b) ©1996 C. M. Clemson et al., The Journal of Cell Biology, 132:259–275. doi: 10.1083/jcb.132.3.259.]
In differentiated female cells, the inactive X chromosome is associated with XIST RNA-protein complexes along its entire length (Figure 9-50b). Targeted deletion of the Xist gene (see Figure 6-39) in cultured embryonic stem cells showed that it is required for X inactivation. Unlike most protein-coding genes on the inactive X chromosome, the Xist gene is actively transcribed. The XIST RNA-protein complexes do not diffuse to interact with the active X or other chromosomes, but remain associated with the inactive X chromosome. Since the full length of the inactive X becomes coated by XIST RNA-protein complexes (see Figure 9-50b), these complexes must spread along the chromosome from the X-inactivation center where XIST is transcribed. In contrast to XIST, TSIX is transcribed from the active X chromosome, not from the inactive X chromosome.
In the early female mouse embryo, made up of embryonic stem cells capable of differentiating into all cell types (see Chapter 21), genes on both X chromosomes are transcribed, and the 40-kb TSIX lncRNA (see Figure 9-50a) is transcribed from both copies of the X chromosome. Experiments employing engineered deletions in the X-inactivation center showed that TSIX transcription prevents significant transcription of the XIST RNA from the complementary DNA strand. Later in development, as cells begin to differentiate, TSIX transcription is repressed on one of the X chromosomes. This repression occurs randomly in different cells on the X chromosome derived from the sperm (Xp) or on the X chromosome derived from the egg (Xm). This inhibition of TSIX transcription determines which of the X chromosomes will be inactivated as the cells differentiate further because inhibition of TSIX transcription allows transcription of the XIST lncRNA on that chromosome.
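The choice logic described above can be sketched as a toy simulation. This is an illustrative model only, not a mechanistic one: the function names (`choose_inactive_x`, `simulate_cells`), the equal probability of repressing TSIX on Xp or Xm, and the representation of transcription states as Boolean values are all assumptions made for the example.

```python
import random
from collections import Counter


def choose_inactive_x(rng):
    """Toy model of the choice step in one differentiating female cell.

    Assumption: TSIX is repressed on exactly one X chromosome, chosen with
    equal probability between the paternal (Xp) and maternal (Xm) copies.
    """
    # Before differentiation, TSIX is transcribed from both X chromosomes.
    tsix_on = {"Xp": True, "Xm": True}

    # TSIX transcription is repressed on one randomly chosen chromosome.
    repressed = rng.choice(["Xp", "Xm"])
    tsix_on[repressed] = False

    # XIST is transcribed only where TSIX no longer blocks it.
    xist_on = {x: not tsix for x, tsix in tsix_on.items()}

    # The chromosome transcribing XIST becomes the inactive X.
    inactive = [x for x, on in xist_on.items() if on]
    assert len(inactive) == 1
    return inactive[0]


def simulate_cells(n_cells, seed=0):
    """Count how often each X is inactivated across n_cells simulated cells."""
    rng = random.Random(seed)
    return Counter(choose_inactive_x(rng) for _ in range(n_cells))


if __name__ == "__main__":
    print(simulate_cells(10_000))
```

Under these assumptions, roughly half of the simulated cells inactivate Xp and half inactivate Xm, which corresponds to the mosaic pattern expected when the choice is made independently in each cell.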
The transcribed XIST RNA contains RNA sequences that, by unknown mechanisms, cause it to spread along the X chromosome. Recent studies indicate that XIST lncRNA-protein complexes first associate with regions of the X chromosome localized near the X-inactivation center in the three-dimensional, folded structure of the future inactive X (Figure 9-50c), as shown by chromosome conformation capture assays (see Figure 8-34). These initial sites of XIST association are in gene-rich regions of the X chromosome and are postulated to serve as “entry sites” where additional copies of the XIST lncRNA-protein complexes first bind and then spread to neighboring regions. The inactive X chromosome also becomes associated with PRC2 complexes, which catalyze the trimethylation of histone H3 lysine 27. This methylation results in association of the PRC1 complex and transcriptional repression, as discussed above. These mechanisms of transcriptional repression must be redundant, however, because repression still occurs in the absence of the Polycomb proteins essential for the assembly of PRC1 and PRC2. At the same time, transcription of TSIX from the other, active X chromosome continues, repressing XIST transcription from that X chromosome and consequently preventing XIST-mediated repression of the active X. XIST and the PRC1 and PRC2 complexes are then observed to associate with gene-poor regions of the inactive X chromosome as well as with gene-rich regions.
Recent analysis by protein mass spectrometry (see Chapter 3) of proteins associated with XIST lncRNA during the initiation phase of X inactivation in cultured mouse embryonic stem cells revealed that SMRT, a protein first characterized as a co-repressor that interacts with the thyroid hormone nuclear receptor in the absence of hormone, is part of the protein complex that interacts with XIST RNA. SMRT, in turn, interacts with a histone deacetylase (HDAC3). Subsequent knockdown experiments with siRNAs directed against SMRT and HDAC3 showed that they are required for X inactivation, as are other identified RNA- and chromatin-binding proteins that link SMRT to XIST RNA and are required for the association of XIST RNA and PRC2 with the inactive X chromosome (Figure 9-50d). A short time later in development, the DNA of the inactive X also becomes methylated at most of its CpG island promoters. Specialized histone octamers in which histone H2A is replaced by a paralog of H2A called macroH2A also become associated with the inactive X. DNA methylation and macroH2A contribute to the stable repression of the inactive X through the multiple cell divisions that occur later during embryogenesis and throughout adult life.
Trans Repression by Long Noncoding RNAs Another example of transcriptional repression by a long noncoding RNA was discovered recently by researchers studying the function of noncoding RNAs transcribed from a region encoding a cluster of Hox genes, the HOXC locus, in cultured human fibroblasts. Depletion of a 2.2-kb noncoding RNA expressed from the HOXC locus by siRNA (see Figure 6-42) unexpectedly led to derepression of the HOXD locus, a roughly 40-kb region on another chromosome encoding several Hox proteins and multiple other noncoding RNAs, in these cells. Assays similar to chromatin immunoprecipitation showed that this noncoding RNA, named HOTAIR (for Hox Antisense Intergenic RNA), associates with the HOXD locus and with PRC2 complexes. This association results in histone H3 lysine 27 di- and trimethylation, PRC1 association, histone H3 lysine 4 demethylation, histone H2A monoubiquitinylation, and transcriptional repression. This process is similar to the recruitment of Polycomb complexes by Xist RNA, except that Xist RNA functions in cis, remaining in association with the chromosome from which it is transcribed, whereas HOTAIR leads to Polycomb repression in trans on both copies of another chromosome. Once again, redundant mechanisms for repression of the HOXD locus must exist, because extensive, but less complete, repression at the HOXD locus continues in the appropriate cells in mouse embryos with homozygous HOTAIR knockout mutations.
Cis Activation by Long Noncoding RNAs Examples of lncRNAs involved in gene activation have been characterized recently. For example, HOTTIP lncRNA, which is transcribed from the 5′ end of the HOXA locus, is proposed to coordinate the activation of HOXA genes by binding to a histone H3 lysine 4 methylase. In addition, nascent transcripts of lncRNA genes have been reported to activate transcription from promoters several kilobases away by interacting with the Mediator complex and delivering it to the promoter by looping of the intervening chromatin.
In humans, but not in mice, a lncRNA called XACT has been discovered to associate with multiple sites along the full length of the active X chromosome and is postulated to contribute to maintenance of gene activity on that chromosome. XACT is also remarkable for being one of the longest characterized RNAs: 252 kb! It is mostly unspliced.
In Drosophila, equal expression of genes encoded on the X chromosome in males and females (dosage compensation) does not result from inactivating one X chromosome in females. Rather, a generalized twofold increase in transcriptional activation of genes on the single X chromosome in males is controlled by two lncRNAs, roX1 and roX2, transcribed from the X chromosome in males only. The roX1 and roX2 RNAs associate with several proteins encoded by MSL (male-specific-lethal) genes and spread over the X chromosome specifically, much as Xist lncRNA-protein complexes spread over the inactive X in mammals.
Recently, sequencing of total cellular RNA in multiple types of human cells identified roughly 15,000 human lncRNAs. Many of these lncRNAs have sequences that are evolutionarily conserved in most mammals, and about 5000 are found only in primates. This conservation of sequence strongly suggests that these lncRNAs, like XIST, HOTAIR, and HOTTIP, have important functions. Multiple lncRNAs are expressed only in specific cell types at specific times during development. For example, multiple lncRNAs are expressed primarily in differentiating red blood cells. Knockdown (see Figure 6-42 and Chapter 10) of several of these lncRNAs inhibits normal red blood cell development, but precisely how these lncRNAs perform their essential functions is not yet clear. The study of these conserved long noncoding RNAs and how they influence gene expression is another area of intense current investigation.
ENCODE (Encyclopedia of DNA Elements) encompasses a consortium of international research groups organized and funded by the US National Human Genome Research Institute with the goal of building a comprehensive, publicly available database of human DNA control elements and the transcription factors that bind to them in different cell types, histone post-translational modifications mapped by ChIP-seq and other related methods, DNase I hypersensitive sites, and regulatory lncRNAs and their sites of association in the genome, as well as newly discovered regulatory elements “that control cells and circumstances in which a gene is active.” Data sets from human cells and cells of model organisms that are too large to be published are also made publicly available at a site called GEO (Gene Expression Omnibus) maintained by the US National Center for Biotechnology Information (NCBI). Most journals that publish research based on genomic methods such as RNA-seq and ChIP-seq require that authors upload their original data to GEO. Worldwide public access to these data sets is greatly accelerating the pace of discovery in the area of gene regulation.