REVIEW
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| Abstract |
|---|
[Keywords: RNA; disease; intergenic transcripts; noncoding RNA; nuclear regulatory RNAs]
Recent large-scale studies of the human and mouse genomes have revealed that although there are ~21,561 protein-coding genes in human and 21,839 in mouse, significantly larger portions of both genomes are transcribed (69,185 gene predictions in human and 71,259 in mouse) (http://www.ensembl.org). Based on such analyses, eukaryotic genomes appear to harbor fewer protein-coding genes than initially expected, and gene number does not scale with complexity as steeply as originally anticipated (Mattick 2004a; Mattick and Makunin 2006). For example, the Drosophila melanogaster genome contains only twice as many genes as some bacterial species, although the former is far more complex in its genome organization than the latter. Similarly, the number of protein-coding genes in human and in the nematode Caenorhabditis elegans is extremely close (http://www.ensembl.org). Such analyses suggest that protein-coding genes alone are not sufficient to account for the complexity of higher eukaryotic organisms. Interestingly, from genomic analysis it is evident that as an organism's complexity increases, the protein-coding contribution of its genome decreases (Mattick 2004a, b; Szymanski and Barciszewski 2006). A portion of this paradox can be resolved through alternative pre-mRNA splicing, whereby diverse mRNA species, encoding different protein isoforms, can be derived from a single gene (Lareau et al. 2004). In addition, a range of post-translational modifications contributes to the increased complexity and diversity of protein species (Yang 2005).
It is estimated that ~98% of the transcriptional output of the human genome represents RNA that does not encode protein (Mattick 2005). This suggests that these genomes are either replete with largely useless transcription or that these ncRNAs are fulfilling a wide range of unexpected functions in eukaryotic biology (Huttenhofer et al. 2005; Mattick 2005). Recent observations strongly suggest that ncRNAs contribute to the complex networks needed to regulate cell function and could be the ultimate answer to the genome paradox (Mattick 2001, 2003, 2004a, c; Mattick and Gagen 2001). Initially the term ncRNA was used primarily to describe eukaryotic RNAs that are transcribed by RNA polymerase II (RNA pol II) and have a 7-methylguanosine cap structure at their 5' end and a poly(A) tail at their 3' end, but lack a single long ORF. However, more recently this classification has been extended to all RNA transcripts that do not have a protein-coding capacity. NcRNAs include introns and independently transcribed RNAs, with the latter accounting for 50%–75% of all transcription in higher eukaryotes (Mattick and Gagen 2001; Shabalina and Spiridonov 2004). Introns account for at least 30% of the human genome, but they have been largely overlooked due to the general assumption that they are rapidly degraded upon pre-mRNA splicing (Mattick 1994, 2005). In mammalian genomes, introns comprise ~95% of the sequence within protein-coding genes. Introns have been suggested to play important roles in nucleosome formation and chromatin organization, alternative pre-mRNA splicing, and as scaffold/matrix-attachment regions (Shabalina and Spiridonov 2004). Intronic sequences have also been shown to harbor independent transcription units, such as microRNAs, small nucleolar RNAs (snoRNAs), and repetitive elements (Mattick and Makunin 2005).
It is not clear how many ncRNA genes are present in the mammalian genome. The existing catalog of mammalian genes is strongly biased toward protein-coding genes. Novel ncRNA genes are difficult to identify based on sequence analysis due to their sequence divergence across phyla (Pang et al. 2006). The nature of ncRNA genes, including their variation in length (20 nucleotides [nt] to >100 kb), lack of ORFs, and relative immunity to point mutations, makes them difficult targets for genetic screens. Analysis of mouse full-length cDNAs revealed that ncRNAs constitute more than one-third of all identified transcripts (Okazaki et al. 2002; Numata et al. 2003; Carninci et al. 2005). Whole human chromosome analysis using oligonucleotide tiling arrays has demonstrated a significantly large number of genes encoding ncRNAs on most of the analyzed chromosomes (Kapranov et al. 2002; Cawley et al. 2004; Kampa et al. 2004; Cheng et al. 2005), many of which show extraordinarily complex patterns of interlaced and overlapping transcription (Carninci et al. 2005; Kapranov et al. 2005). Current estimates of the number of independent transcription units (~70,000) and protein-coding genes (~21,500) in the mammalian transcriptome suggest that ncRNA genes are highly abundant in the genome (Mattick 2004b, c, 2005; Mattick and Makunin 2006; Willingham and Gingeras 2006).
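The arithmetic behind this estimate can be made explicit. The short Python sketch below uses only the approximate counts quoted above (about 70,000 independent transcription units and about 21,500 protein-coding genes). The function name, and the simplifying assumption that every transcription unit that is not a protein-coding gene counts as an ncRNA gene, are ours and serve only as an illustration.

```python
from typing import Tuple


def noncoding_estimate(transcription_units: int, protein_coding: int) -> Tuple[int, float]:
    """Return (noncoding unit count, noncoding fraction of all units).

    Illustrative assumption: every transcription unit that is not a
    protein-coding gene is counted as an ncRNA gene.
    """
    if transcription_units <= 0 or not 0 <= protein_coding <= transcription_units:
        raise ValueError("need 0 <= protein_coding <= transcription_units, with units > 0")
    noncoding = transcription_units - protein_coding
    return noncoding, noncoding / transcription_units


# Approximate figures quoted in the text.
count, fraction = noncoding_estimate(70_000, 21_500)
print(f"{count} putative ncRNA units ({fraction:.0%} of transcription units)")
```

Under this assumption, roughly 48,500 of the estimated transcription units, about 69%, would be noncoding, which is the sense in which ncRNA genes appear "highly abundant."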
Based on functional relevance, ncRNAs can be subdivided into two classes: (1) housekeeping ncRNAs and (2) regulatory ncRNAs. Housekeeping ncRNAs are generally constitutively expressed and are required for the normal function and viability of the cell. Examples include transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs), snoRNAs, RNase P RNAs, and telomerase RNA. These RNAs have been the focus of many reviews (Eddy 2001; Gesteland et al. 2006) and will not be considered further here. In contrast, regulatory ncRNAs or riboregulators include those ncRNAs that are expressed at certain stages of development, during cell differentiation, or as a response to external stimuli, and that can affect the expression of other genes at the level of transcription or translation. Several recent excellent reviews have focused on small regulatory RNAs, including small interfering RNAs (siRNAs) and microRNAs (Hannon 2002; He and Hannon 2004; Mattick and Makunin 2005; Zamore and Haley 2005; Petersen et al. 2006), and these RNAs therefore will not be extensively discussed here, except for their involvement in various diseases. In the present review, we discuss our current understanding of the roles of other noncoding regulatory RNAs in eukaryotic cells and their involvement in gene organization, regulation, and disease etiology.
| Roles of RNA in dosage compensation and sex determination: everything needs to be equal |
|---|
XCI
In mammals, dosage compensation of X-linked gene products between the sexes is achieved by transcriptional silencing of a single X chromosome during early female embryogenesis (Lyon 1961; Plath et al. 2002; Heard and Disteche 2006; Spencer and Lee 2006). Initiation of X-chromosome inactivation (XCI) requires the counting of X chromosomes. XCI follows the "n − 1" rule that leads to transcriptional silencing of all but one X chromosome. In female soma, XCI occurs in early development shortly after uterine implantation of the embryo. This form of XCI is called "random" because silencing can take place on either X chromosome (Spencer and Lee 2006). However, in the extraembryonic tissues of some placental mammals, such as rodents, XCI takes place in an "imprinted" manner such that the paternal X (Xp) is always silenced (Takagi and Sasaki 1975). Earlier classical cytogenetics studies suggested that the paternal X only becomes inactivated at the blastocyst stage, accompanying cellular differentiation in the trophectoderm and primitive endoderm (Takagi et al. 1982). However, recent studies have revealed that the paternal X has already begun to inactivate by the eight-cell stage (Huynh and Lee 2003; Mak et al. 2004; Okamoto et al. 2004; Okamoto and Heard 2006) and that this inactivation of Xp initiates following Xist RNA coating at the four-cell stage (Okamoto et al. 2004, 2005). Imprinted XCI is also observed in marsupials and is believed to be the earliest form of XCI (Graves 1996). This inactive state is stably maintained through subsequent cell divisions. The X inactivation center (Xic) is a critical region of 80–450 kb on the X chromosome that controls XCI initiation and spreading (Heard and Disteche 2006; Spencer and Lee 2006). Only the chromosomes carrying the Xic sequence are able to induce XCI, even though the "random" and "imprinted" forms of XCI may differ with respect to the requirement of the Xic sequences (Okamoto et al. 2005). Interestingly, when Xic sequences are inserted into an autosome, the autosome becomes subject to counting, choice, and inactivation (Spencer and Lee 2006).
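The counting rule described above lends itself to a one-line calculation. The following Python sketch is a minimal illustration, assuming a diploid cell in which exactly one X chromosome stays active regardless of how many are present; the function name and the returned dictionary layout are hypothetical and chosen only for readability.

```python
def xci_outcome(n_x: int) -> dict:
    """Apply the "n - 1" counting rule to a diploid cell.

    Illustrative assumption: one X chromosome remains active and every
    additional X chromosome is transcriptionally silenced.
    """
    if n_x < 0:
        raise ValueError("number of X chromosomes cannot be negative")
    active = min(n_x, 1)  # at most one X escapes silencing
    return {"active": active, "inactive": n_x - active}


# XY male, XX female, and an XXX karyotype.
for n in (1, 2, 3):
    print(n, xci_outcome(n))
```

A cell with a single X therefore silences none, an XX cell silences one, and an XXX cell silences two, so that one active X remains in every case.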
Of the several long ncRNA genes present in Xic, Xist (X-inactive-specific transcript) has been the most extensively studied. The Xist gene encodes a ncRNA that is associated exclusively with the inactive X chromosome (Fig. 1; Brockdorff et al. 1992; Brown et al. 1992). Although potential ORFs exist in Xist RNA, they are short and not conserved between species (Brockdorff et al. 1992; Brown et al. 1992). The gene is conserved between species at the level of its genomic organization but shows only weak sequence homology, possibly implying a role for its secondary structure. Xist ncRNAs are 15–17 kb long in mice and ~19 kb in human; they are spliced, polyadenylated, and restricted to the nuclear compartment. In the female embryo, Xist up-regulation on the putative inactive X chromosome (Xi) and RNA coating of this chromosome constitute the first detectable signs of XCI (Morey and Avner 2004). Using inducible Xist cDNA transgenes, it was shown that Xist-RNA-induced X-chromosome silencing occurs only during early embryonic stem (ES) cell differentiation (Wutz and Jaenisch 2000). However, during initial phases of ES cell differentiation, XCI can be reversed by switching off the Xist gene, but subsequently the repressed state becomes locked in and is no longer dependent on Xist. This irreversibility of silencing of Xi can be attributed to changes in chromatin modifications observed on the Xi following Xist RNA coating (Heard and Disteche 2006). The earliest chromatin modifications observed are the loss of histone modifications associated with active chromatin, such as H3K9 acetylation and H3K4 methylation. Subsequently, the X chromosome becomes H4 hypoacetylated and enriched in H3 Lys 27 (H3K27) trimethylation (Plath et al. 2003; J. Silva et al. 2003). H3K27 hypermethylation is accompanied by other chromatin changes, including H3K9 hypermethylation and H4K20 monomethylation as well as H2A K119 monoubiquitylation, and all of these modifications appear concomitantly with the transcriptional silencing of the X-linked genes (Morey and Avner 2004; Heard and Disteche 2006). The inactive X chromosome is also enriched in the histone variant macroH2A, and Xist RNA is necessary for its localization to the inactive X (Costanzi and Pehrson 1998). These successive layers of modifications lead to the establishment of silent chromatin and, in turn, lock the inactive X into a stable heterochromatic state throughout the cell cycle. Deletion and transgene analyses have shown that Xist is essential for both imprinted and random XCI and affects only the chromosome that transcribes Xist RNA (Penny et al. 1996; Marahrens et al. 1997; Wutz and Jaenisch 2000). However, Xist alone cannot account for the multiple functions attributed to the Xic, such as "counting," as deletion of one Xist allele still allows the cell to register the presence of more than one Xic, which triggers XCI via the wild-type Xist allele (Penny et al. 1996). Interestingly, multiple DNA elements 3' to Xist appear to be involved in counting and choice functions (Heard and Disteche 2006).
Other than Xist RNA, the Xic region in mouse also harbors many other ncRNA genes including Tsix, Xite, and Jpx/Enox, several of which are integral to the regulation of XCI. Tsix negatively regulates the expression of Xist RNA and is transcribed in an antisense orientation relative to Xist. Like Xist, Tsix lacks a conserved ORF and is found only in the nucleus (Lee et al. 1999). In undifferentiated female ES cells, Xist and Tsix are coexpressed on both X chromosomes, although the Tsix levels are in 10- to 100-fold molar excess over Xist RNA (Shibata and Lee 2003). However, a recent study suggests that Xist is expressed at an extremely low level prior to XCI and that Tsix is the major RNA component detected at the Xist/Tsix locus in undifferentiated ES cells (Sun et al. 2006). At the onset of cell differentiation, Tsix becomes asymmetrically expressed: Whereas Tsix expression persists transiently on the future active X (Xa), expression ceases on the future inactive X (Xi). The loss of Tsix expression on the future Xi enables the up-regulation and spread of Xist RNA along the chromosome. The persistence of Tsix on the future Xa enables that X to remain active. Once the window for XCI has passed, Tsix is also turned off on the Xa. These results suggest that by controlling the fate of Xist and therefore the X chromosome, Tsix acts as a binary switch for XCI. The reason for this sudden reciprocal expression profile of Xist and Tsix remains unknown. Interestingly, two recent studies have revealed that the Xics transiently colocalize, via the Tsix region, during the onset of XCI, at the time when counting and choice are thought to occur (Bacher et al. 2006; Xu et al. 2006). This "cross-talk" between the Xics is thought to be required for the exchange of information between Xist/Tsix that ultimately results in the monoallelic down-regulation of Tsix and up-regulation of Xist on the inactive X chromosome (Heard and Disteche 2006). Several mechanisms have been proposed to explain how Tsix regulates Xist (Spencer and Lee 2006). These include (1) a DNA-based mechanism in which DNA sequences at Tsix bind transcription factors that then repress the Xist promoter at long range, or Tsix could also compete with Xist for an enhancer or any other regulatory sequence; (2) a transcription-based mechanism, where antisense transcription across the Xist promoter could interfere with the ability of the Xist promoter to fire by affecting chromatin modification or transcription factor binding; (3) Tsix RNA itself could recruit repressive factors or could form duplex RNA with Xist that would either facilitate the degradation of Xist RNA or prevent binding of necessary silencing factors to Xist RNA. Recent studies have provided clues that suggest either Tsix transcription or Tsix RNA itself has a role in Xist RNA regulation (Spencer and Lee 2006). It has been observed that overexpression of Tsix always results in an active X in cis (Luikenhuis et al. 2001; Stavropoulos et al. 2001). Furthermore, when Tsix RNA is prematurely truncated before it crosses into the Xist gene, Tsix no longer functions as a repressor of Xist, and XCI invariably occurs on the mutated X (Shibata and Lee 2004). It was also proposed earlier that the modulation of Xist chromatin structure might play a role in how Tsix regulates Xist (Navarro et al. 2005; Sado et al. 2005). Interestingly, a recent study has suggested that up-regulation of Xist RNA observed on the future inactive X is not due to the increased stability of the Xist transcript as suggested earlier but is regulated by Tsix (Panning et al. 1997; Sheardown et al. 1997; Sun et al. 2006). Lee and colleagues (Sun et al. 2006) reported that Tsix down-regulation on the future inactive X induces a transient heterochromatic state to Xist, followed by high levels of Xist expression. This heterochromatic state adopted by the Tsix-deficient chromosome in pre-XCI cells persisted through XCI establishment and reverted to a euchromatic state during XCI maintenance (Sun et al. 2006).
The mouse Xic harbors yet another functional ncRNA gene, called Xite (X-inactivation intergenic transcription elements). Xite is transcribed at low levels, on the order of 10- to 60-fold less than Tsix levels in mouse ES cells. Although there is some bidirectional transcription, the majority of the transcripts are oriented in the same direction as Tsix. Deleting Xite results in preferential silencing of the X in cis, thereby skewing the normally random probability that any one X would be chosen as the silent one (Ogawa and Lee 2003). Xite action does not appear to depend on the RNA per se, because truncation of the RNA does not produce any obvious phenotype, suggesting that transcription from the region could be more important. The monoallelic expression of Xist, at least in mice, is controlled by complex regulation of Tsix and Xite as well as cis-regulatory sequences located in the 3' region of Xist (Heard and Disteche 2006). In the current model, Xite works together with Tsix to designate the Xa where transcription from Xite acts as an enhancer for Tsix by promoting the persistence of Tsix expression during cell differentiation; this in turn prevents the up-regulation and spread of Xist RNA along the chosen Xa (Spencer and Lee 2006). Ftx is another ncRNA gene located ~150 kb upstream of mouse Xist. In mouse and humans, the 5' regions of Ftx are well conserved and contain CpG islands at positions corresponding to the cDNA start sites and are transcribed in opposite orientation relative to Xist/XIST genes (Chureau et al. 2002). Future investigation of a less-characterized ncRNA gene Jpx (Enox) found around the Xic may also show that this gene participates in the regulatory events of XCI (Spencer and Lee 2006).
X-chromosome hyperactivation in Drosophila
Unlike the situation in mammals, dosage compensation in Drosophila is achieved by a twofold up-regulation of transcription of genes on the single X chromosome present in males (Kelley 2004). Intriguingly, the fly dosage compensation system also involves multiple ncRNAs: roX1 and roX2 (RNA on the X). These RNAs are members of the dosage compensation complex (DCC), a huge RNA–protein complex that binds to hundreds of sites along the male X chromosome in a highly reproducible, banded pattern (Meller 2000; Meller et al. 2000). In addition to roX1 and roX2 RNAs, the DCC also contains a specific set of proteins that include MLE (maleless); MSL1, MSL2, and MSL3 (male-specific lethal 1, 2, and 3, respectively); and MOF (males absent on the first). Mutations in these genes result in male-specific lethality of larvae, and their products are collectively termed MSL proteins. A characteristic feature of the up-regulated X chromosome is the specific acetylation of histone H4 at Lys 16 (H4Ac16) (Akhtar 2003).
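The twofold figure can be checked with simple arithmetic. The Python sketch below is a deliberately idealized illustration: it assumes that each X chromosome contributes 1.0 arbitrary unit of transcription before compensation, and the function name and parameters are hypothetical rather than taken from any published model.

```python
def fly_x_output(n_x: int, is_male: bool, fold_upregulation: float = 2.0) -> float:
    """Total X-linked transcription in arbitrary units.

    Illustrative assumption: an uncompensated X contributes 1.0 unit, and
    only the male X is up-regulated (by `fold_upregulation`).
    """
    per_chromosome = fold_upregulation if is_male else 1.0
    return n_x * per_chromosome


male_output = fly_x_output(n_x=1, is_male=True)     # single, hyperactivated X
female_output = fly_x_output(n_x=2, is_male=False)  # two X chromosomes at basal level
print(male_output, female_output)
```

With a twofold up-regulation the single male X matches the combined output of the two female X chromosomes, which is the equalization that the DCC is thought to achieve.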
The two roX genes are transcribed from the X chromosome, produce polyadenylated nuclear retained transcripts, and are expressed only in male adult flies (Fig. 2). The roX RNAs are functionally redundant even though they have very little sequence homology and are distinct in size (3.7 kb for roX1 RNA vs. 0.5–1.2 kb for roX2 RNA) (Meller and Rattner 2002). Deletion of either roX gene has no effect on males. However, deletion of both results in male lethality. The MSL-binding pattern on the X chromosome is drastically disrupted in the roX1 roX2 double-mutant males, suggesting that roX RNAs are important for correctly targeting the MSL complex to the X (Meller and Rattner 2002). The roX genes could be performing two distinct and separable functions in dosage compensation. First, roX RNAs constitute indispensable elements of the DCC responsible for chromatin modifications. Second, the genes themselves provide strong chromatin entry sites for the MSL complex, possibly to ensure rapid recruitment of the MSL proteins for roX RNA binding. The current model suggests that there are different DNA recognition elements on the X chromosome that have different affinities for the MSL complex: high, intermediate, or weak. High-affinity cis-elements, such as within the roX genes, would not require additional cis-elements for recruiting MSL complexes, and this interaction is strengthened by roX RNA. Intermediate and weak-affinity cis-elements might require several cis-elements for robust binding, resulting in the ability to attract partial MSL complexes (Oh et al. 2004).
Although there is significant evidence to show that ncRNAs are the major effectors of dosage compensation, the molecular basis of how they regulate these processes is still not clearly understood, and the future is likely to reveal many exciting solutions.
Male hypermethylated (MHM) region in birds
In birds, sex determination and differentiation depend on the sex chromosomes Z and W. Males have two Z chromosomes, whereas females are determined by the ZW karyotype. One of the genes proposed to play a role in sex determination in birds is a homolog of human DMRT1 (doublesex and mab-3-related transcription factor) implicated in testis differentiation. DMRT1 has been mapped to the Z chromosome, and its elevated expression in males has been found to correlate with testis development (Smith et al. 2003). An MHM region was identified in the Z chromosome in the vicinity of the DMRT1 gene, and the CpG islands in this region are hypermethylated only in males. In females, however, the MHM region is hypomethylated, and transcription from this region produces ncRNAs (the longest transcripts are ~9.5 kb), most of which are nonpolyadenylated and accumulate at or very close to the sites of transcription and close to the DMRT1 locus. The female-specific MHM ncRNAs are suggested to play a role as transcriptional repressors of the DMRT1 locus similar to the role played by Xist RNA in XCI (Teranishi et al. 2001; Szymanski and Barciszewski 2003).
| Roles of ncRNAs in genomic imprinting: one is enough |
|---|
IGF2/H19 locus
The Igf2/H19 domain is perhaps the best characterized of any autosomally imprinted locus (human 11p15.5 and mouse distal 7b). The first imprinted ncRNA locus to be discovered, the H19 gene produces a spliced and polyadenylated ncRNA transcript of ~2.3 kb that is expressed only from the maternal allele (Brannan et al. 1990). H19 is the reciprocally imprinted partner of Igf2 (insulin-like growth factor), and Igf2 is expressed only from the paternal allele. Mutations disrupting the imprinted expression of Igf2 underlie a substantial proportion of cases of congenital growth disorder. Interestingly, in the Igf2/H19 domain, imprinting is achieved through "enhancer competition" mediated by a set of chromatin insulators. Igf2 and H19 share a set of enhancers, but only one gene can engage the enhancers at any time, and this choice is regulated by an insulator sequence that lies just upstream of the H19 promoter (Webber et al. 1998; Kanduri et al. 2000; Kaffer et al. 2001). On the maternal chromosome, the insulator sequence is not methylated and, therefore, binds CCCTC-binding factor (CTCF), a vertebrate insulator protein. Binding of CTCF prevents the enhancers from engaging the Igf2 gene and together with the enhancers also trans-activates H19. However, on the paternal chromosome, the insulator sequence is methylated and, therefore, cannot bind the methylation-sensitive CTCF, allowing the enhancers to engage Igf2 (Engel and Bartolomei 2003). In this way, the insulator sequence upstream of H19 comprises an "imprinting center" that regulates the reciprocal expression of H19 and Igf2. Although H19 is conserved among mammals and highly expressed in embryos, studies carried out over the last 15 yr indicate that the H19 transcript itself has no apparent role in the imprinted expression of its neighboring genes (Jones et al. 1998) and is also not necessary for normal development in mice (Ripoche et al. 1997). The chromosomal region containing H19 has also been associated with tumor suppressor activity, and the expression pattern of H19 RNA in several cancer cell types differs from neighboring nonmalignant cells (see "Regulatory RNAs Implicated in Complex Diseases: Dark Side of RNA" below). In addition to H19, other ncRNAs emanating from the Igf2/H19 region have been identified, some of which show imprinting while others are expressed biallelically; however, their functional significance has yet to be determined (Moore et al. 1997; Drewell et al. 2002b).
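The enhancer-competition logic described above reduces to a simple decision rule: the methylation state of the insulator determines whether CTCF binds, and CTCF binding determines which gene the shared enhancers engage. The Python sketch below is a toy model of that rule only; the class and function names are hypothetical, and real regulation at this locus involves additional elements that are not represented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    """One parental copy of the Igf2/H19 domain (illustrative toy model)."""
    parent: str                 # "maternal" or "paternal"
    insulator_methylated: bool  # methylation state of the insulator upstream of H19


def expressed_gene(allele: Allele) -> str:
    """Return the gene engaged by the shared enhancers on this allele."""
    # CTCF is methylation sensitive: it binds only the unmethylated insulator.
    ctcf_bound = not allele.insulator_methylated
    # Bound CTCF blocks enhancer access to Igf2, leaving H19 as the target.
    return "H19" if ctcf_bound else "Igf2"


maternal = Allele(parent="maternal", insulator_methylated=False)
paternal = Allele(parent="paternal", insulator_methylated=True)
for allele in (maternal, paternal):
    print(allele.parent, "->", expressed_gene(allele))
```

In this simplified picture the maternal allele expresses H19 and the paternal allele expresses Igf2, reproducing the reciprocal imprinting pattern described in the text.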
KCNQ1 locus
As with the closely linked Igf2/H19 cluster, the KCNQ1 locus is closely associated with human Beckwith-Wiedemann syndrome (BWS), a syndrome characterized by prenatal asymmetric overgrowth, enlarged tongue, and cancers such as Wilms tumor (Szymanski and Barciszewski 2003; O'Neill 2005). The inheritance of BWS is exceptionally complex because the etiology of the disease involves multiple genes in both the KCNQ1 and the Igf2/H19 domains. Interestingly, almost all of the imprinted genes in the KCNQ1 domain are maternally expressed except the paternally expressed ncRNA gene Kcnq1ot1 (Lit1), the antisense counterpart of Kcnq1 (Mitsuya et al. 1999; Umlauf et al. 2004). The antisense Kcnq1ot1 gene appears to be critical for establishing the imprinted profile of the nearby genes (Mancini-Dinardo et al. 2006). Recent studies suggest that the Kcnq1ot1 RNA does so by the recruitment of chromatin changes to the imprinted domain, including H3K9 methylation and H3K27 methylation (Lewis et al. 2004; Umlauf et al. 2004). The Kcnq1ot1 promoter lies within a differentially methylated region of the Kcnq1 gene body and is now known to make up the imprinting center for the BWS domain (Spencer and Lee 2006). Deleting the Kcnq1ot1 CpG island (5' end) results in loss of imprinting in mice, and either the Kcnq1ot1 RNA or transcription through its entire length is required in cis for imprinting of neighboring genes (Cleary et al. 2001; Thakur et al. 2004; Mancini-Dinardo et al. 2006). A transgenic mouse producing a truncated Kcnq1ot1 transcript exhibited correct imprinting but did not silence any of the flanking mRNA genes in the imprinted cluster (Mancini-Dinardo et al. 2006; Pauler and Barlow 2006). Interestingly, the most common abnormalities in BWS are epigenetic, involving abnormal methylation of H19 or Kcnq1ot1. Recently, microdeletions in either the H19 or the Kcnq1ot1 gene have been shown to be associated with BWS, providing genetic confirmation of the importance of this chromosomal region for the disease (Costa 2005).
Igf2r (insulin-like growth factor type-2 receptor)/Air (Antisense Igf2r RNA)
The Igf2r/Air locus (proximal chromosome 17) in mice provides yet another example of ncRNA regulation within imprinted loci. A differentially methylated region-2 (DMR2) within the second intron of Igf2r constitutes a critical, bidirectional element controlling silencing of the paternal allele of three protein-coding imprinted genes, Igf2r, Slc22a2, and Slc22a3 (Zwart et al. 2001). DMR2 resides in a promoter that drives the transcription of a nonprotein-coding antisense transcript, Air, which partially overlaps with Igf2r. Air is an ~108-kb, capped, polyadenylated ncRNA and is transcribed exclusively by RNA pol II from the paternal allele (Wutz et al. 1997; Braidotti et al. 2004; Seidl et al. 2006). The majority of Air transcripts evade cotranscriptional splicing, resulting in mature unspliced, highly unstable nuclear transcripts (Seidl et al. 2006). Like Kcnq1ot1, the Air gene is responsible for the bidirectional silencing of neighboring genes in cis, as deleting the Air CpG island results in loss of paternal silencing across the entire domain (Wutz et al. 1997; Zwart et al. 2001). The silencing of these three genes depends on the unmethylated CpG islands and transcription of Air RNA. Because Air RNA does not overlap with two of the three imprinted genes in the domain (Slc22a2 and Slc22a3), Air RNA cannot work through double-stranded RNA (dsRNA) mechanisms, but because truncating Air RNA leads to a disruption of imprinting, its transcription and/or the RNA itself may be required for imprinting (Sleutels et al. 2002; Spencer and Lee 2006). A suggested mechanism of Air action involves two steps. First, Air expression results in the silencing of the overlapping Igf2r by promoter occlusion or cis-acting RNA interference (RNAi). Second, this could induce a silent chromatin state that would spread and shut off flanking genes. However, studies by Barlow and colleagues (Sleutels et al. 2003) showed proper imprinting of Slc22a2, Slc22a3, and also Air in mice that lack Igf2r, suggesting that the antisense mechanism followed by spreading of silencing may not be the only mechanism responsible for Igf2r/Air locus imprinting. Alternatively, Air RNA could recruit chromatin modifier proteins to specific regions of the imprinted locus in a manner similar to the role suggested for Xist RNA (Sleutels et al. 2003). Consistent with this, Igf2r exhibits allele-specific histone modifications (Fournier et al. 2002). However, RNA FISH analysis using specific probes against Air RNA did not show coating by Air RNA of the imprinted chromosomal region (Braidotti et al. 2004).
Prader-Willi/Angelman syndrome (PWS/AS) locus
PWS and AS are the result of disrupted expression of imprinted genes covering a >4-Mb region of human 15q11–13 (mouse proximal 7). The PWS/AS locus in human provided the first example of an imprinted disorder when it was discovered that uniparental disomies (the inheritance of both chromosome copies from a single parent) of chromosome 15 result in an assemblage of congenital problems (O'Neill 2005). Maternal disomies result in PWS, whereas the paternal disomies result in AS. PWS is exemplified in newborns by hypotonia, hypogonadism, and various mental retardation and feeding difficulties, followed later in childhood by hyperphagia (Cassidy et al. 2000). PWS is a contiguous gene disorder manifested by loss of expression of a group of paternally transcribed protein-coding genes including SNURF/SNRPN, MKRN3, MAGEL2, and ZNF127 (O'Neill 2005). IPW (Imprinted in Prader-Willi) was isolated as a novel imprinted ncRNA gene from the PWCR (Prader-Willi chromosome region) that produces a spliced and polyadenylated ncRNA (Wevrick et al. 1994). The same locus also codes for another ncRNA gene, ZNF127AS, an antisense gene to ZNF127 expressed in brain and lungs (Jong et al. 1999). AS is characterized by ataxic gait, jerky arm movements, inappropriate laughter, and severe mental retardation (Williams et al. 1995). Loss-of-function mutations in a maternally transcribed gene at this locus, UBE3A, can cause AS (Albrecht et al. 1997; Kishino et al. 1997). The paternal silencing of UBE3A is confined to specific brain subregions; elsewhere it is biallelically expressed (Rougeulle et al. 1997; Vu and Hoffman 1997). Additionally, there is paternal-specific expression of a large, alternatively spliced antisense transcript (UBE3A-ATS), spanning ~450 kb in human and ~1 Mb in mice. Deleting the 5' end of this long antisense transcript results in reduced expression of UBE3A on the paternal chromosome (Chamberlain and Brannan 2001). Although no role has been ascribed to the large UBE3A antisense transcripts, it has been proposed that these RNAs may be directly linked to the etiology of the diseases (Rougeulle et al. 1998; Runte et al. 2004). A second maternal-specific transcript from this region, ATP10C, has also been implicated in the AS phenotype (Meguro et al. 2001). The PWS/AS locus also contains several clusters of snoRNAs (C/D-box snoRNAs) expressed exclusively from the paternal chromosome. Interestingly, many of these snoRNA genes that overlap UBE3A on the opposite strand were shown to be overexpressed in AS patients (Runte et al. 2001).
GNAS locus
Transcription of genes at the GNAS imprinted locus (human 20q13 and mouse distal 2) is exceptionally complex. The core gene of this locus is GNAS, which is expressed ubiquitously and biallelically in all but a few tissues. It encodes Gs
, the
-subunit of the heterotrimeric G-protein complex. Constitutive activating mutations in Gs
give rise to McCune-Albright syndrome, characterized variably by café-au-lait spots, gonadotropin-independent sexual precocity, and fibrous dysplasia of bone (Schwindinger et al. 1992
). In certain hormone-targeted tissues (renal proximal tissues, gonads, and thyroid in humans), GNAS is transcribed predominantly from the maternal allele. NESP55, encoding a chromogranin-like neurosecretory protein, is also maternally expressed. Unusually, NESP55 incorporates exons 2–13 of GNAS into its 3' untranslated region (UTR). The ncRNAs transcribed from this locus include NESPAS, a spliced antisense transcript, and a truncated ncRNA transcript expressed from the GNAS locus by alternative promoter usage. A recent report implicates a possible role for the NESPAS transcript in the transcriptional control of GNAS (Bastepe et al. 2005
). NESPAS RNA expression could repress NESP55 by promoter occlusion, localized heterochromatinization, or competition for shared transcription factors (Wroe et al. 2000
).
Study of the molecular elements that combine to initiate and maintain the imprint and translate it into monoallelic expression has suggested a critical role of ncRNAs in governing gene silencing. Better insight into the mechanism of ncRNA action on the imprinted loci will provide an important paradigm for understanding genomic imprinting.
| Intergenic transcripts: sense in reading between the genes |
|---|
Mammalian β-globin locus
In humans, the 70-kb β-globin locus consists of five erythroid-specific genes: embryonic (ε), fetal (Gγ and Aγ), and adult (δ and β), whose expression is under the control of the β-LCR (locus control region). Analysis of nascent transcripts from the β-globin gene cluster revealed that both intergenic regions and LCR constitutively produce specific ncRNAs (Ashe et al. 1997
). Both LCR and intergenic transcripts originate from the same strand as other globin genes and are retained in the nucleus (Ashe et al. 1997
). Expression of ncRNA transcripts from the LCR and intergenic regions is restricted primarily to erythroid cells. Interestingly, transient expression of globin genes in nonerythroid cells can induce transcription from the intergenic region without activating the protein-coding domains (Ashe et al. 1997
). An explanation for the production of intergenic transcripts from the LCR has been suggested by a "tracking model." According to this model, erythroid-specific and ubiquitous transcription factors and cofactors form complexes with the LCR and track along the locus. When this transcription complex encounters the basal transcription machinery, located at the promoter, transcription of the gene is initiated (Q. Li et al. 2002
). During this process, there is a high probability that intergenic transcripts would arise from the cryptic start sites along the locus. It has been proposed that these intergenic transcripts might facilitate the recruitment of trans-acting factors and RNA pol II to the promoters of globin genes via this tracking mechanism (Tuan et al. 1992
). Alternatively, intergenic transcription may be required for the establishment and maintenance of an open chromatin conformation within the globin locus (Gribnau et al. 2000
; Plant et al. 2001
). However, the persistence of DNase I hypersensitivity following deletion of the LCRs in cell lines argues against this role (Epner et al. 1998
; Reik et al. 1998
). Similarly, studies by Haussecker and Proudfoot (2005)
did not observe a positive correlation between intergenic transcript abundance and chromatin activation and/or globin gene expression. Instead, this study suggested that intergenic transcription at the β-globin locus mediates the formation of silent chromatin in the absence of erythrocyte-specific transcription factors (Haussecker and Proudfoot 2005
).
IL-4/IL-13 gene cluster
During differentiation of naive CD4+ precursors to T helper 1 (Th1) or Th2 effector cells, several epigenetic changes occur in a lineage-specific manner at the IFNγ or IL4/IL13 loci. Upon activation, a subset of Th2 cells involved in cell-mediated immune responses express IL-4 and IL-13 genes located in tandem on human chromosome 5q (chromosome 11 in mouse) (Frazer et al. 1997
). This cluster is flanked by two constitutively expressed genes: Rad50 and Kif3a. Transcription analysis of this intergenic region in CD4+ T cells has revealed the presence of a 130- to 260-nt polyadenylated, nuclear-retained ncRNA. Studies in a mouse transgenic model have revealed that the intergenic transcription is restricted to tissues and lineages in which IL-4 and IL-13 are expressed and is up-regulated upon Th2 differentiation (Rogan et al. 2004
). However, these intergenic transcripts are constitutively expressed even in the absence of active IL genes, implying that they are derived from independent transcription units. Although the role of these intergenic transcripts is not clear, one possible explanation is that they result from the chromatin remodeling activity at this locus (Takemoto et al. 2000
). Consistent with this idea, the differentiation of Th2 cells was found to be associated with hyperacetylation of histone H3 and hypomethylation of the CpG islands (Yamashita et al. 2002). Another example of intergenic transcription in a lineage-specific gene cluster has been described at the MHC class II locus (Masternak et al. 2003
).
Intergenic transcripts from the Dlx-5/6 region
Vertebrate Dlx genes are members of the homeodomain protein family that play critical roles in differentiation and migration of neurons as well as craniofacial and limb patterning during development (Feng et al. 2006
and references therein). The Dlx genes are expressed in bi-gene clusters, and conserved intergenic enhancers have been identified for the Dlx-5/6 and Dlx-1/2 loci (Zerucha et al. 2000
; Ghanem et al. 2003
). One of the two conserved intergenic regions from mouse, the Dlx-5/6 region transcribes two ncRNAs, Evf-1 and Evf-2, the latter being the alternatively spliced form of Evf-1 (Kohtz and Fishell 2004
; Feng et al. 2006
). Evf-1 is a 2.7-kb polyadenylated RNA, and its expression is developmentally regulated (Kohtz and Fishell 2004
). The Evf-2 ncRNA (3.8 kb) specifically cooperates with the homeodomain protein Dlx-2 to increase the transcriptional activity of the Dlx-5/6 enhancer region in a target- and homeodomain-specific manner. Interestingly, a stable complex containing the Evf-2 ncRNA/Dlx-2 homeodomain protein forms in vivo in the nucleus (Feng et al. 2006
). Together, these data suggest that the Evf-2/Dlx-2 complex stabilizes the interaction between Dlx-2 and target Dlx-5/6 enhancer sequences to increase transcriptional activity. The role of Evf-2 as a transcriptional activator suggests the possibility that a subset of such vertebrate ultraconserved regions may function at the RNA level as key developmental regulators.
Bithorax complex (BX-C) in Drosophila
In Drosophila, the homeotic genes encoded by the BX-C are involved in specifying the segmentation of the embryo and determining the body plan (Lewis 1978
). The correct spatial and temporal expression of the three protein-coding genes Ultrabithorax (Ubx), Abdominal-A (Abd-A), and Abdominal-B (Abd-B) is crucial for the development of thoracic and abdominal segments. The expression pattern of Abd-A and Abd-B depends on an array of regulatory elements located in the intergenic regions between these genes, including seven genetically defined infra-abdominal (iab-2–8) domains, and mutations in this region are associated with developmental defects affecting abdominal segments (Sanchez-Herrero and Akam 1989
). The iabs are transcribed exclusively in the embryos. A systematic examination of the distribution of these intergenic transcripts from the iab regions revealed that they show highly specific localization along the anterior–posterior axis of the blastoderm embryo and that the transcripts are restricted to the nucleus (Bae et al. 2002
). The intergenic transcripts originating from iab-4 revealed 1.7-kb and 2.0-kb polyadenylated ncRNAs that are transcribed in the opposite direction to Abd-A (Cumberledge et al. 1990
). Alteration of transcription in one iab subdomain induces a homeotic transformation of the more posterior segment under its control, suggesting that intergenic transcription plays a crucial role in iab activity (Drewell et al. 2002a
). Intergenic transcription from the iab regions has also been proposed to play a role in the activation of cis-regulatory elements by interfering with the Polycomb-repressing complex, responsible for silencing the homeotic genes (Bender and Fitzgerald 2002
; Hogga and Karch 2002
). The iab-4 region contains a single 100-nt pre-miRNA hairpin structure that encodes two stable miRNAs: mir-iab-4-5p and mir-iab-4-3p (Aravin et al. 2003
). Recent studies revealed that these miRNAs regulate Ubx activity in vivo (Stark et al. 2003
; Grun et al. 2005
; Ronshaugen et al. 2005
).
Intergenic transcription within the BX-C is not limited to the iab regions but also has been reported for the bithoraxoid (bxd) region (Lipshitz et al. 1987
). This region exhibits active transcription twice: once early in embryogenesis and once in later larval and adult stages. The early transcripts (1.1–1.3 kb, processed from a 26-kb precursor) appear to be ncRNAs, whereas the late transcripts (0.8 kb) can be translated to produce a protein (Lipshitz et al. 1987
). Recently, an elegant study by Sauer and colleagues (Sanchez-Elsner et al. 2006
) provided direct evidence of the role of intergenic transcripts from the Ubx region in epigenetic activation of gene expression. The Ubx locus contains multiple cis-regulatory elements known as trithorax response elements (TRE) that recruit transcriptional activators such as the trithorax group (trxG) of epigenetic regulators. Interestingly, the same DNA elements can also act as repressor-binding sites, Polycomb response elements (PRE), and facilitate the recruitment of members of the Polycomb (PcG) complex. It has previously been shown that intergenic transcription of ncRNAs from TRE/PRE elements switches a silent PRE to a TRE, which indicates that TRE/PRE transcription plays an important role in epigenetic activation (Lipshitz et al. 1987
; Rank et al. 2002
; Schmitt et al. 2005
). Recent studies by Sanchez-Elsner (Sanchez-Elsner et al. 2006
) further showed that these intergenic transcripts from the TRE at the Ubx locus mediate transcriptional activation of Ubx by recruiting the epigenetic regulator Ash1 to the TRE elements. Ash1 is a histone methyltransferase (HMT) that promotes transcriptional activation by trimethylating H3K4, H3K9, and H4K20 (Beisel et al. 2002
) and is essential for the tissue-specific expression of Ubx (Beisel et al. 2002
and references therein). Therefore, intergenic transcripts derived from the TRE locus mediate the recruitment of Ash1 to the TRE DNA elements of Ubx. These ncRNA transcripts serve as an intermediary between the TRE DNA elements and Ash1 protein (Sanchez-Elsner et al. 2006
). These data further support a model in which an intergenic ncRNA transcribed from the TRE of Ubx is retained at the TRE through DNA–RNA interactions and plays an important role in providing an RNA scaffold that is recognized by Ash1.
SRG1 in Saccharomyces cerevisiae
Unlike the above examples in which intergenic transcription is involved in the transcriptional activation of the corresponding region, studies in the budding yeast S. cerevisiae have revealed the role of intergenic transcription in transcriptional repression (Martens et al. 2004
, 2005
). Transcription of the intergenic ncRNA gene SRG1 (SER3 regulatory gene 1) across the promoter of the adjacent SER3, a serine biosynthetic gene, represses the transcription of SER3 by transcriptional interference (Martens et al. 2004
). SRG1 transcription is regulated by serine such that in the presence of serine, the serine-dependent activator Cha4 binds to the SRG1 promoter and activates its transcription, thereby negatively regulating the expression of SER3 (Martens et al. 2005
). These studies demonstrate an example where intergenic transcription provides a mechanism for a single protein, Cha4, to simultaneously activate and repress opposing pathways.
The ever-growing list of intergenic transcripts located mostly in the non-protein-coding regions of the genome has highlighted the importance of intergenic transcription in regulating gene activity. It also indicates that the high proportion of non-protein-coding regions in the eukaryotic genome is probably not due to the accumulation of nonsense DNA but rather represents the evolution of more complicated gene regulatory mechanisms (Schmitt and Paro 2004
).
| Natural antisense transcripts (NATs): new players in the gene regulatory network |
|---|
Transcriptional interference
Transcription by RNA pol II involves both large protein complexes and the unwinding of the duplex DNA. It is unlikely that two overlapping transcriptional units could be transcribed concomitantly by the RNA pol II machinery. Such effects have been well studied with respect to the GAL10 and GAL7 genes in S. cerevisiae (Prescott and Proudfoot 2002
). When arranged convergently, but not overlapping, both genes are transcribed at normal levels. However, when the two transcription units overlap, steady-state mRNA levels are severely reduced due to an inhibition of transcription elongation, suggesting that the expression of cis-NAT partners could be tightly regulated through a process of competitive transcriptional interference. Under such circumstances, cis-NATs might be expected to exhibit reciprocal expression, which holds true for many of the antisense partners in the eukaryotic genome (http://www.narna.ncl.ac.uk).
An antisense transcript that may function as a negative regulator of gene expression by transcriptional interference has been identified in plants (Kapranov et al. 2001
). In the legume Lotus japonicus, the expression of the late nodulin LjNOD16 gene is controlled by a bidirectional promoter located within an intron of the gene LjPLP-IV (LjPLP-IV encodes a phosphatidylinositol transfer-like protein). Transcription from the opposite strand gives rise to an antisense transcript responsible for the control of LjPLP-IV expression in root nodules, where its level is significantly lower than in flowers (Kapranov et al. 2001
). Similarly, during XCI, it was suggested that the Tsix transcripts regulate the asymmetric expression of Xist by an antisense mechanism (Lee et al. 1999
; Sun et al. 2006
). However, this mechanism of transcriptional interference cannot fully explain the repressive effect of the Air/Igf2r and KCNQ1 loci, as genes outside of the region of overlapping antisense Air and Kcnq1ot1 are also transcriptionally repressed.
RNA masking
Formation of RNA duplexes between sense and antisense transcripts might mask key regulatory features within either transcript, thereby inhibiting the interaction of important trans-acting factors. This form of steric inhibition could affect any step in gene expression involving protein–RNA interactions, including pre-mRNA processing, transport, translation, and degradation. An example of this method of antisense regulation is the inhibition of alternative splicing induced by the Rev-ErbAα transcript in different B-cell lines, which overlaps one of two functionally antagonistic splice forms of the thyroid hormone receptor ErbAα2 mRNA (Hastings et al. 1997
, 2000
). An antisense RNA-based mechanism has also been shown to be responsible for the regulation of the human HFE gene, which is implicated in iron metabolism and involved in the human inherited disorder hereditary hemochromatosis (Thenie et al. 2001
). Although there is no direct evidence for the role of the HFE antisense transcript in vivo, in vitro studies demonstrated that the antisense trans