51
|
Louro R, El-Jundi T, Nakaya HI, Reis EM, Verjovski-Almeida S. Conserved tissue expression signatures of intronic noncoding RNAs transcribed from human and mouse loci. Genomics 2008; 92:18-25. [DOI: 10.1016/j.ygeno.2008.03.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 01/04/2008] [Revised: 03/25/2008] [Accepted: 03/28/2008] [Indexed: 12/15/2022]
|
52
|
Kim M, Patel B, Schroeder KE, Raza A, Dejong J. Organization and transcriptional output of a novel mRNA-like piRNA gene (mpiR) located on mouse chromosome 10. RNA (New York, N.Y.) 2008; 14:1005-1011. [PMID: 18441047 PMCID: PMC2390792 DOI: 10.1261/rna.974608] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/21/2007] [Accepted: 02/21/2008] [Indexed: 05/26/2023]
Abstract
This letter describes the architecture and transcriptional output of a novel noncoding RNA gene in mouse and rat. The mRNA-like piRNA (mpiR) gene lies between the Perp and KIAA1244 genes on mouse chromosome 10 and rat chromosome 1. In mouse, the mpiR gene is associated with the production of at least 13 different alternatively spliced and polyadenylated transcripts ranging from 500 nt to over 6 kb. Although these transcripts are structurally similar to conventional mRNAs, only short polypeptides are predicted in each of the three possible reading frames. Intron 2 is unique in that it harbors a novel low copy repeat with homology to the 3'-UTR of the lin-28 gene, while Exon 4 contains an unusual cluster of nine sequence modules that are dispersed throughout the mouse genome. The mpiR gene is expressed at low levels in somatic tissues, but is transcriptionally up-regulated in the testis at day 14 post-partum, a time that coincides with the pachytene stage of meiosis I. Bisulfite methylation analysis shows that expression in brain, liver, and testis is correlated with the methylation status of the promoter region. In addition to mRNA-like transcripts, the mpiR gene is also a precursor to testis-specific piRNAs, and these can be detected by both Northern and PCR-based approaches. Remarkably, piRNAs originate from two specific regions of the gene, one corresponding to Intron 2 and the other to Exon 4. Overall, this work provides a picture of a novel, lineage-specific, noncoding RNA gene and describes its processing into both mRNA-like and piRNA products.
|
53
|
He S, Su H, Liu C, Skogerbø G, He H, He D, Zhu X, Liu T, Zhao Y, Chen R. MicroRNA-encoding long non-coding RNAs. BMC Genomics 2008; 9:236. [PMID: 18492288 PMCID: PMC2410135 DOI: 10.1186/1471-2164-9-236] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 11/30/2007] [Accepted: 05/21/2008] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Recent analysis of the mouse transcriptional data has revealed the existence of approximately 34,000 messenger-like non-coding RNAs (ml-ncRNAs). Whereas the functional properties of these ml-ncRNAs are beginning to be unravelled, no functional information is available for the large majority of these transcripts. RESULTS A few ml-ncRNAs have been shown to have genomic loci that overlap with microRNA loci, leading us to suspect that a fraction of ml-ncRNAs may encode microRNAs. We therefore developed an algorithm (PriMir) for specifically detecting potential microRNA-encoding transcripts in the entire set of 34,030 mouse full-length ml-ncRNAs. In combination with mouse-rat sequence conservation, this algorithm detected 97 strong miRNA-encoding candidates (80 of them novel), and for 52 of these we obtained experimental evidence for the existence of their corresponding mature microRNA by microarray and stem-loop RT-PCR. Sequence analysis of the microRNA-encoding RNAs revealed an internal motif, whose presence correlates strongly (R² = 0.9, P-value = 2.2 × 10⁻¹⁶) with the occurrence of stem-loops with characteristics of known pre-miRNAs, indicating the presence of a larger number of microRNA-encoding RNAs (from 300 up to 800) in the ml-ncRNA population. CONCLUSION Our work highlights a unique group of ml-ncRNAs and offers clues to their functions.
Affiliation(s)
- Shunmin He
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, PR China.
|
54
|
Abstract
npcRNA (non-protein-coding RNAs) are an emerging class of regulators, so-called riboregulators, and include a large diversity of small RNAs [miRNAs (microRNAs)/siRNAs (small interfering RNAs)] that are involved in various developmental processes in plants and animals. In addition, several other npcRNAs encompassing various transcript sizes (up to several kilobases) have been identified using different genomic approaches. Much less is known about the mechanism of action of these other classes of riboregulators also present in the cell. The organogenesis of nitrogen-fixing nodules in legume plants is initiated in specific root cortical cells that express the npcRNA MtENOD40 (Medicago truncatula early nodulin 40). We have identified a novel RBP (RNA-binding protein), MtRBP1 (M. truncatula RBP 1), which interacts with the MtENOD40 RNA, and is exported into the cytoplasm during legume nodule development in the region expressing MtENOD40. A direct involvement of the MtENOD40 RNA in the relocalization of this RBP into cytoplasmic granules could be demonstrated, revealing a new RNA function in the cell. To extend these results, we searched for npcRNAs in the model plant Arabidopsis thaliana whose genome is completely known. We have identified 86 novel npcRNAs from which 27 corresponded to antisense RNAs of known coding regions. Using a dedicated 'macroarray' containing these npcRNAs and a collection of RBPs, we characterized their regulation in different tissues and plants subjected to environmental stresses. Most of the npcRNAs showed high variations in gene expression in contrast with the RBP genes. Recent large-scale analysis of the sRNA component of the transcriptome revealed an enormous diversity of siRNAs/miRNAs in the Arabidopsis genome. Bioinformatic analysis revealed that 34 large npcRNAs are precursors of siRNAs/miRNAs. 
npcRNAs, which are a sensitive component of the transcriptome, may reveal novel riboregulatory mechanisms involved in post-transcriptional control of differentiation or environmental responses.
|
55
|
Roshan U, Chikkagoudar S, Livesay DR. Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities. BMC Bioinformatics 2008; 9:61. [PMID: 18226231 PMCID: PMC2248559 DOI: 10.1186/1471-2105-9-61] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/31/2007] [Accepted: 01/28/2008] [Indexed: 11/11/2022] Open
Abstract
Background Identification of RNA homologs within genomic stretches is difficult when pairwise sequence identity is low or unalignable flanking residues are present. In both cases structure-sequence or profile/family-sequence alignment programs become difficult to apply because of unreliable RNA structures or family alignments. As such, local sequence-sequence alignment programs are frequently used instead. We have recently demonstrated that maximal expected accuracy alignments using partition function match probabilities (implemented in Probalign) are significantly better than contemporary methods on heterogeneous length protein sequence datasets, thus suggesting an affinity for local alignment. Results We create a pairwise RNA-genome alignment benchmark from RFAM families with average pairwise sequence identity up to 60%. Each dataset contains a query RNA aligned to a target RNA (of the same family) embedded in a genomic sequence at least 5K nucleotides long. To simulate common conditions when exact ends of an ncRNA are unknown, each query RNA has 5' and 3' genomic flanks of size 50, 100, and 150 nucleotides. We subsequently compare the error of the Probalign program (adjusted for local alignment) to the commonly used local alignment programs HMMER, SSEARCH, and BLAST, and the popular ClustalW program with zero end-gap penalties. Parameters were optimized for each program on a small subset of the benchmark. Probalign has overall highest accuracies on the full benchmark. It leads by 10% accuracy over SSEARCH (the next best method) on 5 out of 22 families. On datasets restricted to maximum of 30% sequence identity, Probalign's overall median error is 71.2% vs. 83.4% for SSEARCH (P-value < 0.05). Furthermore, on these datasets Probalign leads SSEARCH by at least 10% on five families; SSEARCH leads Probalign by the same margin on two of the fourteen families. 
We also demonstrate that the Probalign mean posterior probability, compared to the normalized SSEARCH Z-score, is a better discriminator of alignment quality. All datasets and software are available online. Conclusion We demonstrate, for the first time, that partition function match probabilities used for expected accuracy alignment, as done in Probalign, provide statistically significant improvement over current approaches for identifying distantly related RNA sequences in larger genomic segments.
Affiliation(s)
- Usman Roshan
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA.
|
56
|
Abstract
The principal route to understanding the biological significance of the genome sequence comes from discovery and characterization of that portion of the genome that is transcribed into RNA products. We now know that this 'transcriptome' is unexpectedly complex and its precise definition in any one species requires multiple technical approaches and an ability to work on a very large scale. A key step is the development of technologies able to capture snapshots of the complexity of the various kinds of RNA generated by the genome. As the human, mouse and other model genome sequencing projects approach completion, considerable effort has been focused on identifying and annotating the protein-coding genes as the principal output of the genome. In pursuing this aim, several key technologies have been developed to generate large numbers and highly diverse sets of full-length cDNAs and their variants. However, the search has identified another hidden transcriptional universe comprising a wide variety of non-protein coding RNA transcripts. Despite initial scepticism, various experiments and complementary technologies have demonstrated that these RNAs are dynamically transcribed and a subset of them can act as sense-antisense RNAs, which influence the transcriptional output of the genome. Recent experimental evidence suggests that the list of non-protein coding RNAs is still largely incomplete and that transcription is substantially more complex even than currently thought.
Affiliation(s)
- Piero Carninci
- Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Institute, Wako, Saitama, Japan.
|
57
|
Xin Y, Quarta G, Gan HH, Schlick T. Estimating the Fraction of Non-Coding RNAs in Mammalian Transcriptomes. Bioinform Biol Insights 2008; 2:75-94. [PMID: 19812767 PMCID: PMC2735967 DOI: 10.4137/bbi.s443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. The relative z-score we use helps identify the known ncRNA class from the genome, intergene and intron classes. This leads us to a fractional ncRNA measure of putative ncRNA datasets which we model as a mixture of genuine ncRNAs and other transcripts derived from genomic, intergenic and intronic sequences. We use this model to analyze six representative datasets identified by the FANTOM3 project and two computational approaches based on comparative analysis (RNAz and EvoFold). Our analysis suggests fewer ncRNAs than estimated by DNA sequencing and comparative analysis, but the verity of our approach and its prediction requires more extensive experimental RNA data.
Affiliation(s)
- Yurong Xin
- Department of Chemistry and 251 Mercer Street, New York University, New York, NY 10012, U.S.A
- Courant Institute of Mathematical Sciences, 251 Mercer Street, New York University, New York, NY 10012, U.S.A
- Giulio Quarta
- Department of Chemistry and 251 Mercer Street, New York University, New York, NY 10012, U.S.A
- Hin Hark Gan
- Department of Chemistry and 251 Mercer Street, New York University, New York, NY 10012, U.S.A
- Tamar Schlick
- Department of Chemistry and 251 Mercer Street, New York University, New York, NY 10012, U.S.A
- Courant Institute of Mathematical Sciences, 251 Mercer Street, New York University, New York, NY 10012, U.S.A
|
58
|
Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput Biol 2007; 3:1896-908. [PMID: 17937495 PMCID: PMC2014794 DOI: 10.1371/journal.pcbi.0030193] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.8] [Received: 04/17/2007] [Accepted: 08/20/2007] [Indexed: 11/19/2022] Open
Abstract
It has become clear that noncoding RNAs (ncRNA) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk. FOLDALIGN is an algorithm for making pairwise structural alignments of RNA sequences. 
It uses a lightweight energy model and sequence similarity to simultaneously fold and align the sequences. The algorithm can make local and global alignments. The power of structural alignment methods is that they can align sequences where the primary sequences have diverged too much for normal alignment methods to be useful. The structures predicted by structural alignment methods are usually better than the structures predicted by single-sequence folding methods since they can take comparative information into account. The main problem for most structural alignment methods is that they are too computationally expensive. In this paper we introduce the dynamical pruning heuristic that makes the FOLDALIGN method significantly faster without lowering the predictive performance. The memory requirements are also significantly lowered, allowing for the analysis of longer sequences. A user-friendly (still command-line based, though) implementation of the algorithm is available at the Web site: http://foldalign.ku.dk
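The pruning idea described in this abstract can be illustrated with a minimal sketch. This is not the FOLDALIGN algorithm: it swaps in plain local sequence alignment (no folding, no energy model) so that only the heuristic itself is shown. The function name `pruned_local_alignment`, the scoring values, and the linear threshold `min_score` are all assumptions made for illustration.

```python
def pruned_local_alignment(a, b, match=2, mismatch=-1, gap=-2,
                           min_score=lambda n: 0.5 * n):
    """Local alignment score with dynamic pruning (illustrative only).

    A cell is kept only if the score of the best subalignment ending there
    exceeds a length-dependent minimum, min_score(length). Pruned cells are
    never stored and therefore never extended.
    Returns (best_score, number_of_cells_kept).
    """
    kept = {}  # (i, j) -> (score, length) of surviving subalignments
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Either start a new subalignment here or extend a survivor.
            candidates = [(sub, 1)]
            for pi, pj, delta in ((i - 1, j - 1, sub),
                                  (i - 1, j, gap),
                                  (i, j - 1, gap)):
                prev = kept.get((pi, pj))
                if prev is not None:
                    candidates.append((prev[0] + delta, prev[1] + 1))
            score, length = max(candidates)
            # Hypothetical pruning rule: discard weak subalignments.
            if score > min_score(length):
                kept[(i, j)] = (score, length)
                best = max(best, score)
    return best, len(kept)


if __name__ == "__main__":
    pruned = pruned_local_alignment("GGGAAACCC", "TTTAAATTT")
    unpruned = pruned_local_alignment("GGGAAACCC", "TTTAAATTT",
                                      min_score=lambda n: float("-inf"))
    print("pruned:", pruned, "unpruned:", unpruned)
```

Because cells are stored sparsely, the memory used tracks the number of surviving subalignments rather than the full matrix. Pruning of this kind is a heuristic: a threshold that is too strict can discard a subalignment that would have grown into the optimum.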
|
59
|
Sakakibara Y, Irie T, Suzuki Y, Yamashita R, Wakaguri H, Kanai A, Chiba J, Takagi T, Mizushima-Sugano J, Hashimoto SI, Nakai K, Sugano S. Intrinsic promoter activities of primary DNA sequences in the human genome. DNA Res 2007; 14:71-7. [PMID: 17522093 PMCID: PMC2779894 DOI: 10.1093/dnares/dsm006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
To obtain an overview of promoter activities intrinsic to primary DNA sequences in the human genome within a particular cell type, we carried out systematic quantitative luciferase assays of DNA fragments corresponding to putative promoters for 472 human genes which are expressed in HEK (human embryonic kidney epithelial) 293 cells. We observed that their promoter activities were distributed in a bimodal manner; putative promoters belonging to the first group (with strong promoter activities) were designated as P1 and the latter (with weak promoter activities) as P2. The frequencies of the TATA-boxes, the CpG islands, and the overall G + C-contents were significantly different between these two populations, indicating there are two separate groups of promoters. Interestingly, similar analysis using 251 randomly isolated genomic DNA fragments showed that P2-type promoters occasionally occur within the human genome. Furthermore, 35 DNA fragments corresponding to putative promoters of non-protein-coding transcripts (ncRNAs) shared similar features with the P2 in both promoter activities and sequence compositions. At least some of the ncRNAs, which have been massively identified by full-length cDNA projects with no functional relevance inferred, may have originated from those sporadic promoter activities of primary DNA sequences inherent to the human genome.
Affiliation(s)
- Yuta Sakakibara
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Faculty of Industrial Science and Technology, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba 278-8510, Japan
- Takuma Irie
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Yutaka Suzuki
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- To whom correspondence should be addressed. Tel/Fax. +81 4-7136-3607. E-mail:
- Riu Yamashita
- Human Genome Center, The Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Hiroyuki Wakaguri
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Akinori Kanai
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Joe Chiba
- Faculty of Industrial Science and Technology, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba 278-8510, Japan
- Toshihisa Takagi
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Junko Mizushima-Sugano
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Laboratory of Viral Infection II, Kitasato Institute for Life Sciences, Kitasato University, 5-9-1 Sirokane Minato-ku, Tokyo 108-8641, Japan
- Shin-ichi Hashimoto
- School of Medicine, the University of Tokyo, 7-3-1 Hongo, Bunkyoku, Tokyo 113-0033, Japan
- Kenta Nakai
- Human Genome Center, The Institute of Medical Science, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
- Sumio Sugano
- Graduate School of Frontier Sciences, the University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan
|
60
|
Ponjavic J, Ponting CP, Lunter G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res 2007; 17:556-65. [PMID: 17387145 PMCID: PMC1855172 DOI: 10.1101/gr.6036807] [Citation(s) in RCA: 529] [Impact Index Per Article: 31.1] [Indexed: 11/25/2022]
Abstract
Long transcripts that do not encode protein have only rarely been the subject of experimental scrutiny. Presumably, this is owing to the current lack of evidence of their functionality, thereby leaving an impression that, instead, they represent "transcriptional noise." Here, we describe an analysis of 3122 long and full-length, noncoding RNAs ("macroRNAs") from the mouse, and compare their sequences and their promoters with orthologous sequence from human and from rat. We considered three independent signatures of purifying selection related to substitutions, sequence insertions and deletions, and splicing. We find that the evolution of the set of noncoding RNAs is not consistent with neutralist explanations. Rather, our results indicate that purifying selection has acted on the macroRNAs' promoters, primary sequence, and consensus splice site motifs. Promoters have experienced the greatest elimination of nucleotide substitutions, insertions, and deletions. The proportion of conserved sequence (4.1%-5.5%) in these macroRNAs is comparable to the density of exons within protein-coding transcripts (5.2%). These macroRNAs, taken together, thus possess the imprint of purifying selection, thereby indicating their functionality. Our findings should now provide an incentive for the experimental investigation of these macroRNAs' functions.
Affiliation(s)
- Jasmina Ponjavic
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
- Chris P. Ponting
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
- Corresponding author. Fax 44-1865-282651.
- Gerton Lunter
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
- Corresponding author. Fax 44-1865-282651.
|
61
|
Zhang Z, Pang AWC, Gerstein M. Comparative analysis of genome tiling array data reveals many novel primate-specific functional RNAs in human. BMC Evol Biol 2007; 7 Suppl 1:S14. [PMID: 17288572 PMCID: PMC1796608 DOI: 10.1186/1471-2148-7-s1-s14] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022] Open
Abstract
Background Widespread transcription activities in the human genome were recently observed in high-resolution tiling array experiments, which revealed many novel transcripts that are outside of the boundaries of known protein or RNA genes. Termed "TARs" (Transcriptionally Active Regions), these novel transcribed regions represent "dark matter" in the genome, and their origin and functionality need to be explained. Many of these transcripts are thought to code for novel proteins or non-protein-coding RNAs. We have applied an integrated bioinformatics approach to investigate the properties of these TARs, including cross-species conservation and the ability to form stable secondary structures. The goal of this study is to identify a list of potential candidate sequences that are likely to code for functional non-protein-coding RNAs. We are particularly interested in the discovery of those functional RNA candidates that are primate-specific, i.e. those that do not have homologs in the mouse or dog genomes but do have them in rhesus. Results Using sequence conservation and the probability of forming stable secondary structures, we have identified ~300 possible candidates for primate-specific noncoding RNAs. We are currently in the process of sequencing the orthologous regions of these candidate sequences in several other primate species. We will then be able to apply a "phylogenetic shadowing" approach to analyze the functionality of these ncRNA candidates. Conclusion The existence of potential primate-specific functional transcripts has demonstrated the limitation of previous genome comparison studies, which put too much emphasis on conservation between human and rodents. It also argues for the necessity of sequencing additional primate species to gain a better and more comprehensive understanding of the human genome.
Affiliation(s)
- Zhaolei Zhang
- Banting & Best Department of Medical Research, Donnelly CCBR, University of Toronto, Toronto, ON M5S 3E1, Canada.
|
62
|
Prasanth KV, Spector DL. Eukaryotic regulatory RNAs: an answer to the 'genome complexity' conundrum. Genes Dev 2007; 21:11-42. [PMID: 17210785 DOI: 10.1101/gad.1484207] [Citation(s) in RCA: 301] [Impact Index Per Article: 17.7] [Indexed: 11/24/2022]
Abstract
A large portion of the eukaryotic genome is transcribed as noncoding RNAs (ncRNAs). While once thought of primarily as "junk," recent studies indicate that a large number of these RNAs play central roles in regulating gene expression at multiple levels. The increasing diversity of ncRNAs identified in the eukaryotic genome suggests a critical nexus between the regulatory potential of ncRNAs and the complexity of genome organization. We provide an overview of recent advances in the identification and function of eukaryotic ncRNAs and the roles played by these RNAs in chromatin organization, gene expression, and disease etiology.
|
63
|
Geng X, Lavado A, Lagutin OV, Liu W, Oliver G. Expression of Six3 Opposite Strand (Six3OS) during mouse embryonic development. Gene Expr Patterns 2007; 7:252-7. [PMID: 17084678 PMCID: PMC1986792 DOI: 10.1016/j.modgep.2006.09.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 08/31/2006] [Revised: 09/18/2006] [Accepted: 09/19/2006] [Indexed: 10/24/2022]
Abstract
Recently, sequence analyses have identified a large number of opposite strand transcripts in the vertebrate genome. Although the transcripts appear to be spliced and polyadenylated, many of them are predicted to represent noncoding RNAs. High levels of noncoding transcripts of the Six3 Opposite Strand (Six3OS) were recently identified in the embryonic and postnatal retina of the mouse. In this study, we expanded those initial expression analyses, elucidated in detail the developmental expression profile of mouse Six3OS in the brain and visual system, and compared it with that of Six3. Our results show that Six3OS expression overlaps extensively with that of Six3 and is not altered in Six3-null embryos.
Affiliation(s)
- Xin Geng
- Department of Genetics and Tumor Cell Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
|
64
|
Numata K, Okada Y, Saito R, Kiyosawa H, Kanai A, Tomita M. Comparative analysis of cis-encoded antisense RNAs in eukaryotes. Gene 2006; 392:134-41. [PMID: 17250976 DOI: 10.1016/j.gene.2006.12.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/05/2006] [Revised: 11/17/2006] [Accepted: 12/06/2006] [Indexed: 10/23/2022]
Abstract
Recent large-scale transcriptomic analyses have identified numerous endogenously encoded cis-antisense RNAs that are thought to play important roles in diverse cellular processes although comprehensive comparative studies among multiple species have yet to be performed. To investigate conserved genomic features across various species that may be related to sense-antisense regulation, we performed comparative analysis of approximately 1000-2000 cis-encoded antisense RNA pairs from five model eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, and Oryza sativa). Analysis of overlapping patterns relative to the exon-intron structure revealed that the number of pairs sharing the 3' part of the transcripts was larger than that of the 5'-sharing pairs except in rice. Moreover, most of the well-conserved sense-antisense pairs between human and mouse exhibited 3'-overlaps, suggesting that regulatory mechanisms involving these regions may be important in sense-antisense transcription. Functional classification using Gene Ontology revealed that genes related to catalytic activity, nucleotide binding, DNA metabolism, and mitochondria were preferentially distributed within the set of exon-overlapping sense-antisense genes compared to the non-exon-overlapping group in animals. Despite the numerous sense-antisense pairs identified in human and mouse individually, the number of conserved pairs was extremely small (6.6% of the entire set). Whereas both genes of most of the conserved sense-antisense pairs had protein-coding potential, nearly half of the non-conserved pairs included a non-coding RNA, suggesting that non-coding sense-antisense RNAs may function in species-specific regulatory pathways.
Affiliation(s)
- Koji Numata
- Graduate School of Media and Governance, Bioinformatics Program, Keio University, Fujisawa, 252-8520, Japan
|
65
|
Navarro P, Page DR, Avner P, Rougeulle C. Tsix-mediated epigenetic switch of a CTCF-flanked region of the Xist promoter determines the Xist transcription program. Genes Dev 2006; 20:2787-92. [PMID: 17043308 PMCID: PMC1619945 DOI: 10.1101/gad.389006] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.9] [Indexed: 11/25/2022]
Abstract
Initiation of X inactivation depends on the coordinated expression of the sense/antisense pair Xist/Tsix. We show here that a precisely defined Xist promoter region flanked by CTCF is maintained by Tsix in a heterochromatic-like state in undifferentiated embryonic stem (ES) cells and shifts to a pseudoeuchromatic structure upon Tsix truncation. We further demonstrate that the epigenetic state of the Xist 5' region prior to differentiation predicts the efficiency of transcriptional machinery recruitment to the Xist promoter during differentiation. Our results provide mechanistic insights into the Tsix-mediated epigenetic regulation of Xist resulting in Xist promoter activation and initiation of X inactivation in differentiating ES cells.
Affiliation(s)
- Pablo Navarro
- Unité de Génétique Moléculaire Murine, Institut Pasteur 75724, Paris Cedex 15, France
|
66
|
Abstract
For a long time, molecular evolutionary biologists have been focused on DNA and proteins, whereas RNA has lived in the shadow of its famous chemical cousins as a mere intermediary. Although this perspective has begun to change since genome-wide transcriptional profiling was successfully extended to evolutionary biology, it still echoes in evolutionary literature. In this mini-review, new developments of RNA biochemistry and transcriptomics are brought to the attention of evolutionary biologists. In particular, the unexpected abundance and functional significance of noncoding RNAs is briefly reviewed. Noncoding RNAs control a remarkable range of biological pathways and processes, all with obvious fitness consequences, such as initiation of translation, mRNA abundance, transposon jumping, chromosome architecture, stem cell maintenance, development of brain and muscles, insulin secretion, cancerogenesis and plant resistance to viral infections.
Affiliation(s)
- P Michalak
- Department of Biology, The University of Texas at Arlington, Arlington, TX 76010, USA.
|
67
|
Nordström KJV, Mirza MAI, Larsson TP, Gloriam DEI, Fredriksson R, Schiöth HB. Comprehensive comparisons of the current human, mouse, and rat RefSeq, Ensembl, EST, and FANTOM3 datasets: Identification of new human genes with specific tissue expression profile. Biochem Biophys Res Commun 2006; 348:1063-74. [PMID: 16904064 DOI: 10.1016/j.bbrc.2006.07.153] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 07/21/2006] [Accepted: 07/25/2006] [Indexed: 10/24/2022]
Abstract
Our understanding of functional genetic elements in the genomes is continuously growing and new entries are entered in various databases on a regular basis. We have here merged the genetic elements in RefSeq, Ensembl, FANTOM3, HINV, and NCBI:s ESTdb using the genome assemblies in order to achieve a comprehensive picture of the current status of the identity and gene number in human, mouse, and rat. The number of human protein coding genes has not increased (25,043) while the increased sequencing of mouse transcripts has provided the considerably higher number of protein coding genes (31,578) in mouse. The results indicate large discrepancies between the datasets, as considerable numbers of unique transcripts can be found in each dataset. Despite the high number of ncRNA (38,129 in mouse) there are also almost 20,000 EST clusters in both mouse and humans with more than one EST that do not overlap any transcript suggesting that several new genetic elements are still to be found. We also demonstrated presence of new genes by identifying new human ones that have specific tissue profiles, using RT-PCR on rat tissues.
Affiliation(s)
- Karl J V Nordström
- Department of Neuroscience, Uppsala University, BMC Box 593, 751 24 Uppsala, Sweden
|
68
|
Hamada M, Tsuda K, Kudo T, Kin T, Asai K. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics 2006; 22:2480-7. [PMID: 16908501 DOI: 10.1093/bioinformatics/btl431] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION: In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS: Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method performed favorably both in accuracy and in efficiency with the state-of-the-art methods such as CMFinder. AVAILABILITY: The software is available upon request.
Affiliation(s)
- Michiaki Hamada
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) 2-43 Aomi, Koto-ku, Tokyo, Japan.
|
69
|
Lin R, Maeda S, Liu C, Karin M, Edgington TS. A large noncoding RNA is a marker for murine hepatocellular carcinomas and a spectrum of human carcinomas. Oncogene 2006; 26:851-8. [PMID: 16878148 DOI: 10.1038/sj.onc.1209846] [Citation(s) in RCA: 432] [Impact Index Per Article: 24.0] [Indexed: 12/14/2022]
Abstract
Tumor markers can facilitate understanding molecular cell biology of neoplasia and provide potential targets for the diagnosis and insight for intervention. We here identify a novel murine gene, hepcarcin (hcn), encoding a 7-kb mRNA-like transcript. The gene appears to be the murine ortholog of the human alpha gene, that is, MALAT-1. The gene and homologs lack credible open reading frames, consistent with a highly conserved large noncoding RNA (ncRNA). In all nodules of procarcinogen-induced murine hepatocellular carcinomas (HCCs) and human HCCs, expression was markedly elevated compared to the uninvolved liver. Quantitative analyses indicated a 6-7-fold increased RNA level in HCCs versus uninvolved liver, advancing this as a molecule of interest. This ncRNA was overexpressed in all five non-hepatic human carcinomas analysed, consistent with a potential marker for neoplastic cells and potential participant in the molecular cell biology of neoplasia.
Affiliation(s)
- R Lin
- Department of Immunology, The Scripps Research Institute, La Jolla, CA 92037, USA.
|
70
|
Abstract
Noncoding RNAs (ncRNA) are ubiquitous regulatory factors affecting gene expression in all organisms. In eukaryotes ncRNAs have been shown to operate on virtually every level of transmission of genetic information. They are implicated in processes that are crucial for the correct growth and development of multicellular organisms. Changes in their expression are often related to stress conditions or associated with diseases or developmental disorders.
Affiliation(s)
- Maciej Szymanski
- Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Poznan, Poland
|
71
|
Angeloni D, ter Elst A, Wei MH, van der Veen AY, Braga EA, Klimov EA, Timmer T, Korobeinikova L, Lerman MI, Buys CHCM. Analysis of a new homozygous deletion in the tumor suppressor region at 3p12.3 reveals two novel intronic noncoding RNA genes. Genes Chromosomes Cancer 2006; 45:676-91. [PMID: 16607615 DOI: 10.1002/gcc.20332] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
Homozygous deletions or loss of heterozygosity (LOH) at human chromosome band 3p12 are consistent features of lung and other malignancies, suggesting the presence of a tumor suppressor gene(s) (TSG) at this location. Only one gene has been cloned thus far from the overlapping region deleted in lung and breast cancer cell lines U2020, NCI H2198, and HCC38. It is DUTT1 (Deleted in U Twenty Twenty), also known as ROBO1, FLJ21882, and SAX3, according to HUGO. DUTT1, the human ortholog of the fly gene ROBO, has homology with NCAM proteins. Extensive analyses of DUTT1 in lung cancer have not revealed any mutations, suggesting that another gene(s) at this location could be of importance in lung cancer initiation and progression. Here, we report the discovery of a new, small, homozygous deletion in the small cell lung cancer (SCLC) cell line GLC20, nested in the overlapping, critical region. The deletion was delineated using several polymorphic markers and three overlapping P1 phage clones. Fiber-FISH experiments revealed the deletion was approximately 130 kb. Comparative genomic sequence analysis uncovered short sequence elements highly conserved among mammalian genomes and the chicken genome. The discovery of two EST clusters within the deleted region led to the isolation of two noncoding RNA (ncRNA) genes. These were subsequently found differentially expressed in various tumors when compared to their normal tissues. The ncRNA and other highly conserved sequence elements in the deleted region may represent miRNA targets of importance in cancer initiation or progression.
Affiliation(s)
- Debora Angeloni
- Laboratory of Immunobiology, Center for Cancer Research, National Cancer Institute at Frederick, Frederick, MD, USA.
|
72
|
Bickel KS, Morris DR. Silencing the transcriptome's dark matter: mechanisms for suppressing translation of intergenic transcripts. Mol Cell 2006; 22:309-16. [PMID: 16678103 DOI: 10.1016/j.molcel.2006.04.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 01/10/2023]
Abstract
Large portions of the genomes of higher eukaryotes are transcribed into RNA molecules that are never destined for translation into proteins. Although some of these transcripts have clearly defined biological roles other than protein coding, most arise from genomic regions devoid of functional genes and many are antisense to regions containing annotated genes. A variety of mechanisms exist to prevent adventitious production of proteins from these transcripts, ranging from degradation within the nucleus to translational silencing in the cytosol.
Affiliation(s)
- Kellie S Bickel
- Department of Biochemistry, University of Washington, Box 357350, Seattle, 98133, USA
|
73
|
Costain WJ, Rasquinha I, Graber T, Luebbert C, Preston E, Slinn J, Xie X, MacManus JP. Cerebral ischemia induces neuronal expression of novel VL30 mouse retrotransposons bound to polyribosomes. Brain Res 2006; 1094:24-37. [PMID: 16730676 DOI: 10.1016/j.brainres.2006.03.120] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/29/2005] [Revised: 03/14/2006] [Accepted: 03/23/2006] [Indexed: 01/27/2023]
Abstract
Mammalian genomes are burdened with a large heterogeneous group of endogenous replication defective retroviruses (retrotransposons). Previously, we identified a transcript resembling a virus-like 30S (VL30) retrotransposon increasing in mouse brain following transient cerebral ischemia. Paradoxically, this non-coding RNA was found bound to polyribosomes. Further analysis revealed that multiple retrotransposon species (BVL-1-like and mVL30-1-like) were bound to polyribosomes and induced by ischemia. These VL30 transcripts remained associated with polyribosomes in the presence of 0.5 M KCl, indicating that VL30 mRNA was tightly associated with ribosomal subunits. Furthermore, the profile of BVL-1 distribution on polyribosomal profiles was distinct from those of translated and translationally repressed mRNA. Consistent with expectations, 5.0 kb VL30 transcripts were detected in ischemic brain with a temporal pattern of expression that was distinct from c-fos. Expression of VL30 was localized in neurons using a combination of in situ hybridization and immunocytochemistry. 3'-RACE-PCR experiments yielded two unique sequences (VL30x-1 and VL30x-2) that were homologous to known VL30 genes. Phylogenetic analysis of VL30 promoter sequence (U3 region) resulted in the identification of two large VL30 subgroups. VL30x-1 and VL30x-2 were closely related and classified in a group that was distinct from the well-characterized VL30 genes BVL-1 and mVL30-1. The promoter regions of VL30x-1 and VL30x-2 did not possess the consensus sequences for either hypoxia or anoxia response elements, suggesting an alternative mechanism for induction. This is the first report that demonstrates ischemia-induced, neuronal expression of unique VL30 retrotransposons in mouse brain.
Affiliation(s)
- Willard J Costain
- Institute for Biological Sciences M54, National Research Council, Montreal Road Laboratories, Ottawa, ON, Canada K1A 0R6.
|
74
|
Furuno M, Pang KC, Ninomiya N, Fukuda S, Frith MC, Bult C, Kai C, Kawai J, Carninci P, Hayashizaki Y, Mattick JS, Suzuki H. Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet 2006; 2:e37. [PMID: 16683026 PMCID: PMC1449886 DOI: 10.1371/journal.pgen.0020037] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.4] [Received: 08/16/2005] [Accepted: 02/01/2006] [Indexed: 02/07/2023]
Abstract
Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air. The human genome has been sequenced, and, intriguingly, less than 2% specifies the information for the basic protein building blocks of our bodies. So, what does the other 98% do? It now appears that the mammalian genome also specifies the instructions for many previously undiscovered “non protein-coding RNA” (ncRNA) genes. 
However, what these ncRNAs do is largely unknown. In recent years, strategies have been designed that have successfully identified hundreds of short ncRNAs—termed microRNAs—many of which have since been shown to act as genetic regulators. Also known to be functionally important are a handful of ncRNAs orders of magnitude larger in size than microRNAs. The availability of complete genome and comprehensive transcript sequences allows for the systematic discovery of more large ncRNAs. The authors developed a computational strategy to screen the mouse genome and identify large ncRNAs. They detected existing large ncRNAs, thus validating their approach, but, more importantly, discovered more than 60 other candidates, some of which were subsequently confirmed experimentally. This work opens the door to a virtually unexplored world of large ncRNAs and beckons future experimental work to define the cellular functions of these molecules.
Affiliation(s)
- Masaaki Furuno
- Mouse Genome Informatics Consortium, The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Ken C Pang
- Australian Research Council Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- T Cell laboratory, Ludwig Institute for Cancer Research, Austin Health, Heidelberg, Victoria, Australia
- Noriko Ninomiya
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Shiro Fukuda
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Martin C Frith
- Australian Research Council Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Carol Bult
- Mouse Genome Informatics Consortium, The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Chikatoshi Kai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Jun Kawai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan
- Piero Carninci
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan
- Yoshihide Hayashizaki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan
- John S Mattick
- Australian Research Council Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Harukazu Suzuki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Japan
- * To whom correspondence should be addressed. E-mail:
|
75
|
Inagaki S, Numata K, Kondo T, Tomita M, Yasuda K, Kanai A, Kageyama Y. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila. Genes Cells 2006; 10:1163-73. [PMID: 16324153 DOI: 10.1111/j.1365-2443.2005.00910.x] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Indexed: 10/25/2022]
Abstract
One of the most surprising results to emerge from mammalian cDNA sequencing projects is that thousands of mRNA-like non-coding RNAs (ncRNAs) are expressed and constitute at least 10% of poly(A)(+) RNAs. In most cases, however, the functions of these RNA molecules remain unclear. To clarify the biological significance of mRNA-like ncRNAs, we computationally screened 11,691 Drosophila melanogaster full-length cDNAs. After eliminating presumable protein-coding transcripts, 136 were identified as strong candidates for mRNA-like ncRNAs. Although most of these putative ncRNAs are found throughout the Drosophila genus, predicted amino acid sequences are not conserved even in related species, suggesting that these transcripts are actually non-coding RNAs. In situ hybridization analyses revealed that 35 of the transcripts are expressed during embryogenesis, of which 27 were detected only in specific tissues including the tracheal system, midgut primordial cells, visceral mesoderm, germ cells and the central and peripheral nervous system. These highly regulated expression patterns suggest that many mRNA-like ncRNAs play important roles in multiple steps of organogenesis and cell differentiation in Drosophila. This is the first report that the majority of mRNA-like ncRNAs in a model organism are expressed in specific tissues and cell types.
MESH Headings
- Amino Acid Sequence
- Animals
- Base Sequence
- Cell Differentiation/genetics
- Conserved Sequence
- DNA, Complementary/analysis
- DNA, Complementary/genetics
- Drosophila/embryology
- Drosophila/genetics
- Embryonic Development/genetics
- Evolution, Molecular
- Gene Expression Regulation, Developmental
- Models, Genetic
- Open Reading Frames/genetics
- Organogenesis/genetics
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Untranslated/chemistry
- RNA, Untranslated/genetics
- Species Specificity
- Transcription, Genetic
Affiliation(s)
- Sachi Inagaki
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Takayama, Ikoma, Japan
|
76
|
Abstract
The term non-coding RNA (ncRNA) is commonly employed for RNA that does not encode a protein, but this does not mean that such RNAs do not contain information nor have function. Although it has been generally assumed that most genetic information is transacted by proteins, recent evidence suggests that the majority of the genomes of mammals and other complex organisms is in fact transcribed into ncRNAs, many of which are alternatively spliced and/or processed into smaller products. These ncRNAs include microRNAs and snoRNAs (many if not most of which remain to be identified), as well as likely other classes of yet-to-be-discovered small regulatory RNAs, and tens of thousands of longer transcripts (including complex patterns of interlacing and overlapping sense and antisense transcripts), most of whose functions are unknown. These RNAs (including those derived from introns) appear to comprise a hidden layer of internal signals that control various levels of gene expression in physiology and development, including chromatin architecture/epigenetic memory, transcription, RNA splicing, editing, translation and turnover. RNA regulatory networks may determine most of our complex characteristics, play a significant role in disease and constitute an unexplored world of genetic variation both within and between species.
Affiliation(s)
- John S Mattick
- Australian Research Council Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia.
|
77
|
Ginger MR, Shore AN, Contreras A, Rijnkels M, Miller J, Gonzalez-Rimbau MF, Rosen JM. A noncoding RNA is a potential marker of cell fate during mammary gland development. Proc Natl Acad Sci U S A 2006; 103:5781-6. [PMID: 16574773 PMCID: PMC1420634 DOI: 10.1073/pnas.0600745103] [Citation(s) in RCA: 146] [Impact Index Per Article: 8.1] [Received: 04/28/2005] [Indexed: 12/26/2022]
Abstract
PINC is a large, alternatively spliced, developmentally regulated, noncoding RNA expressed in the regressed terminal ductal lobular unit-like structures of the parous mammary gland. Previous studies have shown that this population of cells possesses not only progenitor-like qualities (the ability to proliferate and repopulate a mammary gland) and the ability to survive developmentally programmed cell death but also the inhibition of carcinogen-induced proliferation. Here we report that PINC expression is temporally and spatially regulated in response to developmental stimuli in vivo and that PINC RNA is localized to distinct foci in either the nucleus or the cytoplasm in a cell-cycle-specific manner. Loss-of-function experiments suggest that PINC performs dual roles in cell survival and regulation of cell-cycle progression, suggesting that PINC may contribute to the developmentally mediated changes previously observed in the terminal ductal lobular unit-like structures of the parous gland. This is one of the first reports describing the functional properties of a large, developmentally regulated, mammalian, noncoding RNA.
Affiliation(s)
- Amy N. Shore
- Program in Developmental Biology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030
- Monique Rijnkels
- U.S. Department of Agriculture/Agricultural Research Services Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, 1100 Bates Street, Houston, TX 77030
|
78
|
Hirsch J, Lefort V, Vankersschaver M, Boualem A, Lucas A, Thermes C, d'Aubenton-Carafa Y, Crespi M. Characterization of 43 non-protein-coding mRNA genes in Arabidopsis, including the MIR162a-derived transcripts. PLANT PHYSIOLOGY 2006; 140:1192-204. [PMID: 16500993 PMCID: PMC1435803 DOI: 10.1104/pp.105.073817] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 05/06/2023]
Abstract
Messenger RNAs that do not contain a long open reading frame (ORF) or non-protein-coding RNAs (npcRNAs) are an emerging novel class of transcripts. Their functions may involve the RNA molecule itself and/or short ORF-encoded peptides. npcRNA genes are difficult to identify using standard gene prediction programs that rely on the presence of relatively long ORFs. Here, we used detailed bioinformatic analyses of expressed sequence tag/cDNA databases to detect a restricted set of npcRNAs in the Arabidopsis (Arabidopsis thaliana) genome and further characterized these transcripts using a combination of bioinformatic and molecular approaches. Compositional analyses revealed strong nucleotide strand asymmetries in the npcRNAs, as well as a biased GC content, suggesting the existence of functional constraints on these RNAs. Thirteen of these transcripts display tissue-specific expression patterns, and three are regulated in conditions affecting root architecture. The npcRNA 78 gene contains the miR162 sequence in an alternative intron and corresponds to the MIR162a locus. Although DICER-LIKE 1 (DCL1) mRNA is known to be regulated by miR162-guided cleavage, its level does not change in a mir162a mutant. Alternative splicing of npcRNA 78 leads to several transcript isoforms, which all accumulate in a dcl1 mutant. This suggests that npcRNA 78 is a genuine substrate of DCL1 and that splicing of this microRNA primary transcript and miR162 processing are competitive nuclear events. Our results provide new insights into Arabidopsis npcRNA biology and the potential roles of these genes.
Affiliation(s)
- Judith Hirsch
- Institut des Sciences du Végétal, Centre National de la Recherche Scientifique, 91198 Gif sur Yvette, France
|
79
|
Abstract
Genome sequence analysis of RNAs presents special challenges to computational biology, because conserved RNA secondary structure plays a large part in RNA analysis. Algorithms well suited for RNA secondary structure and sequence analysis have been borrowed from computational linguistics. These "stochastic context-free grammar" (SCFG) algorithms have enabled the development of new RNA gene-finding and RNA homology search software. The aim of this paper is to provide an accessible introduction to the strengths and weaknesses of SCFG methods and to describe the state of the art in one particular kind of application: SCFG-based RNA similarity searching. The INFERNAL and RSEARCH programs are capable of identifying distant RNA homologs in a database search by looking for both sequence and secondary structure conservation.
Affiliation(s)
- S R Eddy
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63108, USA
|
80
|
Ravasi T, Suzuki H, Pang KC, Katayama S, Furuno M, Okunishi R, Fukuda S, Ru K, Frith MC, Gongora MM, Grimmond SM, Hume DA, Hayashizaki Y, Mattick JS. Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res 2005; 16:11-9. [PMID: 16344565 PMCID: PMC1356124 DOI: 10.1101/gr.4200206] [Citation(s) in RCA: 394] [Impact Index Per Article: 20.7] [Indexed: 01/13/2023]
Abstract
Recent large-scale analyses of mainly full-length cDNA libraries generated from a variety of mouse tissues indicated that almost half of all representative cloned sequences did not contain an apparent protein-coding sequence, and were putatively derived from non-protein-coding RNA (ncRNA) genes. However, many of these clones were singletons and the majority were unspliced, raising the possibility that they may be derived from genomic DNA or unprocessed pre-mRNA contamination during library construction, or alternatively represent nonspecific "transcriptional noise." Here we show, using reverse transcriptase-dependent PCR, microarray, and Northern blot analyses, that many of these clones were derived from genuine transcripts of unknown function whose expression appears to be regulated. The ncRNA transcripts have larger exons and fewer introns than protein-coding transcripts. Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but internal priming sites within longer transcripts, only a minority of which is encompassed by known genes. A significant proportion of these transcripts exhibit tissue-specific expression patterns, as well as dynamic changes in their expression in macrophages following lipopolysaccharide stimulation. Taken together, the data provide strong support for the conclusion that ncRNAs are an important, regulated component of the mammalian transcriptome.
Affiliation(s)
- Timothy Ravasi
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane QLD 4072, Australia
|
81
|
Lipovich L, King MC. Abundant novel transcriptional units and unconventional gene pairs on human chromosome 22. Genome Res 2005; 16:45-54. [PMID: 16344557 PMCID: PMC1356128 DOI: 10.1101/gr.3883606] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/02/2023]
Abstract
Novel transcriptional units (TUs) are EST-supported transcribed features not corresponding to known genes. Unconventional gene pairs (UGPs) are pairs of genes and/or TUs sharing exon-to-exon cis-antisense overlaps or putative bidirectional promoters. Computational TU and UGP discovery followed by manual curation was performed in the entire published 34.9-Mb human chromosome 22 euchromatic sequence. Novel TUs (n = 517) were as abundant as known genes (n = 492) and typically did not have nonprimate DNA and protein homologies. One hundred seventy-one (33%) of TUs, but only 13 (3%) of genes, both lacked nonprimate conservation and localized to gaps in the human-mouse BLASTZ alignment. Novel TUs were richer in exonic primate-specific interspersed repetitive elements (P = 0.001) and were more likely to rely on splice junctions provided by them, than were known genes: 19% of spliced TUs, versus 5% of spliced genes, had a splice site within a primate-specific repeat. Hence, novel TUs and known genes may represent different portions of the transcriptome. Two hundred nine (21%) of chromosome 22 transcripts participated in 77 cis-antisense and 42 promoter-sharing UGPs. Transcripts involved simultaneously in both UGP types were more common than was expected (P = 0.01). UGPs were nonrandomly distributed along the sequence: 89 (75%) clustered in distinct regions, the sum of which equaled 4.4 Mb (<13% of the chromosome). Eighty (67%) of the UGPs possessed significant locus structure differences between primates and rodents. Since some TUs may be functional noncoding transcripts and since the cis-regulatory potential of UGPs is well recognized, TUs and UGPs specific to the primate lineage may contribute to the genomic basis for primate-specific phenotypes.
Affiliation(s)
- Leonard Lipovich
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA.
|
82
|
Arvestad L, Visa N, Lundeberg J, Wieslander L, Savolainen P. Expressed sequence tags from the midgut and an epithelial cell line of Chironomus tentans: annotation, bioinformatic classification of unknown transcripts and analysis of expression levels. INSECT MOLECULAR BIOLOGY 2005; 14:689-95. [PMID: 16313569 DOI: 10.1111/j.1365-2583.2005.00600.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]
Abstract
Expressed sequence tags (ESTs) were generated from two Chironomus tentans cDNA libraries, constructed from an embryo epithelial cell line and from larva midgut tissue. 8584 5'-end ESTs were generated and assembled into 3110 tentative unique transcripts, providing the largest contribution of C. tentans sequences to public databases to date. Annotation using Blast gave 1975 (63.5%) transcripts with a significant match in the major gene/protein databases, 1170 with a best match to Anopheles gambiae and 480 to Drosophila melanogaster. 1091 transcripts (35.1%) had no match to any database. Studies of open reading frames suggest that at least 323 of these contain a coding sequence, indicating that a large proportion of the genes in C. tentans belong to previously unknown gene families.
Affiliation(s)
- L Arvestad
- Stockholm Bioinformatics Center, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, Sweden
|
83
|
Savolainen P, Fitzsimmons C, Arvestad L, Andersson L, Lundeberg J. ESTs from brain and testis of White Leghorn and red junglefowl: annotation, bioinformatic classification of unknown transcripts and analysis of expression levels. Cytogenet Genome Res 2005; 111:79-87. [PMID: 16093725 DOI: 10.1159/000085674] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2004] [Accepted: 11/30/2004] [Indexed: 11/19/2022] Open
Abstract
We report the generation, assembly and annotation of expressed sequence tags (ESTs) from four chicken cDNA libraries, constructed from brain and testis tissue dissected from red junglefowl and White Leghorn. 21,285 5'-end ESTs were generated and assembled into 2,813 contigs and 9,737 singletons, giving 12,549 tentative unique transcripts. The transcripts were annotated using BLAST by matching to known chicken genes or to putative homologues in other species using the major gene/protein databases. The results for these similarity searches are available on www.sbc.su.se/~arve/chicken. 4,129 (32.9%) of the transcripts remained without a significant match to gene/protein databases, a proportion of unmatched transcripts similar to earlier non-mammalian EST studies. To estimate how many of these transcripts may represent novel genes, they were studied for the presence of coding sequence. It was shown that most of the unique chicken transcripts do not contain coding parts of genes, but it was estimated that at least 400 of the transcripts contain coding sequence, indicating that 3.2% of avian genes belong to previously unknown gene families. Further BLAST search against dbEST left 1,649 (13.1%) of the transcripts unmatched to any library. The number of completely unmatched transcripts containing coding sequence was estimated at 180, giving a measure of the number of putative novel chicken genes identified in this study. 84.3% of the identified transcripts were found only in testis tissue, which has been poorly studied in earlier chicken EST studies. Large differences in expression levels were found between the brain and testis libraries for a large number of transcripts, and among the 525 most frequently represented transcripts, there were at least 20 transcripts with significant difference in expression levels between red junglefowl and White Leghorn.
Affiliation(s)
- P Savolainen
- Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden.
|
84
|
Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet 2005; 22:1-5. [PMID: 16290135 DOI: 10.1016/j.tig.2005.10.003] [Citation(s) in RCA: 481] [Impact Index Per Article: 25.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2005] [Revised: 09/02/2005] [Accepted: 10/14/2005] [Indexed: 01/05/2023]
Abstract
The mammalian transcriptome contains many non-protein-coding RNAs (ncRNAs), but most of these are of unclear significance and lack strong sequence conservation, prompting suggestions that they might be non-functional. However, certain long functional ncRNAs such as Air and Xist are also poorly conserved. In this article, we systematically analyzed the conservation of several groups of functional ncRNAs, including miRNAs, snoRNAs and longer ncRNAs whose function has been either documented or confidently predicted. As expected, miRNAs and snoRNAs were highly conserved. By contrast, the longer functional non-micro, non-sno ncRNAs were much less conserved with many displaying rapid sequence evolution. Our findings suggest that longer ncRNAs are under the influence of different evolutionary constraints and that the lack of conservation displayed by the thousands of candidate ncRNAs does not necessarily signify an absence of function.
Affiliation(s)
- Ken C Pang
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane QLD 4072, Australia
|
85
|
Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J. A new frontier for molecular medicine: noncoding RNAs. Biochim Biophys Acta Rev Cancer 2005; 1756:65-75. [PMID: 16125325 DOI: 10.1016/j.bbcan.2005.07.005] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2005] [Revised: 07/27/2005] [Accepted: 07/28/2005] [Indexed: 02/06/2023]
Abstract
It is now becoming evident that the variety of noncoding RNA (ncRNA) molecules play important roles in many cellular processes and they are not just mere intermediates in transfer of genetic information from DNA to proteins. Recent data, from the analyses of transcriptional activity of human genome, suggest that it may contain roughly equal numbers of protein- and RNA-encoding transcription units. Many of the ncRNAs described in humans as well as in other mammals have been linked, through specific chromosomal localization or expression patterns, with certain diseases including complex congenital syndromes, neurobehavioral and developmental disorders and cancer. These findings clearly indicate that an expression of genes of which end-products are RNA molecules is crucial for development, differentiation and normal functioning of the cells. The ncRNAs expression patterns can therefore be used as molecular markers for specific diagnostic methods.
Affiliation(s)
- Maciej Szymanski
- Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Noskowskiego 12, 61704 Poznan, Poland
|
86
|
Kawano M, Storz G, Rao BS, Rosner JL, Martin RG. Detection of low-level promoter activity within open reading frame sequences of Escherichia coli. Nucleic Acids Res 2005; 33:6268-76. [PMID: 16260475 PMCID: PMC1275588 DOI: 10.1093/nar/gki928] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The search for promoters has largely been confined to sequences upstream of open reading frames (ORFs) or stable RNA genes. Here we used a cloning approach to discover other potential promoters in Escherichia coli. Chromosomal fragments of approximately 160 bp were fused to a promoterless lacZ reporter gene on a multi-copy plasmid. Eight clones were deliberately selected for high activity and 105 clones were selected at random. All eight of the high-activity clones carried promoters that were located upstream of an ORF. Among the randomly-selected clones, 56 had significantly elevated activity. Of these, 7 had inserts which also mapped upstream of an ORF, while 49 mapped within or downstream of ORFs. Surprisingly, the eight promoters selected for high activity matched the canonical sigma70 -35 and -10 sequences no better than sequences from the randomly-selected clones. For six of the nine most active sequences with orientations opposite to that of the ORF, chromosomal expression was detected by RT-PCR, but defined transcripts were not detected by northern analysis. Our results indicate that the E.coli chromosome carries numerous -35 and -10 sequences with weak promoter activity but that most are not productively expressed because other features needed to enhance promoter activity and transcript stability are absent.
Affiliation(s)
- B. Sridhar Rao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-0560, USA
- Judah L. Rosner
- Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Building 5, Room 333, Bethesda, MD 20892-0560, USA
- Robert G. Martin
- Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Building 5, Room 333, Bethesda, MD 20892-0560, USA
- To whom correspondence should be addressed. Tel: +1 301 496 5466; Fax: +1 301 496 0201;
|
87
|
Brosius J. Echoes from the past--are we still in an RNP world? Cytogenet Genome Res 2005; 110:8-24. [PMID: 16093654 DOI: 10.1159/000084934] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2004] [Accepted: 05/04/2004] [Indexed: 11/19/2022] Open
Abstract
Availability of the human genome sequence and those of other species is unmeasured in their value for a comprehensive understanding of the architecture, function and evolution of genomes and cells. Various mechanisms keep genomes in flux and generate intra- and interspecies variation. The conversion of RNA modules into DNA and their more or less random integration into chromosomes (retroposition) is in many lineages including our own the most pervasive and perhaps the most enigmatic. The proclivity of such events in extant multicellular eukaryotes, even in more recent evolutionary times, gives the impression that the transition period from the RNP (ribonucleoprotein) world to the emergence of modern cells, where DNA became the predominant carrier of genetic information, has lasted billions of years and is an endlessly drawn-out process rather than the punctuated event one might expect. Apart from the impact of such RNA-mediated processes as retroposition, the role of RNA in a wide variety of cellular functions has only recently become more widely appreciated.
Affiliation(s)
- J Brosius
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany.
|
88
|
Laserson U, Gan HH, Schlick T. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs. Nucleic Acids Res 2005; 33:6057-69. [PMID: 16254081 PMCID: PMC1270951 DOI: 10.1093/nar/gki911] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire.
Affiliation(s)
- Uri Laserson
- Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10012, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- Hin Hark Gan
- Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10012, USA
- Tamar Schlick
- Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10012, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- To whom correspondence should be addressed. Tel: +1 212 998 3116; Fax: +1 212 998 4152; E-mail:
|
89
|
Willingham AT, Orth AP, Batalov S, Peters EC, Wen BG, Aza-Blanc P, Hogenesch JB, Schultz PG. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 2005; 309:1570-3. [PMID: 16141075 DOI: 10.1126/science.1115901] [Citation(s) in RCA: 592] [Impact Index Per Article: 31.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]
Abstract
Noncoding RNA molecules (ncRNAs) have been implicated in numerous biological processes including transcriptional regulation and the modulation of protein function. Yet, in spite of the apparent abundance of ncRNA, little is known about the biological role of the projected thousands of ncRNA genes present in the human genome. To facilitate functional analysis of these RNAs, we have created an arrayed library of short hairpin RNAs (shRNAs) directed against 512 evolutionarily conserved putative ncRNAs and, via cell-based assays, we have begun to determine their roles in cellular pathways. Using this system, we have identified an ncRNA repressor of the nuclear factor of activated T cells (NFAT), which interacts with multiple proteins including members of the importin-beta superfamily and likely functions as a specific regulator of NFAT nuclear trafficking.
Affiliation(s)
- A T Willingham
- Department of Chemistry, Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA
|
90
|
Abstract
Large numbers of noncoding RNA transcripts (ncRNAs) are being revealed by complementary DNA cloning and genome tiling array studies in animals. The big and as yet largely unanswered question is whether these transcripts are relevant. A paper by Willingham et al. shows the way forward by developing a strategy for large-scale functional screening of ncRNAs, involving small interfering RNA knockdowns in cell-based screens, which identified a previously unidentified ncRNA repressor of the transcription factor NFAT. It appears likely that ncRNAs constitute a critical hidden layer of gene regulation in complex organisms, the understanding of which requires new approaches in functional genomics.
Affiliation(s)
- John S Mattick
- Australian Research Council Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia.
|
91
|
Costa FF. Non-coding RNAs: New players in eukaryotic biology. Gene 2005; 357:83-94. [PMID: 16111837 DOI: 10.1016/j.gene.2005.06.019] [Citation(s) in RCA: 253] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2005] [Revised: 04/28/2005] [Accepted: 06/02/2005] [Indexed: 11/21/2022]
Abstract
The completion of the human, mouse and other eukaryotic genomes were important scientific milestones, but they were just small steps towards the understanding of eukaryotic biology. Recent transcriptome analysis and different experimental approaches have identified a surprisingly large number of non-coding RNAs (ncRNAs) in eukaryotic cells. ncRNAs comprise microRNAs, anti-sense transcripts and other Transcriptional Units containing a high density of stop codons and lacking any extensive "Open Reading Frame". They have been shown to regulate gene expression by novel mechanisms such as RNA interference, gene co-suppression, gene silencing, imprinting and DNA demethylation. It is becoming clear that these novel RNAs perform critical functions during development and cell differentiation. There is also mounting evidence of their involvement in cancer and neurological diseases. Together, all this information indicates that ncRNAs are emerging as a new class of functional transcripts in eukaryotes. Therefore, great challenges lie in the years ahead: understanding the molecular biology of higher organisms will require revealing all proteins (Proteome), all ncRNAs (RNome) and their interactions (Interactome) in the complex molecular scenario within eukaryotic cells.
Affiliation(s)
- Fabrício F Costa
- Molecular Neurogenetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129, USA
|
92
|
Babak T, Blencowe BJ, Hughes TR. A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics 2005; 6:104. [PMID: 16083503 PMCID: PMC1199595 DOI: 10.1186/1471-2164-6-104] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2005] [Accepted: 08/05/2005] [Indexed: 11/10/2022] Open
Abstract
Background: Systematic identification and functional characterization of novel types of noncoding (nc)RNA in genomes is more difficult than it is for protein coding mRNAs, since ncRNAs typically do not possess sequence features such as splicing or translation signals, or long open reading frames. Recent "tiling" microarray studies have reported that a surprisingly larger proportion of mammalian genomes is transcribed than was previously anticipated. However, these non-genic transcripts often appear to be low in abundance, and their functional significance is not known. Results: To systematically search for functional ncRNAs, we designed microarrays to detect 3,478 intergenic and intronic sequences that are conserved between the human, mouse, and rat genomes, and that score highly by other criteria that characterize ncRNAs. We probed these arrays with total RNA isolated from 16 wild-type mouse tissues. Among 55 candidates for highly-expressed novel ncRNAs tested by northern blotting, eight were confirmed as small, highly- and ubiquitously-expressed RNAs in mouse. Of the eight, five were also detected in rat tissues, but none were detected at appreciable levels in human tissues or cultured cells. Conclusion: Since the sequence and expression of most known coding transcripts and functional ncRNAs is conserved between human and mouse, the lack of northern-detectable expression in human cells and tissues of the novel mouse and rat ncRNAs that we identified suggests that they are not functional or possibly have rodent-specific functions. Our results confirm that relatively little of the intergenic sequence conserved between human, mouse and rat is transcribed at high levels in mammalian tissues, possibly suggesting a limited role for transcribed intergenic and intronic sequences as independent functional elements.
Affiliation(s)
- Tomas Babak
- Banting and Best Department of Medical Research, 112 College St., Toronto, ON M5G 1L6 Canada
- Department of Medical Genetics and Microbiology, 10 King's College Circle, Toronto, ON M1R 4F9 Canada
- Benjamin J Blencowe
- Banting and Best Department of Medical Research, 112 College St., Toronto, ON M5G 1L6 Canada
- Department of Medical Genetics and Microbiology, 10 King's College Circle, Toronto, ON M1R 4F9 Canada
- Timothy R Hughes
- Banting and Best Department of Medical Research, 112 College St., Toronto, ON M5G 1L6 Canada
- Department of Medical Genetics and Microbiology, 10 King's College Circle, Toronto, ON M1R 4F9 Canada
|
93
|
Shearstone JR, Wang YE, Clement A, Allaire NE, Yang C, Worley DS, Carulli JP, Perrin S. Application of functional genomic technologies in a mouse model of retinal degeneration. Genomics 2005; 85:309-21. [PMID: 15718098 DOI: 10.1016/j.ygeno.2004.11.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2004] [Accepted: 11/01/2004] [Indexed: 02/03/2023]
Abstract
Generation of tissue-specific, normalized and subtracted cDNA libraries has the potential to characterize the expression of rare transcriptional units not represented on Affymetrix GeneChips. Initial sequence analysis of our murine cDNA clone collections showed that as much as 86, 45, and 30% of clones are not represented on the Affymetrix Mu11k, MG-U74, and MG-430 chip sets, respectively. A detailed study that compared EST sequences of a subtracted library generated from mouse retina to those of MG-430 consensus sequences was undertaken, using UniGene build 124 as the common reference. A set of 1111 nonredundant transcript regions, not represented on the commercial array, was identified. These clusters were used as the primary filter for analyzing a data set produced by assaying samples from the Pde6b(rd1) mouse model of retinal degeneration on a 12,325-feature retinal cDNA microarray. QRT-PCR validated eight unique transcripts identified by microarray. Seven of the transcripts showed retina-specific expression. Full-length cloning strategies were applied to two of the ESTs. The genes discovered by this approach are the full-length mouse homologue of guanylate cyclase 2F (GUCY2F) and a carboxy-truncated splice variant of retinal S-antigen (SAG), known as regulators of the visual phototransduction G-protein-coupled receptor-mediated signaling pathway. These sequences have been assigned GenBank Accession Nos. and , respectively.
Affiliation(s)
- Jeffrey R Shearstone
- Research Molecular Discovery, Biogen Idec, Inc., 14 Cambridge Center, Cambridge, MA 02142, USA.
|
94
|
Abstract
The past four years have seen an explosion in the number of detected RNA transcripts with no apparent protein-coding potential. This has led to speculation that non-protein-coding RNAs (ncRNAs) might be as important as proteins in the regulation of vital cellular functions. However, there has been significantly less progress in actually demonstrating the functions of these transcripts. In this article, we review the results of recent experiments that show that transcription of non-protein-coding RNA is far more widespread than was previously anticipated. Although some ncRNAs act as molecular switches that regulate gene expression, the function of many ncRNAs is unknown. New experimental and computational approaches are emerging that will help determine whether these newly identified transcription products are evidence of important new biochemical pathways or are merely 'junk' RNA generated by the cell as a by-product of its functional activities.
Affiliation(s)
- Alexander Hüttenhofer
- Division of Genomics and RNomics, Innsbruck Medical University-Biocenter, Fritz-Pregl-Strasse 3, 6020 Innsbruck, Austria.
|
95
|
Abstract
There is growing evidence that mammalian genomes produce thousands of transcripts that do not encode proteins, and this RNA class might even rival the complexity of mRNAs. There is no doubt that a number of these non-protein-coding RNAs have important regulatory functions in the cell. However, do all transcripts have a function or are many of them products of fortuitous transcription with no function? The second scenario is mirrored by numerous alternative-splicing events that lead to truncated proteins. Nevertheless, analogous to 'superfluous' genomic DNA, aberrant transcripts or processing products embody evolutionary potential and provide novel RNAs that natural selection can act on.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, Münster, Germany.
|
96
|
Tupy JL, Bailey AM, Dailey G, Evans-Holm M, Siebel CW, Misra S, Celniker SE, Rubin GM. Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc Natl Acad Sci U S A 2005; 102:5495-500. [PMID: 15809421 PMCID: PMC555963 DOI: 10.1073/pnas.0501422102] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Analysis of EST and cDNA collections from a number of metazoan species has identified genes encoding long polyadenylated transcripts that do not contain ORFs of lengths typical for protein-encoding mRNAs. Noncoding functions of such polyadenylated transcripts have been elucidated in only a few examples. The corresponding genes neither contain hallmark sequence motifs nor appear to have been conserved across phyla. Thus, it is impossible to systematically identify new members of this class of gene by using sequence homology and traditional gene-finding algorithms that depend on protein-coding potential. Consequently, even their approximate number has not been established for any metazoan genome. We curated polyadenylated transcripts with limited protein-coding capacity from intergenic regions of the Drosophila melanogaster genome. We used RT-PCR assays, hybridization to RNA blots and whole-mount embryos, and computational analyses to characterize candidate transcripts. We verify the structures and expression of 17 distinct, likely non-protein-coding polyadenylated transcripts. We show that the expression of many of these transcripts is conserved in other Drosophila species, indicating that they have important biological functions.
Affiliation(s)
- Jonathan L Tupy
- Berkeley Drosophila Genome Project and Department of Genome Sciences, Lawrence Berkeley National Laboratory, One Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720, USA
|
97
|
Hubbard SJ, Grafham DV, Beattie KJ, Overton IM, McLaren SR, Croning MDR, Boardman PE, Bonfield JK, Burnside J, Davies RM, Farrell ER, Francis MD, Griffiths-Jones S, Humphray SJ, Hyland C, Scott CE, Tang H, Taylor RG, Tickle C, Brown WRA, Birney E, Rogers J, Wilson SA. Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags. Genome Res 2005; 15:174-83. [PMID: 15590942 PMCID: PMC540287 DOI: 10.1101/gr.3011405] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2004] [Accepted: 10/04/2004] [Indexed: 12/22/2022]
Abstract
We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally annotated and describes a minimum of 11,929 chicken coding genes, including the sequence for 2260 full-length cDNAs together with a collection of noncoding (nc) cDNAs that have been stringently filtered to remove untranslated regions of coding mRNAs. The combined collection of cDNAs and ESTs describe 62,546 clustered transcripts and provide transcriptional evidence for a total of 18,989 chicken genes, including 88% of the annotated Ensembl gene set. Analysis of the ncRNAs reveals a set that is highly conserved in chickens and mammals, including sequences for 14 pri-miRNAs encoding 23 different miRNAs. The data sets described here provide a transcriptome toolkit linked to physical clones for bioinformaticians and experimental biologists who wish to use chicken systems as a low-cost, accessible alternative to mammals for the analysis of vertebrate development, immunology, and cell biology.
Affiliation(s)
- Simon J Hubbard
- Faculty of Life Sciences, The University of Manchester, Manchester, M60 1QD, United Kingdom
|
98
|
Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol 2004; 6:R9. [PMID: 15642101 PMCID: PMC549070 DOI: 10.1186/gb-2004-6-1-r9] [Citation(s) in RCA: 228] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2004] [Revised: 10/21/2004] [Accepted: 11/17/2004] [Indexed: 11/21/2022] Open
Abstract
Peptides derived from protein tandem mass spectrometry data have been mapped to the human genome sequence forming an expandable resource for the proteomic data. A crucial aim upon the completion of the human genome is the verification and functional annotation of all predicted genes and their protein products. Here we describe the mapping of peptides derived from accurate interpretations of protein tandem mass spectrometry (MS) data to eukaryotic genomes and the generation of an expandable resource for integration of data from many diverse proteomics experiments. Furthermore, we demonstrate that peptide identifications obtained from high-throughput proteomics can be integrated on a large scale with the human genome. This resource could serve as an expandable repository for MS-derived proteome information.
|
99
|
Kasukawa T, Katayama S, Kawaji H, Suzuki H, Hume DA, Hayashizaki Y. Construction of representative transcript and protein sets of human, mouse, and rat as a platform for their transcriptome and proteome analysis. Genomics 2004; 84:913-21. [PMID: 15533708 DOI: 10.1016/j.ygeno.2004.08.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2004] [Accepted: 08/16/2004] [Indexed: 10/26/2022]
Abstract
The number of mammalian transcripts identified by full-length cDNA projects and genome sequencing projects is increasing remarkably. Clustering them into a strictly nonredundant and comprehensive set provides a platform for functional analysis of the transcriptome and proteome, but the quality of the clustering and predictive usefulness have previously required manual curation to identify truncated transcripts and inappropriate clustering of closely related sequences. A Representative Transcript and Protein Sets (RTPS) pipeline was previously designed to identify the nonredundant and comprehensive set of mouse transcripts based on clustering of a large mouse full-length cDNA set (FANTOM2). Here we propose an alternative method that is more robust, requires less manual curation, and is applicable to other organisms in addition to mouse. RTPSs of human, mouse, and rat have been produced by this method and used for validation. Their comprehensiveness and quality are discussed by comparison with other clustering approaches. The RTPSs are available at .
Affiliation(s)
- Takeya Kasukawa
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan.
|
100
|
|