51
|
Marz M, Gruber AR, Höner Zu Siederdissen C, Amman F, Badelt S, Bartschat S, Bernhart SH, Beyer W, Kehr S, Lorenz R, Tanzer A, Yusuf D, Tafer H, Hofacker IL, Stadler PF. Animal snoRNAs and scaRNAs with exceptional structures. RNA Biol 2011; 8:938-46. [PMID: 21955586 DOI: 10.4161/rna.8.6.16603] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 12/31/2022]
Abstract
The overwhelming majority of small nucleolar RNAs (snoRNAs) fall into two clearly defined classes characterized by distinctive secondary structures and sequence motifs. A small group of diverse ncRNAs, however, shares the hallmarks of one or both classes of snoRNAs but differs substantially from the norm in some respects. Here, we compile the available information on these exceptional cases, conduct a thorough homology search throughout the available metazoan genomes, provide improved and expanded alignments, and investigate the evolutionary histories of these ncRNA families as well as their mutual relationships.
Affiliation(s)
- Manja Marz
- RNA Bioinformatik Gruppe, Institut für Pharmazeutische Chemie, Philipps Universität Marburg, Marburg, Germany
|
52
|
Daly T, Chen XS, Penny D. How old are RNA networks? ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2011; 722:255-73. [PMID: 21915795 DOI: 10.1007/978-1-4614-0332-6_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/11/2023]
Abstract
Some major classes of RNAs (such as mRNA, rRNA, tRNA and RNase P) are ubiquitous in all living systems and so are inferred to have arisen early during the origin of life. However, the situation is not so clear for the system of RNA regulatory networks that continue to be uncovered, especially in eukaryotes. It is increasingly being recognised that networks of small RNAs are important for regulation in all cells, but it is not certain whether the origin of these networks is as old as that of rRNAs and tRNAs. Another group of ncRNAs, including snoRNAs, occurs mainly in archaea and eukaryotes, and their ultimate origin is less certain, although perhaps the simplest hypothesis is that they were present in earlier stages of life and were lost from bacteria. Some RNA networks may trace back to an early stage when there was just RNA and proteins (the RNP world), before DNA.
Affiliation(s)
- Toni Daly
- Allan Wilson Centre of Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
|
53
|
Michaeli S, Doniger T, Gupta SK, Wurtzel O, Romano M, Visnovezky D, Sorek R, Unger R, Ullu E. RNA-seq analysis of small RNPs in Trypanosoma brucei reveals a rich repertoire of non-coding RNAs. Nucleic Acids Res 2011; 40:1282-98. [PMID: 21976736 PMCID: PMC3273796 DOI: 10.1093/nar/gkr786] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
The discovery of a plethora of small non-coding RNAs (ncRNAs) has fundamentally changed our understanding of how genes are regulated. In this study, we employed the power of deep sequencing of RNA (RNA-seq) to examine the repertoire of ncRNAs present in small ribonucleoprotein particles (RNPs) of Trypanosoma brucei, an important protozoan parasite. We identified new C/D and H/ACA small nucleolar RNAs (snoRNAs), as well as tens of putative novel non-coding RNAs; several of these are processed from trans-spliced and polyadenylated transcripts. The RNA-seq analysis provided information on the relative abundance of the RNAs, and their 5'- and 3'-termini. The study demonstrated that three highly abundant snoRNAs are involved in rRNA processing and highlighted the unique trypanosome-specific repertoire of these RNAs. Novel RNAs were studied using in situ hybridization, association in RNP complexes, and 'RNA walk' to detect interaction with their target RNAs. Finally, we showed that the abundance of certain ncRNAs varies between the two stages of the parasite, suggesting that ncRNAs may contribute to gene regulation during the parasite's complex life cycle. This is the first study to provide a whole-genome analysis of the large repertoire of small RNPs in trypanosomes.
Affiliation(s)
- Shulamit Michaeli
- The Mina and Everard Goodman Faculty of Life Sciences, and Advanced Materials and Nanotechnology Institute, Bar-Ilan University, Ramat-Gan 52900, Israel.
|
54
|
Zou Q, Lin C, Liu XY, Han YP, Li WB, Guo MZ. Novel representation of RNA secondary structure used to improve prediction algorithms. GENETICS AND MOLECULAR RESEARCH 2011; 10:1986-98. [PMID: 21948761 DOI: 10.4238/vol10-3gmr1181] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/03/2022]
Abstract
We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems, and each stem was represented by two values according to its position. Using this representation, we improved the results of the comparative sequence analysis method and the minimum free-energy model. In the comparative sequence analysis method, a novel algorithm independent of multiple sequence alignment was developed to improve performance. When dealing with a single RNA sequence, the minimum free-energy model is improved by combining it with RNA class information. Secondary structure prediction experiments were done on tRNA and RNase P RNA; sensitivity and specificity were both improved. Furthermore, software programs were developed for non-commercial use.
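The idea of treating a structure as a set of stems, each summarized by two position values, can be illustrated with a minimal sketch. The abstract does not say which two values the authors use, so the choice below (the 5' and 3' positions of the outermost base pair of each stem) and the Jaccard comparison are assumptions made purely for illustration; the function names are hypothetical and are not taken from the authors' software.

```python
def find_stems(dot_bracket):
    """Group the base pairs of a dot-bracket string into stems.

    A stem is taken here to be a run of directly stacked pairs:
    (i, j) followed by (i + 1, j - 1), and so on.
    """
    stack, pairs = [], []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.append((stack.pop(), pos))
    pairs.sort()  # order by 5' position

    stems = []
    for i, j in pairs:
        # Extend the current stem if this pair stacks on its innermost pair.
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


def stem_signature(dot_bracket):
    """Summarize each stem by two values (assumed: its outermost pair)."""
    return {stem[0] for stem in find_stems(dot_bracket)}


def stem_similarity(structure_a, structure_b):
    """Jaccard similarity of two stem sets (an illustrative choice)."""
    a, b = stem_signature(structure_a), stem_signature(structure_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(stem_signature("((..))..((...))"))
    print(stem_similarity("((..))..((...))", "((..))........."))
```

Because each structure collapses to a small set of value pairs, two structures can be compared with set operations rather than position by position, which is one plausible reading of the "quick comparison" the abstract mentions.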
Affiliation(s)
- Q Zou
- School of Information Science and Technology, Xiamen University, Xiamen, China
|
55
|
Identification and analysis of intermediate size noncoding RNAs in the human fetal brain. PLoS One 2011; 6:e21652. [PMID: 21789175 PMCID: PMC3138756 DOI: 10.1371/journal.pone.0021652] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/27/2010] [Accepted: 06/07/2011] [Indexed: 12/18/2022]
Abstract
The involvement of noncoding RNAs (ncRNAs) in the development of the human brain remains largely unknown. Applying a cloning strategy for detection of intermediate size (50–500 nt) ncRNAs (is-ncRNAs), we have identified 82 novel transcripts in human fetal brain tissue. Most of the novel is-ncRNAs are not well conserved in vertebrates, and several transcripts were only found in primates. Northern blot and microarray analysis indicated considerable variation in expression across human fetal brain development stages and fetal tissues for both novel and known is-ncRNAs. Expression of several of the novel is-ncRNAs was conspicuously absent in one or two brain cancer cell lines, and transient overexpression of some transcripts in cancer cells significantly inhibited cell proliferation. Overall, our results suggest that is-ncRNAs play important roles in the development and tumorigenesis of the human brain.
|
56
|
Fasold M, Langenberger D, Binder H, Stadler PF, Hoffmann S. DARIO: a ncRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res 2011; 39:W112-7. [PMID: 21622957 PMCID: PMC3125765 DOI: 10.1093/nar/gkr357] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Indexed: 01/07/2023]
Abstract
Small non-coding RNAs (ncRNAs) such as microRNAs, snoRNAs and tRNAs are a diverse collection of molecules with several important biological functions. Current methods for high-throughput sequencing offer, for the first time, the opportunity to investigate the entire ncRNAome in an essentially unbiased way. However, there is a substantial need for methods that allow convenient analysis of these overwhelmingly large data sets. Here, we present DARIO, a free web service for studying short read data from small RNA-seq experiments. It provides a wide range of analysis features, including quality control, read normalization, ncRNA quantification and prediction of putative ncRNA candidates. The DARIO web site can be accessed at http://dario.bioinf.uni-leipzig.de/.
Affiliation(s)
- Mario Fasold
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, Department of Computer Science, University Leipzig, Germany
|
57
|
Wang Y, Chen J, Wei G, He H, Zhu X, Xiao T, Yuan J, Dong B, He S, Skogerbø G, Chen R. The Caenorhabditis elegans intermediate-size transcriptome shows high degree of stage-specific expression. Nucleic Acids Res 2011; 39:5203-14. [PMID: 21378118 PMCID: PMC3130273 DOI: 10.1093/nar/gkr102] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/14/2023]
Abstract
Earlier studies have revealed a substantial amount of transcriptional activity occurring outside annotated protein-coding genes of the Caenorhabditis elegans genome. One important fraction of this transcriptional activity relates to intermediate-size (70–500 nt) transcripts (is-ncRNAs) of mostly unknown function. Profiling the expression of this segment of the transcriptome on a tiling array through the C. elegans life cycle identified 5866 hitherto unannotated transcripts. The novel loci were distributed across intronic and intergenic space, with some enrichment toward protein-coding gene termini. The majority of the putative is-ncRNAs showed either stage-specific expression, or distinct developmental variation in their expression levels. More than 200 loci showed male-specific expression, and conserved loci were significantly enriched on the X chromosome, both observations strongly suggesting involvement of is-ncRNAs in sex-specific functions. Half of the novel loci were conserved in other nematodes, and numerous loci showed significant conservational correlations to nearby coding genes. Assuming functional roles for most of the novel loci, the data imply a nematode is-ncRNA tool kit of considerable size and variety.
Affiliation(s)
- Yunfei Wang
- Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
|
58
|
Li D, Wang Y, Zhang K, Jiao Z, Zhu X, Skogerboe G, Guo X, Chinnusamy V, Bi L, Huang Y, Dong S, Chen R, Kan Y. Experimental RNomics and genomic comparative analysis reveal a large group of species-specific small non-message RNAs in the silkworm Bombyx mori. Nucleic Acids Res 2011; 39:3792-805. [PMID: 21227919 PMCID: PMC3089462 DOI: 10.1093/nar/gkq1317] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
Accumulating evidence shows that small non-protein-coding RNAs (ncRNAs) play important roles in development, stress response and other cellular processes. The silkworm is an important model for studies on insect genetics and control of lepidopterous pests. Here, we have performed the first systematic identification and analysis of intermediate size ncRNAs (50–500 nt) in the silkworm. We identified 189 novel ncRNAs, including 141 snoRNAs, six snRNAs, three tRNAs, one SRP and 38 unclassified ncRNAs. Forty ncRNAs showed significantly altered expression during silkworm development or across specific stage transitions. Genomic comparisons revealed that 123 of these ncRNAs are potentially silkworm-specific. Analysis of the genomic organization of the ncRNA loci showed that 32.62% of the novel snoRNA loci are intergenic, and that all the intronic snoRNAs follow the pattern of one-snoRNA-per-intron. Target site analysis predicted a total of 95 2′-O-methylation and pseudouridylation modification sites of rRNAs, snRNAs and tRNAs. Together, these findings provide new clues for future functional study of ncRNA during insect development and evolution.
Affiliation(s)
- Dandan Li
- Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Monitoring and Management of Crop Diseases and Pest Insects, Ministry of Agriculture, Nanjing 210095, China
|
59
|
Abstract
Rapid improvements in high-throughput experimental technologies now make it possible to study in detail the expression, as well as changes in expression, of whole transcriptomes under different environmental conditions. We describe current approaches to identify genome-wide functional RNA transcripts (experimentally as well as computationally), and focus on computational methods that may be utilized to disclose their function. While genome databases offer a wealth of information about known and putative functions for protein-coding genes, functional information for novel non-coding RNA genes is almost nonexistent. This is mainly explained by the lack of established software tools to efficiently reveal the function and evolutionary origin of non-coding RNA genes. Here, we describe in detail computational approaches one may follow to annotate and classify an RNA transcript.
Affiliation(s)
- Kristin Reiche
- Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
|
60
|
Langenberger D, Bartschat S, Hertel J, Hoffmann S, Tafer H, Stadler PF. MicroRNA or Not MicroRNA? ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY 2011. [DOI: 10.1007/978-3-642-22825-4_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
|
61
|
Zhang L, Li W, Song L, Chen L. A towards-multidimensional screening approach to predict candidate genes of rheumatoid arthritis based on SNP, structural and functional annotations. BMC Med Genomics 2010; 3:38. [PMID: 20727150 PMCID: PMC2939610 DOI: 10.1186/1755-8794-3-38] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/03/2009] [Accepted: 08/20/2010] [Indexed: 11/20/2022]
Abstract
Background: According to the Genetic Analysis Workshops (GAW), hundreds of thousands of SNPs have been tested for association with rheumatoid arthritis. Traditional genome-wide association studies (GWAS) have been developed to identify susceptibility genes using a "most significant SNPs/genes" model. However, many minor- or modest-risk genes are likely to be missed after adjustment for multiple testing. This screening process uses strict statistical thresholds that aim to identify susceptibility genes based only on a statistical model, without considering multi-dimensional biological similarities in sequence arrangement, crystal structure, or functional categories/biological pathways between candidate and known disease genes. Methods: Multidimensional screening approaches combined with traditional statistical genetics methods can consider multiple biological backgrounds of genetic mutation, structural, and functional annotations. Here we introduce a newly developed multidimensional screening approach for rheumatoid arthritis candidate genes that considers all SNPs with nominal evidence of Bayesian association (BFLn > 0), and structural and functional similarities of corresponding genes or proteins. Results: Our multidimensional screening approach extracted all risk genes (BFLn > 0) by odds ratios of hypothesis H1 to H0, and determined whether a particular group of genes shared underlying biological similarities with known disease genes. Using this method, we found 6614 risk SNPs in our Bayesian screen result set. Finally, we identified 146 likely causal genes for rheumatoid arthritis, including CD4, FGFR1, and KDR, which have been reported as high risk factors by recent studies. We note that 790 (96.1%) of the genes identified by GWAS could not easily be classified into related functional categories or biological processes associated with the disease, while our candidate genes shared underlying biological similarities (e.g. were in the same pathway or GO term) and contributed to disease etiology, although common variations in each of these genes make only modest contributions to disease risk. We also found 6141 risk SNPs that were too minor to be detected by conventional approaches, and associations between 58 candidate genes and rheumatoid arthritis were verified by literature retrieved from the NCBI PubMed module. Conclusions: Our proposed approach to the analysis of GAW16 data for rheumatoid arthritis was based on an underlying biological similarities-based method applied to candidate and known disease genes. Application of our method could identify likely causal candidate disease genes of rheumatoid arthritis, and could yield biological insights that are not detected when focusing only on genes that give the strongest evidence after multiple testing. We hope that our proposed method complements the "most significant SNPs/genes" model, and provides additional insights into the pathogenesis of rheumatoid arthritis and other diseases, when searching datasets for hundreds of genetic variances.
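The screening criterion BFLn > 0 can be made concrete with a small sketch. It assumes that BFLn denotes the natural logarithm of the Bayes factor, i.e. ln of the ratio of the marginal likelihood under H1 (association) to that under H0 (no association), which is a reading of the abstract and not something it states explicitly. The SNP identifiers and likelihood values below are invented for illustration only.

```python
import math


def log_bayes_factor(likelihood_h1, likelihood_h0):
    """Natural log of the Bayes factor of H1 against H0 (assumed meaning of BFLn)."""
    return math.log(likelihood_h1) - math.log(likelihood_h0)


def nominal_snps(likelihoods):
    """Keep SNPs whose log Bayes factor is strictly positive.

    `likelihoods` maps a SNP identifier to a pair
    (marginal likelihood under H1, marginal likelihood under H0).
    """
    return [
        snp
        for snp, (h1, h0) in likelihoods.items()
        if log_bayes_factor(h1, h0) > 0
    ]


if __name__ == "__main__":
    # Hypothetical SNP identifiers and likelihoods, not data from the study.
    example = {
        "snp_a": (0.02, 0.01),  # data favour H1
        "snp_b": (0.01, 0.03),  # data favour H0
        "snp_c": (0.5, 0.5),    # no preference, log Bayes factor is 0
    }
    print(nominal_snps(example))
```

A threshold of zero keeps every SNP for which the data favour association at all, which is why such a screen retains far more candidates than a multiple-testing-corrected significance cutoff.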
Affiliation(s)
- Liangcai Zhang
- Department of Biophysics, College of Bioinformatics Science and Technology; Harbin Medical University, Harbin, Hei Longjiang Province, China
|
62
|
Computational RNomics: Structure identification and functional prediction of non-coding RNAs in silico. SCIENCE CHINA-LIFE SCIENCES 2010; 53:548-62. [DOI: 10.1007/s11427-010-0101-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/20/2009] [Accepted: 06/28/2009] [Indexed: 01/05/2023]
|
63
|
Kim SH, Spensley M, Choi SK, Calixto CPG, Pendle AF, Koroleva O, Shaw PJ, Brown JWS. Plant U13 orthologues and orphan snoRNAs identified by RNomics of RNA from Arabidopsis nucleoli. Nucleic Acids Res 2010; 38:3054-67. [PMID: 20081206 PMCID: PMC2875012 DOI: 10.1093/nar/gkp1241] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 10/03/2009] [Revised: 12/23/2009] [Accepted: 12/23/2009] [Indexed: 11/13/2022]
Abstract
Small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs) are non-coding RNAs whose main function in eukaryotes is to guide the modification of nucleotides in ribosomal and spliceosomal small nuclear RNAs, respectively. Full-length sequences of Arabidopsis snoRNAs and scaRNAs have been obtained from cDNA libraries of capped and uncapped small RNAs using RNA from isolated nucleoli from Arabidopsis cell cultures. We have identified 31 novel snoRNA genes (9 box C/D and 22 box H/ACA) and 15 new variants of previously described snoRNAs. Three related capped snoRNAs with a distinct gene organization and structure were identified as orthologues of animal U13 snoRNAs. In addition, eight of the novel genes had no complementarity to rRNAs or snRNAs and are therefore putative orphan snoRNAs, potentially reflecting wider functions for these RNAs. The nucleolar localization of a number of the snoRNAs and the localization to nuclear bodies of two putative scaRNAs were confirmed by in situ hybridization. The majority of the novel snoRNA genes were found in new gene clusters or as part of previously described clusters. These results expand the repertoire of Arabidopsis snoRNAs to 188 snoRNA genes with 294 gene variants.
Affiliation(s)
- Sang Hyon Kim, Mark Spensley, Seung Kook Choi, Cristiane P. G. Calixto, Ali F. Pendle, Olga Koroleva, Peter J. Shaw, John W. S. Brown
- Genetics Programme, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK, Division of Biosciences and Bioinformatics, College of Natural Science, Myongji University, Yongin, Kyeongki-do 449-728, Korea, Division of Plant Sciences, University of Dundee at SCRI, Invergowrie, Dundee DD2 5DA, Scotland, Department of Cell and Developmental Biology, John Innes Centre, Colney, Norwich NR4 7UH and School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
|
64
|
Majer A, Booth SA. Computational methodologies for studying non-coding RNAs relevant to central nervous system function and dysfunction. Brain Res 2010; 1338:131-45. [PMID: 20381467 DOI: 10.1016/j.brainres.2010.03.095] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/22/2009] [Revised: 03/19/2010] [Accepted: 03/26/2010] [Indexed: 12/21/2022]
Abstract
Non-coding RNAs (ncRNAs) are a large and diverse group of transcripts that span the eukaryotic genome, of which less than 2% encodes proteins. Several distinct families of ncRNAs have been described and implicated in many aspects of central nervous system (CNS) function including translation, RNA metabolism, gene regulation, and development. The need to distinguish ncRNAs from sequence data, as well as to potentially uncover novel ncRNA families, has ignited the development of customized computational approaches and bioinformatic resources to handle these tasks. In this review, we provide an overview of the numerous procedures developed to predict ncRNAs based on their primary sequence and predicted secondary structure. These methodologies are broadly grouped into genome scanning algorithms, mixed approaches, and machine learning algorithms. Regulatory ncRNAs, particularly microRNAs (miRNAs), are a major focus of current research efforts, and this review will therefore center on the prediction of miRNAs and the putative gene targets they act upon. With the advent of ultra-high-throughput sequencing technologies, 'deep sequencing' has emerged as the cutting-edge method for ncRNA identification, and we will also touch on some computational resources that play a key role in the analysis of this type of data.
Affiliation(s)
- Anna Majer
- Department of Medical Microbiology and Infectious Diseases, Faculty of Medicine, University of Manitoba, Manitoba, Canada
|
65
|
Ellis JC, Brown DD, Brown JW. The small nucleolar ribonucleoprotein (snoRNP) database. RNA (NEW YORK, N.Y.) 2010; 16:664-666. [PMID: 20197376 PMCID: PMC2844615 DOI: 10.1261/rna.1871310] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 08/07/2009] [Accepted: 01/05/2010] [Indexed: 05/26/2023]
Abstract
Small nucleolar ribonucleoproteins (snoRNPs) are widely studied and characterized as guide RNAs for sequence-specific 2'-O-ribose methylation and pseudouridylation of ribosomal RNAs. In addition, snoRNAs have also been shown to interact with some tRNAs and direct alternative splicing in mRNA biogenesis. Recent advances in bioinformatics have resulted in new algorithms able to rapidly identify noncoding RNAs generally and snoRNAs specifically in genomic and metagenomic sequences, resulting in a rapid increase in the number and diversity of identified snoRNA sequences. The snoRNP database is a web-based collection of snoRNA and snoRNA-associated protein sequences from a wide range of species. The database currently contains 8994 snoRNA sequences from Bacteria, Archaea, and Eukaryotes and 589 snoRNA-associated protein sequences. The snoRNP database can be found at: http://evolveathome.com/snoRNA/snoRNA.php.
|
66
|
Boria I, Gruber AR, Tanzer A, Bernhart SH, Lorenz R, Mueller MM, Hofacker IL, Stadler PF. Nematode sbRNAs: Homologs of Vertebrate Y RNAs. J Mol Evol 2010; 70:346-58. [DOI: 10.1007/s00239-010-9332-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 06/05/2009] [Accepted: 03/01/2010] [Indexed: 01/20/2023]
|
67
|
Rederstorff M, Bernhart SH, Tanzer A, Zywicki M, Perfler K, Lukasser M, Hofacker IL, Hüttenhofer A. RNPomics: defining the ncRNA transcriptome by cDNA library generation from ribonucleo-protein particles. Nucleic Acids Res 2010; 38:e113. [PMID: 20150415 PMCID: PMC2879528 DOI: 10.1093/nar/gkq057] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 01/22/2023]
Abstract
Up to 450 000 non-coding RNAs (ncRNAs) have been predicted to be transcribed from the human genome. However, it still has to be elucidated which of these transcripts represent functional ncRNAs. Since all functional ncRNAs in Eukarya form ribonucleo-protein particles (RNPs), we generated specialized cDNA libraries from size-fractionated RNPs and validated the presence of selected ncRNAs within RNPs by glycerol gradient centrifugation. As a proof of concept, we applied the RNP method to human HeLa cells or total mouse brain, and subjected cDNA libraries generated from the two model systems to deep sequencing. Bioinformatic analysis of cDNA sequences revealed several hundred ncRNP candidates. The ncRNA candidates were mainly located in intergenic as well as intronic regions of the genome, with a significant overrepresentation of intron-derived ncRNA sequences. Additionally, a number of ncRNAs mapped to repetitive sequences. Thus, our RNP approach provides an efficient way to identify new functional small ncRNA candidates involved in RNP formation.
Affiliation(s)
- Mathieu Rederstorff
- Division of Genomics and RNomics, Innsbruck Biocentre, Innsbruck Medical University, Innsbruck and Institute of Theoretical Chemistry, University of Vienna, Vienna, Austria
|
68
|
Wang PPS, Ruvinsky I. Computational prediction of Caenorhabditis box H/ACA snoRNAs using genomic properties of their host genes. RNA (NEW YORK, N.Y.) 2010; 16:290-298. [PMID: 20038629 PMCID: PMC2811658 DOI: 10.1261/rna.1876210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/11/2009] [Accepted: 10/27/2009] [Indexed: 05/28/2023]
Abstract
Identification of small nucleolar RNAs (snoRNAs) in genomic sequences has been challenging due to the relative paucity of sequence features. Many current prediction algorithms rely on detection of snoRNA motifs complementary to target sites in snRNAs and rRNAs. However, recent discovery of snoRNAs without apparent targets requires development of alternative prediction methods. We present an approach that combines rule-based filters and a Bayesian Classifier to identify a class of snoRNAs (H/ACA) without requiring target sequence information. It takes advantage of unique attributes of their genomic organization and improved species-specific motif characterization to predict snoRNAs that may otherwise be difficult to discover. Searches in the genomes of Caenorhabditis elegans and the closely related Caenorhabditis briggsae suggest that our method performs well compared to recent benchmark algorithms. Our results illustrate the benefits of training gene discovery engines on features restricted to particular phylogenetic groups and the utility of incorporating diverse data types in gene prediction.
Affiliation(s)
- Paul Po-Shen Wang
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
|
69
|
Jung CH, Hansen MA, Makunin IV, Korbie DJ, Mattick JS. Identification of novel non-coding RNAs using profiles of short sequence reads from next generation sequencing data. BMC Genomics 2010; 11:77. [PMID: 20113528 PMCID: PMC2825236 DOI: 10.1186/1471-2164-11-77] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 06/19/2009] [Accepted: 02/01/2010] [Indexed: 11/30/2022] Open
Abstract
Background The increasing interest in small non-coding RNAs (ncRNAs) such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs) and recent advances in sequencing technology have yielded large numbers of short (18-32 nt) RNA sequences from different organisms, some of which are derived from small nucleolar RNAs (snoRNAs) and transfer RNAs (tRNAs). We observed that these short ncRNAs frequently cover the entire length of annotated snoRNAs or tRNAs, which suggests that other loci specifying similar ncRNAs can be identified by clusters of short RNA sequences. Results We combined publicly available datasets of tens of millions of short RNA sequence tags from Drosophila melanogaster, and mapped them to the Drosophila genome. Approximately 6 million perfectly mapping sequence tags were then assembled into 521,302 tag-contigs (TCs) based on tag overlap. Most transposon-derived sequences, exons and annotated miRNAs, tRNAs and snoRNAs are detected by TCs, which show distinct patterns of length and tag-depth for different categories. The typical length and tag-depth of snoRNA-derived TCs was used to predict 7 previously unrecognized box H/ACA and 26 box C/D snoRNA candidates. We also identified one snRNA candidate and 86 loci with a high number of tags that are yet to be annotated, 7 of which have a particular 18mer motif and are located in introns of genes involved in development. A subset of new snoRNA candidates and putative ncRNA candidates was verified by Northern blot. Conclusions In this study, we have introduced a new approach to identify new members of known classes of ncRNAs based on the features of TCs corresponding to known ncRNAs. A large number of the identified TCs are yet to be examined experimentally suggesting that many more novel ncRNAs remain to be discovered.
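The tag-contig idea in the abstract above (merging mapped short reads that overlap into contiguous regions, then looking at each region's length and tag depth) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `assemble_tag_contigs`, the `(chrom, start, end)` input format with 0-based half-open coordinates, and the use of a plain tag count as a stand-in for tag depth are all assumptions made for illustration.

```python
from collections import defaultdict


def assemble_tag_contigs(tags):
    """Merge overlapping mapped tags into tag-contigs (hypothetical helper).

    tags: iterable of (chrom, start, end), 0-based half-open coordinates.
    Returns a list of dicts with chrom, start, end, tags (count) and length.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in tags:
        by_chrom[chrom].append((start, end))

    contigs = []
    for chrom in sorted(by_chrom):
        current = None
        for start, end in sorted(by_chrom[chrom]):
            # A tag extends the current contig only if it overlaps it by >= 1 nt.
            if current is not None and start < current["end"]:
                current["end"] = max(current["end"], end)
                current["tags"] += 1
            else:
                if current is not None:
                    contigs.append(current)
                current = {"chrom": chrom, "start": start, "end": end, "tags": 1}
        if current is not None:
            contigs.append(current)

    for contig in contigs:
        contig["length"] = contig["end"] - contig["start"]
    return contigs


example_tags = [("2L", 100, 120), ("2L", 110, 130), ("2L", 130, 150), ("X", 5, 25)]
example_contigs = assemble_tag_contigs(example_tags)
```

In this sketch, tags that merely touch (one ends where the next begins) are kept as separate contigs; whether adjacent tags should be joined is a design choice the abstract does not specify.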
Affiliation(s)
- Chol-Hee Jung
- Institute for Molecular Bioscience, University of Queensland, St. Lucia QLD 4072, Australia
|
70
|
Zhang Y, Liu J, Jia C, Li T, Wu R, Wang J, Chen Y, Zou X, Chen R, Wang XJ, Zhu D. Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs. BMC Genomics 2010; 11:61. [PMID: 20100322 PMCID: PMC2832892 DOI: 10.1186/1471-2164-11-61] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 10/19/2009] [Accepted: 01/25/2010] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Recent studies have demonstrated that non-protein-coding RNAs (npcRNAs/ncRNAs) play important roles during eukaryotic development, species evolution, and in the etiology of disease. Rhesus macaques are the most widely used primate model in both biomedical research and primate evolutionary studies. However, most reports on these animals focus on the functional roles of protein-coding sequences, whereas very little is known about macaque ncRNAs. RESULTS In the present study, we performed the first systematic profiling of intermediate-size ncRNAs (50 to 500 nt) from the rhesus monkey by constructing a cDNA library. We identified 117 rhesus monkey ncRNAs, including 80 small nucleolar RNAs (snoRNAs), 29 other types of known RNAs (snRNAs, Y RNA, and others), and eight unclassified ncRNAs. Comparative genomic analysis and northern blot hybridizations demonstrated that some snoRNAs were lineage- or species-specific. Paralogous sequences were found for most rhesus monkey snoRNAs, the expression of which might be attributable to extensive duplication within the rhesus monkey genome. Further investigation of snoRNA flanking sequences showed that some rhesus monkey snoRNAs are retrogenes derived from L1-mediated integration. Finally, phylogenetic analysis demonstrated that birds and primates share some snoRNAs and host genes thereof, suggesting that both the relevant host genes and the snoRNAs contained therein may be inherited from a common ancestor. However, some rhesus monkey snoRNAs hosted by non-ribosome-related genes appeared after the evolutionary divergence between birds and mammals. CONCLUSIONS We provide the first experimentally-derived catalog of rhesus monkey ncRNAs and uncover some interesting genomic and evolutionary features. These findings provide important information for future functional characterization of snoRNAs during primate evolution.
Affiliation(s)
- Yong Zhang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, PR China
|
71
|
Abstract
Noncoding RNAs (ncRNAs) are increasingly recognized as important functional molecules in the cell. Here we give a short overview of fundamental computational techniques to analyze ncRNAs that can help us better understand their function. Topics covered include prediction of secondary structure from the primary sequence, prediction of consensus structures for homologous sequences, search for homologous sequences in databases using sequence and structure comparisons, annotation of tRNAs, rRNAs, snoRNAs, and microRNAs, de novo prediction of novel ncRNAs, and prediction of RNA/RNA interactions including miRNA target prediction.
|
72
|
Deep sequencing analysis of the Methanosarcina mazei Gö1 transcriptome in response to nitrogen availability. Proc Natl Acad Sci U S A 2009; 106:21878-82. [PMID: 19996181 DOI: 10.1073/pnas.0909051106] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.9] [Indexed: 01/23/2023] Open
Abstract
Methanosarcina mazei and related mesophilic archaea are the only organisms fermenting acetate, methylamines, and methanol to methane and carbon dioxide, contributing significantly to greenhouse gas production. The biochemistry of these metabolic processes is well studied, and genome sequences are available, yet little is known about the overall transcriptional organization and the noncoding regions representing 25% of the 4.01-Mb genome of M. mazei. We present a genome-wide analysis of transcription start sites (TSS) in M. mazei grown under different nitrogen availabilities. Pyrosequencing-based differential analysis of primary vs. processed 5' ends of transcripts discovered 876 TSS across the M. mazei genome. Unlike in other archaea, in which leaderless mRNAs are prevalent, the majority of the detected mRNAs in M. mazei carry long untranslated 5' regions. Our experimental data predict a total of 208 small RNA (sRNA) candidates, mostly from intergenic regions but also antisense to 5' and 3' regions of mRNAs. In addition, 40 new small mRNAs with ORFs of ≤ 30 aa were identified, some of which might have dual functions as mRNA and regulatory sRNA. We confirmed differential expression of several sRNA genes in response to nitrogen availability. Inspection of their promoter regions revealed a unique conserved sequence motif associated with nitrogen-responsive regulation, which might serve as a regulator binding site upstream of the common IIB recognition element. Strikingly, several sRNAs antisense to mRNAs encoding transposases indicate nitrogen-dependent transposition events. This global TSS map in archaea will facilitate a better understanding of transcriptional and posttranscriptional control in the third domain of life.
|
73
|
Copeland CS, Marz M, Rose D, Hertel J, Brindley PJ, Santana CB, Kehr S, Attolini CSO, Stadler PF. Homology-based annotation of non-coding RNAs in the genomes of Schistosoma mansoni and Schistosoma japonicum. BMC Genomics 2009; 10:464. [PMID: 19814823 PMCID: PMC2770079 DOI: 10.1186/1471-2164-10-464] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 05/27/2009] [Accepted: 10/08/2009] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Schistosomes are trematode parasites of the phylum Platyhelminthes. They are considered the most important of the human helminth parasites in terms of morbidity and mortality. Draft genome sequences are now available for Schistosoma mansoni and Schistosoma japonicum. Non-coding RNA (ncRNA) plays a crucial role in gene expression regulation, cellular function and defense, homeostasis, and pathogenesis. The genome-wide annotation of ncRNAs is a non-trivial task unless well-annotated genomes of closely related species are already available. RESULTS A homology search for structured ncRNA in the genome of S. mansoni resulted in 23 types of ncRNAs with conserved primary and secondary structure. Among these, we identified rRNA, snRNA, SL RNA, SRP, tRNAs and RNase P, and also possibly MRP and 7SK RNAs. In addition, we confirmed five miRNAs that have recently been reported in S. japonicum and found two additional homologs of known miRNAs. The tRNA complement of S. mansoni is comparable to that of the free-living planarian Schmidtea mediterranea, although for some amino acids differences of more than a factor of two are observed: Leu, Ser, and His are overrepresented, while Cys, Met, and Ile are underrepresented in S. mansoni. On the other hand, the number of tRNAs in the genome of S. japonicum is reduced by more than a factor of four. Both schistosomes have a complete set of minor spliceosomal snRNAs. Several ncRNAs that are expected to exist in the S. mansoni genome were not found, among them the telomerase RNA, vault RNAs, and Y RNAs. CONCLUSION The ncRNA sequences and structures presented here represent the most complete dataset of ncRNA from any lophotrochozoan reported so far. This data set provides an important reference for further analysis of the genomes of schistosomes and indeed eukaryotic genomes at large.
Affiliation(s)
- Claudia S Copeland
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany.
|
74
|
Mosig A, Zhu L, Stadler PF. Customized strategies for discovering distant ncRNA homologs. Briefings in Functional Genomics and Proteomics 2009; 8:451-60. [PMID: 19779009 DOI: 10.1093/bfgp/elp035] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
A large fraction of non-coding RNAs is short and/or poorly conserved in sequence. Most of the longer examples, furthermore, consist of a collection of conserved structural motifs rather than a coherent globally conserved secondary structure. As a consequence, the conceptually simple problem of homology search becomes a complex and technically demanding task. Despite the best efforts of databases such as Rfam, the situation is complicated further by the sparsity of information in many RNA families, in particular prokaryotic ones. In this contribution, we review recent efforts to customize sequence-based search tools for ncRNA applications. In particular, semi-global alignments and the development of methods for fragmented pattern search have brought significant practical advances. Current developments in this area focus on the integration of fragmented sequence pattern search with search algorithms for secondary structure patterns. We focus here, in particular, on strategies that can be successful in the 'twilight zone' where generic approaches, from BLAST to Infernal, start to fail.
Affiliation(s)
- Axel Mosig
- Chair of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
|
75
|
Scott MS, Avolio F, Ono M, Lamond AI, Barton GJ. Human miRNA precursors with box H/ACA snoRNA features. PLoS Comput Biol 2009; 5:e1000507. [PMID: 19763159 PMCID: PMC2730528 DOI: 10.1371/journal.pcbi.1000507] [Citation(s) in RCA: 151] [Impact Index Per Article: 10.1] [Received: 05/12/2009] [Accepted: 08/14/2009] [Indexed: 12/01/2022] Open
Abstract
MicroRNAs (miRNAs) and small nucleolar RNAs (snoRNAs) are two classes of small non-coding regulatory RNAs, which have been much investigated in recent years. While their respective functions in the cell are distinct, they share interesting genomic similarities, and recent sequencing projects have identified processed forms of snoRNAs that resemble miRNAs. Here, we investigate a possible evolutionary relationship between miRNAs and box H/ACA snoRNAs. A comparison of the genomic locations of reported miRNAs and snoRNAs reveals an overlap of specific members of these classes. To test the hypothesis that some miRNAs might have evolved from snoRNA-encoding genomic regions, reported miRNA-encoding regions were scanned for the presence of box H/ACA snoRNA features. Twenty miRNA precursors show significant similarity to H/ACA snoRNAs as predicted by snoGPS. These include molecules predicted to target known ribosomal RNA pseudouridylation sites in vivo for which no guide snoRNA has yet been reported. The predicted folded structures of these twenty H/ACA snoRNA-like miRNA precursors reveal molecules which resemble the structures of known box H/ACA snoRNAs. The genomic regions surrounding these predicted snoRNA-like miRNAs are often similar to regions around snoRNA retroposons, including the presence of transposable elements, target site duplications and poly(A) tails. We further show that the precursors of five H/ACA snoRNA-like miRNAs (miR-151, miR-605, miR-664, miR-215 and miR-140) bind to dyskerin, a specific protein component of functional box H/ACA small nucleolar ribonucleoprotein complexes, suggesting that these molecules have retained some H/ACA snoRNA functionality. The detection of small RNA molecules that share features of miRNAs and snoRNAs suggests that these classes of RNA may have an evolutionary relationship.
The major functions known for RNA were long believed to be either messenger RNAs, which function as intermediates between genes and proteins, or ribosomal RNAs and transfer RNAs which carry out the translation process. In recent years, however, newly discovered classes of small RNAs have been shown to play important cellular roles. These include microRNAs (miRNAs), which can regulate the production of specific proteins, and small nucleolar RNAs (snoRNAs), which recognise and chemically modify specific sequences in ribosomal RNA. Although miRNAs and snoRNAs are currently believed to be generated by different cellular pathways and to function in different cellular compartments, members of these two types of small RNAs display numerous genomic similarities, and a small number of snoRNAs have been shown to encode miRNAs in several organisms. Here we systematically investigate a possible evolutionary relationship between snoRNAs and miRNAs. Using computational analysis, we identify twenty genomic regions encoding miRNAs with highly significant similarity to snoRNAs, both on the level of their surrounding genomic context as well as their predicted folded structure. A subset of these miRNAs display functional snoRNA characteristics, strengthening the possibility that these miRNA molecules might have evolved from snoRNAs.
Affiliation(s)
- Michelle S Scott
- Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee, United Kingdom.
|
76
|
Zhang Y, Wang J, Huang S, Zhu X, Liu J, Yang N, Song D, Wu R, Deng W, Skogerbø G, Wang XJ, Chen R, Zhu D. Systematic identification and characterization of chicken (Gallus gallus) ncRNAs. Nucleic Acids Res 2009; 37:6562-74. [PMID: 19720738 PMCID: PMC2770669 DOI: 10.1093/nar/gkp704] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022] Open
Abstract
Recent studies have demonstrated that non-coding RNAs (ncRNAs) play important roles during development and evolution. Chicken, the first genome-sequenced non-mammalian amniote, possesses unique features for developmental and evolutionary studies. However, apart from microRNAs, information on chicken ncRNAs has mainly been obtained from computational predictions without experimental validation. In the present study, we performed a systematic identification of intermediate-size ncRNAs (50–500 nt) by ncRNA library construction and identified 125 chicken ncRNAs. Importantly, through bioinformatics and expression analysis, we found that the chicken ncRNAs have several novel features: (i) Comparative genomic analysis against 18 sequenced vertebrate genomes revealed that the majority of the newly identified ncRNA candidates are not conserved and most are potentially bird/chicken specific, suggesting that ncRNAs play roles in lineage/species specification during evolution. (ii) Expression pattern analysis of intronic snoRNAs and their host genes suggested coordinated expression between snoRNAs and their host genes. (iii) Several spatio-temporal specific expression patterns suggest involvement of ncRNAs in tissue development. Together, these findings provide new clues for future functional study of ncRNAs during development and evolution.
Affiliation(s)
- Yong Zhang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
|
77
|
Hiller M, Findeiss S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res 2009; 19:1289-300. [PMID: 19458021 DOI: 10.1101/gr.090050.108] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 01/17/2023]
Abstract
Noncoding RNAs that are, like mRNAs, spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far, no computational methods exist to predict such RNAs from genomic sequences only. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with a high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes, either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon-intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany.
|
78
|
Soldà G, Makunin IV, Sezerman OU, Corradin A, Corti G, Guffanti A. An Ariadne's thread to the identification and annotation of noncoding RNAs in eukaryotes. Brief Bioinform 2009; 10:475-89. [PMID: 19383843 DOI: 10.1093/bib/bbp022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
Non-protein coding RNAs (ncRNAs) have emerged as a vast and heterogeneous portion of eukaryotic transcriptomes. Several ncRNA families, either short (<200 nucleotides, nt) or long (>200 nt), have been described and implicated in a variety of biological processes, from translation to gene expression regulation and nuclear trafficking. Most probably, other families are still to be discovered. Computational methods for ncRNA research require different approaches from the ones normally used in the prediction of protein-coding genes. Indeed, primary sequence alone is often insufficient to infer ncRNA functionality, whereas secondary structure and local conservation of portions of the transcript could provide useful information for both the prediction and the functional annotation of ncRNAs. Here we present an overview of computational methods and bioinformatics resources currently available for studying ncRNA genes, introducing the common themes as well as the different approaches required for long and short ncRNA identification and annotation.
Affiliation(s)
- Giulia Soldà
- Department of Biology and Genetics for Medical Sciences, University of Milano, 20133 Milan, Italy.
|
79
|
Hertel J, de Jong D, Marz M, Rose D, Tafer H, Tanzer A, Schierwater B, Stadler PF. Non-coding RNA annotation of the genome of Trichoplax adhaerens. Nucleic Acids Res 2009; 37:1602-15. [PMID: 19151082 PMCID: PMC2655684 DOI: 10.1093/nar/gkn1084] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 11/13/2008] [Revised: 12/22/2008] [Accepted: 12/23/2008] [Indexed: 02/06/2023] Open
Abstract
A detailed annotation of non-protein-coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, presumably the most basal metazoan whose genome has been published to date. Since BLAST identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
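To make the phrase "semi-global dynamic programming" in the abstract above concrete, here is a minimal Python sketch of semi-global alignment: the whole query must be aligned, while unaligned target sequence before and after the hit is free. This is not GotohScan itself. The function name `semiglobal_score`, the scoring values, and the use of a linear gap penalty (rather than the affine gap costs associated with Gotoh's algorithm) are simplifying assumptions for illustration only.

```python
def semiglobal_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best score for aligning all of `query` to any substring of `target`.

    Returns (score, end), where `end` is the target position (exclusive)
    at which the best alignment stops. Linear gap costs are an assumption.
    """
    n, m = len(query), len(target)
    # Row 0 is all zeros: skipping a target prefix costs nothing.
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        # Column 0: query characters aligned against nothing are gaps.
        curr = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align query[i-1] with target[j-1]
                prev[j] + gap,      # gap in the target
                curr[j - 1] + gap,  # gap in the query
            )
        prev = curr
    # Skipping a target suffix is free: take the best cell of the last row.
    best_end = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


exact_hit = semiglobal_score("ACGU", "GGGACGUCCC")
gapped_hit = semiglobal_score("ACGU", "ACU")
```

Compared with a purely local alignment, this formulation cannot report a hit that covers only a fragment of the query, which is the property that matters when scanning a genome for complete copies of a short ncRNA.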
Affiliation(s)
- Jana Hertel, Danielle de Jong, Manja Marz, Dominic Rose, Hakim Tafer, Andrea Tanzer, Bernd Schierwater, Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Division of Ecology and Evolution, Institut für Tierökologie und Zellbiologie, Tierärztliche Hochschule Hannover, Bünteweg 17d, D-30559 Hannover, Germany, Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria, Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520, USA, RNomics Group, Fraunhofer Institut für Zelltherapie und Immunologie, Deutscher Platz 5e, D-04103 Leipzig, Germany and Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
|
80
|
Rose D, Jöris J, Hackermüller J, Reiche K, Li Q, Stadler PF. Duplicated RNA genes in teleost fish genomes. J Bioinform Comput Biol 2009; 6:1157-75. [PMID: 19090022 DOI: 10.1142/s0219720008003886] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/01/2007] [Revised: 06/17/2008] [Accepted: 06/18/2008] [Indexed: 12/29/2022]
Abstract
Teleost fishes share a duplication of their entire genomes. We report here on a computational survey of structured non-coding RNAs (ncRNAs) in teleost genomes, focusing on the fate of fish-specific duplicates. As in other metazoan groups, we find evidence of a large number (11,543) of structured RNAs, most of which (~86%) are clade-specific or evolve so fast that their tetrapod homologs cannot be detected. In surprising contrast to protein-coding genes, the fish-specific genome duplication did not lead to a large number of paralogous ncRNAs: only 188 candidates, mostly microRNAs, appear in a larger copy number in teleosts than in tetrapods, suggesting that large-scale gene duplications do not play a major role in the expansion of the vertebrate ncRNA inventory.
Affiliation(s)
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
|
81
|
Zou Q, Zhao T, Liu Y, Guo M. Predicting RNA secondary structure based on the class information and Hopfield network. Comput Biol Med 2009; 39:206-14. [DOI: 10.1016/j.compbiomed.2008.12.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 01/19/2008] [Revised: 10/28/2008] [Accepted: 12/16/2008] [Indexed: 11/24/2022]
|
82
|
Morita K, Saito Y, Sato K, Oka K, Hotta K, Sakakibara Y. Genome-wide searching with base-pairing kernel functions for noncoding RNAs: computational and expression analysis of snoRNA families in Caenorhabditis elegans. Nucleic Acids Res 2009; 37:999-1009. [PMID: 19129214 PMCID: PMC2647286 DOI: 10.1093/nar/gkn1054] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
Despite the accumulating research on noncoding RNAs (ncRNAs), it is likely that we are seeing only the tip of the iceberg regarding our understanding of the functions and the regulatory roles served by ncRNAs in cellular metabolism, pathogenesis and host-pathogen interactions. Therefore, more powerful computational and experimental tools for analyzing ncRNAs need to be developed. To this end, we propose novel kernel functions, called base-pairing profile local alignment (BPLA) kernels, for analyzing functional ncRNA sequences using support vector machines (SVMs). We extend the local alignment kernels for amino acid sequences in order to handle RNA sequences by using STRAL's scoring function, which takes into account sequence similarities as well as upstream and downstream base-pairing probabilities, thus enabling us to model secondary structures of RNA sequences. As a test of the performance of BPLA kernels, we applied our kernels to the problem of discriminating members of an RNA family from nonmembers using SVMs. The results indicated that the discrimination ability of our kernels is stronger than that of other existing methods. Furthermore, we demonstrated the applicability of our kernels to the problem of genome-wide search of snoRNA families in the Caenorhabditis elegans genome, and confirmed that the expression is valid in 14 out of 48 of our predicted candidates by using qRT-PCR. Finally, six highly expressed candidates were identified as the original target regions by DNA sequencing.
Affiliation(s)
- Kensuke Morita
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
|
83
|
Kavanaugh LA, Dietrich FS. Non-coding RNA prediction and verification in Saccharomyces cerevisiae. PLoS Genet 2009; 5:e1000321. [PMID: 19119416 PMCID: PMC2603021 DOI: 10.1371/journal.pgen.1000321] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 06/25/2008] [Accepted: 12/01/2008] [Indexed: 11/18/2022] Open
Abstract
Non-coding RNA (ncRNA) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is well supported but the data are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements. Recent advances in DNA sequence technology have made it possible to sequence entire genomes. Once a genome is sequenced, it becomes necessary to identify the set of genes and other functional elements within the genome. This is particularly challenging as much of the genomic sequence does not appear to perform any function and is loosely referred to as “junk.” Identifying functional elements among the “junk” is difficult. 
Experimental methods have been developed for this purpose but they are time-consuming, expensive, and often provide an incomplete picture. Thus, it is important to develop the ability to identify these functional elements using computational methods. Protein-coding genes are relatively easy to identify computationally, but other categories of functional elements present a significantly greater challenge. In this work, we used a computational approach to identify genes that do not encode for a protein but rather function as an RNA molecule. We then used experimental methods to verify our predictions and thereby validate the computational method.
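The Z-score mentioned in this abstract compares the folding energy of a sequence window with the energies of shuffled versions of the same window. The following is a minimal sketch of that idea, not the authors' pipeline. The function `toy_energy` is a hypothetical stand-in for a real minimum-free-energy predictor, the mononucleotide shuffle is a simplification, and the default values for `n_shuffles`, `window` and `step` are illustrative assumptions.

```python
import random
import statistics

# Canonical and wobble pairs used by the toy energy model below.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def toy_energy(seq):
    """Hypothetical stand-in for a folding-energy predictor.

    Counts pairs between position i and its mirror position and returns
    the negated count, so more negative means "more stable".
    """
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in PAIRS for i in range(n // 2)))

def z_score(seq, energy=toy_energy, n_shuffles=200, seed=0):
    """Z-score of the native energy against shuffled sequences."""
    rng = random.Random(seed)
    native = energy(seq)
    background = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)  # mononucleotide shuffle (a simplification)
        background.append(energy("".join(letters)))
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        return 0.0  # no variation in the background: no evidence either way
    return (native - mu) / sigma

def scan(seq, window, step, **kwargs):
    """Slide a window along seq and return (start, z_score) per window."""
    return [
        (start, z_score(seq[start:start + window], **kwargs))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A strongly negative Z-score marks a window that is more stable than expected from its composition alone. Changing `window` and `step` in `scan` corresponds to the window-size and step-size exploration the abstract describes.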
Affiliation(s)
- Laura A. Kavanaugh
- Department of Molecular Genetics and Microbiology, Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
- Fred S. Dietrich
- Department of Molecular Genetics and Microbiology, Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
|
84
|
|
85
|
Myslyuk I, Doniger T, Horesh Y, Hury A, Hoffer R, Ziporen Y, Michaeli S, Unger R. Psiscan: a computational approach to identify H/ACA-like and AGA-like non-coding RNA in trypanosomatid genomes. BMC Bioinformatics 2008; 9:471. [PMID: 18986541 PMCID: PMC2613932 DOI: 10.1186/1471-2105-9-471] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 08/11/2008] [Accepted: 11/05/2008] [Indexed: 11/12/2022] Open
Abstract
Background Detection of non coding RNA (ncRNA) molecules is a major bioinformatics challenge. This challenge is particularly difficult when attempting to detect H/ACA molecules which are involved in converting uridine to pseudouridine on rRNA in trypanosomes, because these organisms have unique H/ACA molecules (termed H/ACA-like) that lack several of the features that characterize H/ACA molecules in most other organisms. Results We present here a computational tool called Psiscan, which was designed to detect H/ACA-like molecules in trypanosomes. We started by analyzing known H/ACA-like molecules and characterized their crucial elements both computationally and experimentally. Next, we set up constraints based on this analysis and additional phylogenic and functional data to rapidly scan three trypanosome genomes (T. brucei, T. cruzi and L. major) for sequences that observe these constraints and are conserved among the species. In the next step, we used minimal energy calculation to select the molecules that are predicted to fold into a lowest energy structure that is consistent with the constraints. In the final computational step, we used a Support Vector Machine that was trained on known H/ACA-like molecules as positive examples and on negative examples of molecules that were identified by the computational analyses but were shown experimentally not to be H/ACA-like molecules. The leading candidate molecules predicted by the SVM model were then subjected to experimental validation. Conclusion The experimental validation showed 11 molecules to be expressed (4 out of 25 in the intermediate stage and 7 out of 19 in the final validation after the machine learning stage). Five of these 11 molecules were further shown to be bona fide H/ACA-like molecules. As snoRNA in trypanosomes are organized in clusters, the new H/ACA-like molecules could be used as starting points to manually search for additional molecules in their neighbourhood. 
Altogether, this study increased our repertoire by fourteen H/ACA-like and six C/D snoRNA molecules from T. brucei and L. major. In addition, the experimental analysis revealed that six expressed ncRNA molecules are not downregulated in CBF5-silenced cells, suggesting that they have structural features of H/ACA-like molecules but do not have their standard function. We termed this novel class of molecules AGA-like, and we are exploring their function. This study demonstrates the power of tight collaboration between computational and experimental approaches in a combined effort to reveal the repertoire of ncRNA molecules.
Affiliation(s)
- Inna Myslyuk
- Faculty of Life Science, Bar-Ilan University, Ramat-Gan, Israel.
|
86
|
Seemann SE, Gorodkin J, Backofen R. Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Res 2008; 36:6355-62. [PMID: 18836192 PMCID: PMC2582601 DOI: 10.1093/nar/gkn544] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022] Open
Abstract
Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold.
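A maximum expected accuracy (MEA) scoring scheme of the kind this abstract mentions can be illustrated with a small dynamic program. This is a generic sketch and not PETfold's actual scoring function. The inputs `pair_prob` and `unpaired_prob`, the weight `alpha` on single-stranded positions, the factor 2 for a pair (it covers two positions) and the minimum loop length `min_loop` are all assumptions made for illustration.

```python
def mea_structure(pair_prob, unpaired_prob, alpha=0.2, min_loop=3):
    """Return a dot-bracket structure maximizing expected accuracy.

    pair_prob:     dict mapping (i, k) with i < k to a base-pair probability
    unpaired_prob: list with the probability that each position is unpaired
    alpha:         weight of unpaired positions relative to paired ones
    """
    n = len(unpaired_prob)
    # score[i][j] is the best score for the half-open interval [i, j).
    score = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Option 1: position i stays unpaired.
            best = alpha * unpaired_prob[i] + score[i + 1][j]
            # Option 2: position i pairs with some k inside the interval.
            for k in range(i + min_loop + 1, j):
                p = pair_prob.get((i, k), 0.0)
                if p > 0.0:
                    best = max(best, 2 * p + score[i + 1][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: re-derive which option produced each optimal score.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 1:
            continue
        if abs(score[i][j] - (alpha * unpaired_prob[i] + score[i + 1][j])) < 1e-12:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j):
            p = pair_prob.get((i, k), 0.0)
            if p > 0.0 and abs(score[i][j] - (2 * p + score[i + 1][k] + score[k + 1][j])) < 1e-12:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k))
                stack.append((k + 1, j))
                break
    return "".join(structure)
```

Keeping `alpha` below 1 weights pairing probabilities more heavily than single-stranded probabilities, which is in the spirit of the parameter-tuning observation reported in the abstract. The specific value used here is arbitrary.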
Affiliation(s)
- Stefan E Seemann
- Division of Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Groennegårdsvej 3, DK-1870 Frederiksberg C, Denmark
|
87
|
Abstract
Comparative genomics is a powerful tool for gaining insight into genomic function and evolution. However, in plants, sequence data that would enable detailed comparisons of both coding and noncoding regions have been limited in availability. Here we report the generation and analysis of sequences for an unduplicated conserved syntenic segment (CSS) in the genomes of five members of the agriculturally important plant family Solanaceae. This CSS includes a 105-kb region of tomato chromosome 2 and orthologous regions of the potato, eggplant, pepper, and petunia genomes. With a total neutral divergence of 0.73-0.78 substitutions/site, these sequences are similar enough that most noncoding regions can be aligned, yet divergent enough to be informative about evolutionary dynamics and selective pressures. The CSS contains 17 distinct genes with generally conserved order and orientation, but with numerous small-scale differences between species. Our analysis indicates that the last common ancestor of these species lived approximately 27-36 million years ago, that more than one-third of short genomic segments (5-15 bp) are under selection, and that more than two-thirds of selected bases fall in noncoding regions. In addition, we identify genes under positive selection and analyze hundreds of conserved noncoding elements. This analysis provides a window into 30 million years of plant evolution in the absence of polyploidization.
|
88
|
Sridhar P, Gan HH, Schlick T. A computational screen for C/D box snoRNAs in the human genomic region associated with Prader-Willi and Angelman syndromes. J Biomed Sci 2008; 15:697-705. [PMID: 18661287 DOI: 10.1007/s11373-008-9271-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/18/2008] [Accepted: 07/10/2008] [Indexed: 11/29/2022] Open
Abstract
Small nucleolar RNAs (snoRNAs) play a significant role in Prader-Willi Syndrome (PWS) and Angelman Syndrome (AS), which are genomic disorders resulting from deletions in the human chromosomal region 15q11-q13. To identify snoRNAs in the region, our computational study employs key motif features of C/D box snoRNAs and introduces a complementary RNA-RNA hybridization test. We identify three previously unknown methylation guide snoRNAs targeting ribosomal 18S and 28S RNAs, and two snoRNAs targeting serotonin receptor 2C mRNA. We show that the three snoRNA candidates likely possess methylation strands complementary to, and form stable complexes with, human ribosomal RNAs. Our screen also identifies 8 other snoRNA candidates that do not pass the rRNA-complementarity and/or hybridization tests. Two of these candidates have extensive sequence similarity to HBII-52, a snoRNA that regulates the alternative splicing of serotonin receptor 2C mRNA. Six out of our eleven candidate snoRNAs are also predicted by other existing methods.
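A motif-based screen of the kind described here can be sketched as a search for the commonly cited C box (RUGAUGA) and D box (CUGA) consensus motifs, followed by a complementarity check of the guide region upstream of the D box. This is an illustrative sketch only, not the authors' method. The distance limits `min_gap` and `max_gap`, the guide length `guide_len`, and the use of exact Watson-Crick string matching in place of a thermodynamic hybridization test are all assumptions.

```python
import re

C_BOX = re.compile(r"[AG]UGAUGA")  # consensus RUGAUGA, R = A or G
D_BOX = re.compile(r"CUGA")

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_candidates(seq, min_gap=20, max_gap=200, guide_len=10):
    """Return (c_box_start, d_box_start, guide) for each C/D box pair.

    The guide is the guide_len nucleotides immediately upstream of the D box.
    """
    hits = []
    for c in C_BOX.finditer(seq):
        for d in D_BOX.finditer(seq, c.end()):
            gap = d.start() - c.end()
            if gap < min_gap:
                continue
            if gap > max_gap:
                break
            guide = seq[d.start() - guide_len:d.start()]
            hits.append((c.start(), d.start(), guide))
    return hits

def target_sites(guide, target):
    """Positions in target that are perfectly complementary to the guide.

    Exact matching is a crude stand-in for a hybridization-energy test.
    """
    site = reverse_complement(guide)
    return [m.start() for m in re.finditer(f"(?={site})", target)]
```

For example, `find_candidates` can be run on a genomic window transcribed to RNA, and each returned guide can then be passed to `target_sites` together with an rRNA sequence. A realistic screen would also allow G-U pairs and mismatches and would score duplex stability rather than exact matches.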
Affiliation(s)
- Padmavati Sridhar
- Department of Chemistry, New York University, 100 Washington Square East, New York, NY 10003, USA
|
89
|
Sato K, Mituyama T, Asai K, Sakakibara Y. Directed acyclic graph kernels for structural RNA analysis. BMC Bioinformatics 2008; 9:318. [PMID: 18647390 PMCID: PMC2515856 DOI: 10.1186/1471-2105-9-318] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/13/2008] [Accepted: 07/22/2008] [Indexed: 11/10/2022] Open
Abstract
Background Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.
Affiliation(s)
- Kengo Sato
- Japan Biological Informatics Consortium (JBIC), 2-45 Aomi, Koto-ku, Tokyo 135-8073, Japan.
|
90
|
Freyhult E, Edvardsson S, Tamas I, Moulton V, Poole AM. Fisher: a program for the detection of H/ACA snoRNAs using MFE secondary structure prediction and comparative genomics - assessment and update. BMC Res Notes 2008; 1:49. [PMID: 18710502 PMCID: PMC2551606 DOI: 10.1186/1756-0500-1-49] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/03/2008] [Accepted: 07/21/2008] [Indexed: 11/25/2022] Open
Abstract
Background The H/ACA family of small nucleolar RNAs (snoRNAs) plays a central role in guiding the pseudouridylation of ribosomal RNA (rRNA). In an effort to systematically identify the complete set of rRNA-modifying H/ACA snoRNAs from the genome sequence of the budding yeast, Saccharomyces cerevisiae, we developed a program – Fisher – and previously presented several candidate snoRNAs based on our analysis [1]. Findings In this report, we provide a brief update of this work, which was aborted after the publication of experimentally-identified snoRNAs [2] identical to candidates we had identified bioinformatically using Fisher. Our motivation for revisiting this work is to report on the status of the candidate snoRNAs described in [1], and secondly, to report that a modified version of Fisher together with the available multiple yeast genome sequences was able to correctly identify several H/ACA snoRNAs for modification sites not identified by the snoGPS program [3]. While we are no longer developing Fisher, we briefly consider the merits of the Fisher algorithm relative to snoGPS, which may be of use for workers considering pursuing a similar search strategy for the identification of small RNAs. The modified source code for Fisher is made available as supplementary material. Conclusion Our results confirm the validity of using minimum free energy (MFE) secondary structure prediction to guide comparative genomic screening for RNA families with few sequence constraints.
Affiliation(s)
- Eva Freyhult
- Linnaeus Centre for Bioinformatics, Uppsala University, Box 598, S-751, 24 Uppsala, Sweden; Department of Clinical Microbiology, Clinical Bacteriology, Umeå University, 901 85 Umeå, Sweden.
|
91
|
Jöchl C, Rederstorff M, Hertel J, Stadler PF, Hofacker IL, Schrettl M, Haas H, Hüttenhofer A. Small ncRNA transcriptome analysis from Aspergillus fumigatus suggests a novel mechanism for regulation of protein synthesis. Nucleic Acids Res 2008; 36:2677-89. [PMID: 18346967 PMCID: PMC2377427 DOI: 10.1093/nar/gkn123] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Indexed: 01/28/2023] Open
Abstract
Small non-protein-coding RNAs (ncRNAs) have systematically been studied in various model organisms from Escherichia coli to Homo sapiens. Here, we analyse the small ncRNA transcriptome from the pathogenic filamentous fungus Aspergillus fumigatus. To that aim, we experimentally screened for ncRNAs, expressed under various growth conditions or during specific developmental stages, by generating a specialized cDNA library from size-selected small RNA species. Our screen revealed 30 novel ncRNA candidates from known ncRNA classes such as small nuclear RNAs (snRNAs) and C/D box-type small nucleolar RNAs (C/D box snoRNAs). Additionally, several candidates for H/ACA box snoRNAs could be predicted by a bioinformatical screen. We also identified 15 candidates for ncRNAs, which could not be assigned to any known ncRNA class. Some of these ncRNA species are developmentally regulated implying a possible novel function in A. fumigatus development. Surprisingly, in addition to full-length tRNAs, we also identified 5′- or 3′-halves of tRNAs, only, which are likely generated by tRNA cleavage within the anti-codon loop. We show that conidiation induces tRNA cleavage resulting in tRNA depletion within conidia. Since conidia represent the resting state of A. fumigatus we propose that conidial tRNA depletion might be a novel mechanism to down-regulate protein synthesis in a filamentous fungus.
Affiliation(s)
- Christoph Jöchl
- Innsbruck Biocenter, Division of Genomics and RNomics - Innsbruck Medical University, Fritz-Pregl-Strasse 3, 6020 Innsbruck, Austria
|
92
|
Rose D, Hackermüller J, Washietl S, Reiche K, Hertel J, Findeiß S, Stadler PF, Prohaska SJ. Computational RNomics of drosophilids. BMC Genomics 2007; 8:406. [PMID: 17996037 PMCID: PMC2216035 DOI: 10.1186/1471-2164-8-406] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 01/15/2007] [Accepted: 11/08/2007] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. RESULTS We obtain 16 000 high-quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79-89 (2007)], which are plausible candidates of Pol III transcripts, has been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. CONCLUSION The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383-1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions.
The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals.
Affiliation(s)
- Dominic Rose
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Jörg Hackermüller
- Fraunhofer Institute for Cell Therapy and Immunology, Deutscher Platz 5e, Leipzig, Germany, D-04103
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Stefan Washietl
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Kristin Reiche
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Jana Hertel
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Sven Findeiß
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Fraunhofer Institute for Cell Therapy and Immunology, Deutscher Platz 5e, Leipzig, Germany, D-04103
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, Leipzig, Germany, D-04107
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, Wien, Austria, A-1090
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, USA, NM 87501
- Sonja J Prohaska
- Biomedical Informatics, Arizona State University, Tempe, PO-Box 878809, USA, AZ 85287
|
93
|
Progressive multiple sequence alignments from triplets. BMC Bioinformatics 2007; 8:254. [PMID: 17631683 PMCID: PMC1948021 DOI: 10.1186/1471-2105-8-254] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 11/05/2006] [Accepted: 07/15/2007] [Indexed: 11/27/2022] Open
Abstract
Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wisely replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context-dependent (mis)match scores.
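The exact dynamic programming over sequence triples mentioned in this abstract can be illustrated with a small three-way global alignment. This sketch covers only the triple-alignment step, not the Neighbor-Net aggregation, and it is not the aln3nn implementation. The sum-of-pairs scoring with linear gap costs and the values of `match`, `mismatch` and `gap` are assumptions chosen for illustration.

```python
from itertools import product

def align3(a, b, c, match=1, mismatch=-1, gap=-2):
    """Exact three-way global alignment with sum-of-pairs scoring.

    Returns (score, (row_a, row_b, row_c)). Time and memory grow with
    len(a) * len(b) * len(c), so this is only practical for short inputs.
    """
    def column_score(x, y, z):
        total = 0
        for p, q in ((x, y), (x, z), (y, z)):
            if p == "-" and q == "-":
                continue  # gap against gap costs nothing
            if p == "-" or q == "-":
                total += gap
            else:
                total += match if p == q else mismatch
        return total

    # The 7 possible column types: each sequence contributes a letter or a gap.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    seqs = (a, b, c)
    na, nb, nc = len(a), len(b), len(c)
    score = {(0, 0, 0): 0}
    back = {}
    for i in range(na + 1):
        for j in range(nb + 1):
            for k in range(nc + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best, best_move = float("-inf"), None
                for move in moves:
                    prev = (i - move[0], j - move[1], k - move[2])
                    if min(prev) < 0:
                        continue
                    col = [s[p] if m else "-" for s, p, m in zip(seqs, prev, move)]
                    candidate = score[prev] + column_score(*col)
                    if candidate > best:
                        best, best_move = candidate, move
                score[(i, j, k)] = best
                back[(i, j, k)] = best_move

    # Traceback from the full-length cell to the origin.
    rows = ([], [], [])
    pos = (na, nb, nc)
    while pos != (0, 0, 0):
        move = back[pos]
        pos = tuple(p - m for p, m in zip(pos, move))
        for row, s, p, m in zip(rows, seqs, pos, move):
            row.append(s[p] if m else "-")
    return score[(na, nb, nc)], tuple("".join(reversed(r)) for r in rows)
```

Because all three sequences are aligned in one step, adjacent insertions and deletions are placed jointly rather than in the arbitrary order a pair of pairwise alignments would impose.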
|