1
Liu X, Wang Y, Liu Z, Kang Y, Ma F, Luo Z, Wang J, Huang J. miR-434 and miR-242 have a potential role in heat stress response in rainbow trout (Oncorhynchus mykiss). J Fish Biol 2021; 99:1798-1803. [PMID: 34405404 DOI: 10.1111/jfb.14881] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/14/2021] [Revised: 08/13/2021] [Accepted: 08/13/2021] [Indexed: 06/13/2023]
Abstract
MicroRNAs (miRNAs) are being studied extensively because they act as key metabolic regulators with a role in the heat stress response. However, the role of miRNAs in heat stress remains uncertain, and many miRNAs have not yet been discovered. In a previous study, we performed high-throughput sequencing of miRNAs differentially expressed in rainbow trout (Oncorhynchus mykiss) exposed to heat stress (18 vs. 24°C), which led to the identification of two novel miRNAs, temporarily named novel miR-434 and -242. The differential expression of these miRNAs was highly significant (P < 0.01), and target prediction with the bioinformatics software miRanda indicated that novel miR-434 and -242 were predicted to regulate the transcripts of heat shock 70-kDa protein 4-like (HSPA4L) and calreticulin (CRT), respectively. Here, our core objective was to validate whether HSPA4L and CRT are indeed the target genes of novel miR-434 and -242, respectively, and for this purpose we used the dual-luciferase reporter assay system. Target gene sequences were synthesized and cloned into a dual-luciferase vector. To better understand the function of the target genes, we combined these results with the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses from the previous sequencing study. We found that novel miR-434 regulated HSPA4L expression by binding to a putative binding site in the 3'-UTR of HSPA4L, as shown by inhibition of luciferase activity. In contrast, novel miR-242 was not involved in regulating CRT expression. We believe these results provide a foundation for future studies aiming to comprehensively understand the mechanisms rainbow trout use to cope with heat stress.
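A dual-luciferase readout like the one described above is usually reduced to a normalized ratio per well: the signal of the luciferase whose transcript carries the cloned 3'-UTR, divided by a second, co-expressed luciferase that controls for transfection efficiency. The sketch below shows only that arithmetic, under stated assumptions: the readings are hypothetical, which luciferase carries the 3'-UTR depends on the vector (not named in the abstract), and comparing against a negative-control mimic is common practice rather than a detail reported here.

```python
from statistics import mean


def normalized_ratios(wells):
    """Each well is (utr_reporter_signal, normalizer_signal); return per-well ratios."""
    return [reporter / normalizer for reporter, normalizer in wells]


def relative_activity(mimic_wells, control_wells):
    """Mean normalized ratio with the miRNA mimic, expressed relative to the
    negative-control mimic (so the control group equals 1.0)."""
    return mean(normalized_ratios(mimic_wells)) / mean(normalized_ratios(control_wells))


# Hypothetical raw luminescence readings (reporter, normalizer) for triplicate wells.
mimic = [(5200, 10100), (4800, 9900), (5100, 10300)]
negative_control = [(9800, 10000), (10200, 10150), (9900, 9950)]

rel = relative_activity(mimic, negative_control)
print(f"relative luciferase activity: {rel:.2f}")
print(f"inhibition: {100 * (1 - rel):.1f}%")
```

In practice a statistical test across biological replicates would accompany this ratio; significance testing is left out of the sketch.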
Affiliation(s)
- Xiaoxia Liu
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Yongjie Wang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Zhe Liu
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Yujun Kang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Fang Ma
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Zhicheng Luo
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Jianfu Wang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
- Jinqiang Huang
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China
2
Tong X, Liu S. CPPred: coding potential prediction based on the global description of RNA sequence. Nucleic Acids Res 2019; 47:e43. [PMID: 30753596 PMCID: PMC6486542 DOI: 10.1093/nar/gkz087] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 07/18/2018] [Revised: 01/26/2019] [Accepted: 02/01/2019] [Indexed: 11/12/2022]
Abstract
Rapid and accurate approaches to distinguishing coding RNAs from ncRNAs play a critical role in analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing. Previously developed methods (CPAT, CPC2 and PLEK) distinguish coding RNAs from ncRNAs very well, but perform poorly at distinguishing small coding RNAs from small ncRNAs. Herein, we report CPPred (coding potential prediction), an approach based on an SVM classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to these novel RNA features, CPPred distinguishes not only coding RNAs from ncRNAs, but also small coding RNAs from small ncRNAs, better than state-of-the-art methods. A recent study proposed 1335 novel human coding RNAs from a large number of RNA-seq datasets; however, CPPred predicts only 119 of these transcripts as coding RNAs. In fact, almost all of the proposed novel coding RNAs (91.1%) are ncRNAs, which is consistent with previous reports. Remarkably, we also show that the global-description encoding features (T2, C0 and GC) play an important role in the prediction of coding potential.
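CPPred's "global description" features are, as names like T2 and C0 suggest, built from composition (C), transition (T) and distribution (D) descriptors of the nucleotide sequence. The sketch below computes one generic version of such CTD descriptors (4 + 6 + 20 = 30 values). The exact definitions, ordering and indexing CPPred uses (what T2 or C0 denote precisely) are not given in the abstract, so the definitions here are assumptions, and the SVM step is not shown.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def _clean(seq):
    """Upper-case the sequence, map U to T, and require at least two bases."""
    seq = seq.upper().replace("U", "T")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    return seq


def composition(seq):
    """C-type descriptors: fraction of each nucleotide."""
    seq = _clean(seq)
    return {b: seq.count(b) / len(seq) for b in NUCLEOTIDES}


def transition(seq):
    """T-type descriptors: fraction of adjacent positions that switch between
    two given nucleotides, in either order (e.g. 'AC' counts both AC and CA)."""
    seq = _clean(seq)
    steps = list(zip(seq, seq[1:]))
    return {
        x + y: sum(1 for a, b in steps if {a, b} == {x, y}) / len(steps)
        for x, y in combinations(NUCLEOTIDES, 2)
    }


def distribution(seq, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """D-type descriptors: for each nucleotide, the position (as % of sequence
    length) of its first, 25%, 50%, 75% and last occurrence."""
    seq = _clean(seq)
    out = {}
    for b in NUCLEOTIDES:
        positions = [i + 1 for i, c in enumerate(seq) if c == b]
        if not positions:
            out[b] = [0.0] * len(fractions)
            continue
        out[b] = [
            100.0 * positions[max(1, math.ceil(f * len(positions))) - 1] / len(seq)
            for f in fractions
        ]
    return out


seq = "ACGUACGU"
print(composition(seq), transition(seq), distribution(seq), sep="\n")
```

Concatenating the three dictionaries in a fixed order gives a fixed-length vector regardless of transcript length, which is what makes descriptors of this kind usable as SVM input.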
Affiliation(s)
- Xiaoxue Tong
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Shiyong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
3
Arias-Carrasco R, Vásquez-Morán Y, Nakaya HI, Maracaja-Coutinho V. StructRNAfinder: an automated pipeline and web server for RNA families prediction. BMC Bioinformatics 2018; 19:55. [PMID: 29454313 PMCID: PMC5816368 DOI: 10.1186/s12859-018-2052-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 10/13/2017] [Accepted: 02/02/2018] [Indexed: 01/11/2023]
Abstract
Background: The function of many noncoding RNAs (ncRNAs) depends on their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, a full analysis requires researchers to use multiple tools, which entails constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even for researchers with good computational or bioinformatics skills. Results: We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family, according to the Rfam database, but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provide a stand-alone version of StructRNAfinder for use in large-scale projects. The tool was developed under the GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me. Conclusions: The main advantage of StructRNAfinder lies in large-scale processing and in integrating the data obtained from each tool and database employed along the workflow; the several files generated are displayed in user-friendly reports, useful for downstream analyses and data exploration.
Affiliation(s)
- Raúl Arias-Carrasco
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, 8580745, Santiago, Chile; Programa de Doctorado en Genómica Integrativa, Vicerrectoría de Investigación, Universidad Mayor, 8580745, Santiago, Chile
- Yessenia Vásquez-Morán
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, 8580745, Santiago, Chile
- Helder I Nakaya
- Faculdade de Ciências Farmacêuticas, Universidade de São Paulo, São Paulo, 05508-900, Brazil.
- Vinicius Maracaja-Coutinho
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, 8580745, Santiago, Chile; Instituto Vandique, João Pessoa, 58000-000, Brazil; Beagle Bioinformatics, 8320000, Santiago, Chile; Advanced Center for Chronic Diseases (ACCDiS), Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, 8380492, Santiago, Chile.
4
A Review on Recent Computational Methods for Predicting Noncoding RNAs. Biomed Res Int 2017; 2017:9139504. [PMID: 28553651 PMCID: PMC5434267 DOI: 10.1155/2017/9139504] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/29/2016] [Revised: 02/06/2017] [Accepted: 02/15/2017] [Indexed: 12/20/2022]
Abstract
Noncoding RNAs (ncRNAs) play important roles in various cellular activities and diseases. In this paper, we present a comprehensive review of computational methods for ncRNA prediction, which are generally grouped into four categories: (1) homology-based methods, that is, comparative methods involving evolutionarily conserved RNA sequences and structures; (2) de novo methods using RNA sequence and structure features; (3) transcriptome sequencing and assembly-based methods, that is, methods designed for single- and paired-end reads generated by next-generation RNA sequencing; and (4) RNA family-specific methods, for example, methods specific to microRNAs and long noncoding RNAs. Finally, we summarize the advantages and limitations of these methods and point out a few possible future directions for ncRNA prediction. In conclusion, many computational methods have been shown to be effective in predicting ncRNAs for further experimental validation. They are critical for reducing the huge number of potential ncRNAs and pointing the community to high-confidence candidates. In the future, highly efficient mapping technology and more intrinsic sequence features (e.g., motif and k-mer frequencies) and structure features (e.g., minimum free energy, conserved stem-loops, or graph structures) should be combined with next- and third-generation sequencing platforms to improve ncRNA prediction.
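Of the intrinsic sequence features mentioned above, k-mer frequencies are the simplest to compute. The sketch below is one common formulation, assumed here rather than taken from any specific method in the review: it counts every overlapping k-mer and normalizes over all 4^k possible k-mers, so sequences of any length map to a vector of the same size.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer over `alphabet`,
    in a fixed order, so sequences of different lengths map to equal-length vectors."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)  # windows with N or other symbols are ignored
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


freqs = kmer_frequencies("ACGUACGU", k=2)
feature_vector = list(freqs.values())  # 16 features for k = 2
print(len(feature_vector), freqs["AC"])
```

The vector grows as 4^k, so small k (typically 1 to 6) is used when these features feed a classifier.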
5
Abstract
In order to carry out biological functions, RNA molecules must fold into specific three-dimensional (3D) structures. Current experimental methods to determine RNA 3D structures are expensive and time consuming. With the recent advances in computational biology, RNA structure prediction is becoming increasingly reliable. This chapter describes a recently developed RNA structure prediction software, Vfold, a virtual bond-based RNA folding model. The main features of Vfold are the physics-based loop free energy calculations for various RNA structure motifs and a template-based assembly method for RNA 3D structure prediction. For illustration, we use the yybP-ykoY Orphan riboswitch as an example to show the implementation of the Vfold model in RNA structure prediction from the sequence.
Affiliation(s)
- Chenhan Zhao
- Department of Physics, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
- Xiaojun Xu
- Department of Physics, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
- Shi-Jie Chen
- Department of Physics, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
6
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we give an overview of the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, the microRNAs, which play important roles in many biological processes by regulating gene expression post-transcriptionally, and whose dysregulation has been shown to be involved in several human diseases.
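The simplest secondary structure prediction approach is base-pair maximization (the Nussinov algorithm), a dynamic program that finds the nested structure with the most canonical or G-U pairs. It is shown here only as a toy illustration of how such structures are computed; practical tools minimize a thermodynamic free energy instead, and the minimum hairpin loop length of 3 is an assumed parameter.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize the number of nested canonical/G-U base pairs (Nussinov algorithm).

    Returns (pair_count, dot_bracket). Hairpin loops must contain at least
    `min_loop` unpaired bases.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs in seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))
```

The dot-bracket string it returns is the same notation most structure prediction tools use for their output.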
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
- Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
7
de Araujo Oliveira JV, Costa F, Backofen R, Stadler PF, Machado Telles Walter ME, Hertel J. SnoReport 2.0: new features and a refined Support Vector Machine to improve snoRNA identification. BMC Bioinformatics 2016; 17:464. [PMID: 28105919 PMCID: PMC5249026 DOI: 10.1186/s12859-016-1345-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 08/30/2023]
Abstract
Background
snoReport uses RNA secondary structure prediction combined with machine learning to identify the two main classes of small nucleolar RNAs, the box H/ACA snoRNAs and the box C/D snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by extracting new features for both box C/D and box H/ACA snoRNAs; developing a more sophisticated technique for the SVM training phase, with recent data from vertebrate organisms and a careful choice of the SVM parameters C and γ; and using updated versions of the tools and databases used to construct the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 in different organisms.
Results
Results of the training and test phases for box H/ACA and box C/D snoRNAs, in both versions of snoReport, are discussed. Validation on real data was performed to evaluate the predictions of snoReport 2.0. Our program was applied to a set of previously annotated sequences, some of them experimentally confirmed, from humans, nematodes, drosophilids, platypus, chickens and leishmania. We significantly improved the predictions for vertebrates, since the training phase used information from these organisms, while for the other organisms the identification of H/ACA box snoRNAs was improved.
Conclusion
We presented snoReport 2.0, an efficient method for predicting H/ACA box and C/D box snoRNAs that finds true positives and avoids false positives in vertebrate organisms. The H/ACA box snoRNA classifier showed an F-score of 93 % (an improvement of 10 % over the previous version), while the C/D box snoRNA classifier showed an F-score of 94 % (an improvement of 14 %). In addition, both classifiers exhibited performance measures above 90 %. These results show that snoReport 2.0 avoids false positives and false negatives, allowing snoRNAs to be predicted with high quality. In the validation phase, snoReport 2.0 predicted 67.43 % of the vertebrate snoRNAs for both classes. For nematodes and drosophilids, 69 % and 76.67 % of H/ACA box snoRNAs were predicted, respectively, showing that snoReport 2.0 is good at identifying snoRNAs in vertebrates and also H/ACA box snoRNAs in invertebrate organisms.
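The role of the SVM parameters C and γ can be made concrete. With the common radial basis function (RBF) kernel, γ sets how quickly similarity decays with distance in feature space, and C trades margin width against training errors. The abstract does not say which kernel or search procedure snoReport 2.0 used; the sketch assumes an RBF kernel and the coarse log2 grid recommended in the LIBSVM practical guide (C from 2^-5 to 2^15, γ from 2^-15 to 2^3), with a hypothetical scoring function standing in for cross-validated training.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def log2_grid(c_exponents=range(-5, 16, 2), gamma_exponents=range(-15, 4, 2)):
    """Coarse exponential grid of (C, gamma) candidates."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in gamma_exponents]


def grid_search(score_fn, grid):
    """Return the (C, gamma) pair maximizing score_fn(C, gamma).

    score_fn is a placeholder for, e.g., the mean cross-validated F-score of an
    SVM trained with those parameters; training itself is not shown here.
    """
    return max(grid, key=lambda params: score_fn(*params))


# Toy scoring surface (hypothetical) that peaks at C = 2, gamma = 0.125.
def toy_score(c, gamma):
    return -abs(math.log2(c) - 1) - abs(math.log2(gamma) + 3)


best_c, best_gamma = grid_search(toy_score, log2_grid())
print(best_c, best_gamma)
```

A finer grid around the best coarse point is usually searched next; the F-scores reported in the abstract are the kind of value `score_fn` would return.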
Affiliation(s)
- Fabrizio Costa
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany
- Peter Florian Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, Vienna, A-1090, Austria; Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg, DK-1870, Denmark; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA; Young Investigators Group Bioinformatics & Transcriptomics, Helmholtz Centre for Environmental Research - UFZ, Permoserstraße 15, Leipzig, D-04318, Germany
- Jana Hertel
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany
8
Yotsukura S, duVerle D, Hancock T, Natsume-Kitatani Y, Mamitsuka H. Computational recognition for long non-coding RNA (lncRNA): Software and databases. Brief Bioinform 2016; 18:9-27. [DOI: 10.1093/bib/bbv114] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 10/01/2015] [Revised: 12/10/2015] [Indexed: 01/22/2023]
9
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence, RNA secondary structure prediction is one of the most well-studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches, as it exploits information from multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary-structure-based patterns from a set of related sequences. The input sequences could, for example, correspond to the 3' or 5' untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. The sequences could also correspond to in vitro selected RNAs, genomic segments housing ncRNA genes from the same family, and so on. Here, we give a detailed review of the various computational techniques proposed in the literature to solve this general motif discovery problem. We also give empirical comparisons of some current state-of-the-art methods and point out future directions of research.
Affiliation(s)
- Avinash Achar
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
- Pål Sætrom
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
10
Juanes JM, Miguel A, Morales LJ, Pérez-Ortín JE, Arnau V. A web application for the unspecific detection of differentially expressed DNA regions in strand-specific expression data. Bioinformatics 2015; 31:3228-30. [PMID: 26040457 DOI: 10.1093/bioinformatics/btv343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022]
Abstract
Genomic technologies allow laboratories to produce large-scale data sets, either through next-generation sequencing or microarray platforms. To explore these data sets and obtain maximum value from them, researchers view their results alongside all the known features of a given reference genome. To study transcriptional changes that occur under a given condition, researchers search for regions of the genome that are differentially expressed between experimental conditions. Several algorithms have been developed over the years to identify these regions, along with some bioinformatic platforms that enable their use. However, currently available applications for comparative microarray analysis focus exclusively on changes in gene expression within known transcribed regions of predicted protein-coding genes, and overlook changes that occur in non-predictable genetic elements, such as non-coding RNAs. Here, we present a web application for the visualization of strand-specific tiling microarray or next-generation sequencing data that allows customized, unspecific detection of differentially expressed regions all along the genome, and thus the identification of all RNA sequences, predictable or not. Availability and implementation: The web application is freely accessible at http://tilingscan.uv.es/. TilingScan is implemented in PHP and JavaScript. Contact: vicente.arnau@uv.es. Supplementary information: Supplementary data are available at Bioinformatics online.
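The abstract does not describe TilingScan's detection algorithm. As a rough illustration of what annotation-free ("unspecific") detection can mean, the sketch below scans probe-level signals along one strand and reports runs of consecutive probes whose log2 ratio passes a threshold in the same direction. The threshold, minimum run length and pseudocount are hypothetical parameters, and for strand-specific data each strand would be scanned separately.

```python
from math import log2


def differential_regions(positions, cond_a, cond_b, min_abs_lfc=1.0, min_probes=3, pseudocount=1.0):
    """Return (start, end, mean_log2_ratio) for each run of at least `min_probes`
    consecutive probes whose log2(cond_b / cond_a) passes +/- min_abs_lfc
    in the same direction. Inputs are for one strand, sorted by position."""
    if not (len(positions) == len(cond_a) == len(cond_b)):
        raise ValueError("positions and signals must have the same length")
    lfc = [log2((b + pseudocount) / (a + pseudocount)) for a, b in zip(cond_a, cond_b)]
    regions, run, run_sign = [], [], 0

    def close_run():
        # Keep the current run only if it is long enough.
        if len(run) >= min_probes:
            values = [lfc[k] for k in run]
            regions.append((positions[run[0]], positions[run[-1]], sum(values) / len(values)))

    for idx, value in enumerate(lfc):
        sign = 1 if value >= min_abs_lfc else (-1 if value <= -min_abs_lfc else 0)
        if sign != 0 and sign == run_sign:
            run.append(idx)
        else:
            close_run()
            run, run_sign = ([idx], sign) if sign != 0 else ([], 0)
    close_run()
    return regions


# Hypothetical probe positions and signals for one strand.
positions = [100, 200, 300, 400, 500, 600, 700, 800]
control = [10, 10, 10, 10, 10, 10, 10, 10]
treated = [10, 12, 40, 45, 50, 11, 10, 9]
print(differential_regions(positions, control, treated))
```

Because no gene model is consulted, a run can fall in an intergenic or antisense region just as easily as in an annotated gene, which is the property the abstract emphasizes.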
Affiliation(s)
- José M Juanes
- Departamento de Informática, Escola Tècnica Superior d'Enginyeria
- Ana Miguel
- Departamento de Bioquímica y Biología Molecular, Facultad de Biología and E.R.I. Biotecmed, Universitat de València, Burjassot, Spain
- Lucas J Morales
- Departamento de Bioquímica y Biología Molecular, Facultad de Biología and
- José E Pérez-Ortín
- Departamento de Bioquímica y Biología Molecular, Facultad de Biología and E.R.I. Biotecmed, Universitat de València, Burjassot, Spain
- Vicente Arnau
- Departamento de Informática, Escola Tècnica Superior d'Enginyeria
11
Biswas AK, Kang M, Kim DC, Ding CHQ, Zhang B, Wu X, Gao JX. Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/s13721-015-0081-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
12
Abstract
The ever-increasing discoveries of noncoding RNA functions create a strong demand for RNA structure determination from the sequence. In recent years, computational studies of RNA structures, at both the two-dimensional and the three-dimensional levels, have led to several highly promising new developments. In this chapter, we describe a recently developed RNA structure prediction method based on the virtual bond-based coarse-grained folding model (Vfold). The Vfold method places its main emphasis on loop entropy calculations, the treatment of noncanonical (mismatch) interactions, and 3D structure assembly from a motif-based template library. As case studies, we use the glycine riboswitch and the G310-U376 domain of MLV RNA to illustrate Vfold-based prediction of RNA 3D structures from sequence.
Affiliation(s)
- Xiaojun Xu
- Department of Physics, University of Missouri, Columbia, MO, 65211, USA
13
Paschoal AR, Maracaja-Coutinho V, Setubal JC, Simões ZLP, Verjovski-Almeida S, Durham AM. Non-coding transcription characterization and annotation. RNA Biol 2014; 9:274-82. [DOI: 10.4161/rna.19352] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/31/2022]
14
Vazquez-Anderson J, Contreras LM. Regulatory RNAs: charming gene management styles for synthetic biology applications. RNA Biol 2013; 10:1778-97. [PMID: 24356572 DOI: 10.4161/rna.27102] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 01/08/2023]
Abstract
RNAs have many important functional properties, including that they are independently controllable and highly tunable. As a result of these advantageous properties, their use in a myriad of sophisticated devices has been widely explored. Yet, the exploitation of RNAs for synthetic applications is highly dependent on the ability to characterize the many new molecules that continue to be discovered by large-scale sequencing and high-throughput screening techniques. In this review, we present an exhaustive survey of the most recent synthetic bacterial riboswitches and small RNAs while emphasizing their virtues in gene expression management. We also explore the use of these RNA components as building blocks in the RNA synthetic biology toolbox and discuss examples of synthetic RNA components used to rewire bacterial regulatory circuitry. We anticipate that this field will expand its catalog of smart devices by mimicking and manipulating natural RNA mechanisms and functions.
Affiliation(s)
- Jorge Vazquez-Anderson
- McKetta Department of Chemical Engineering; University of Texas at Austin; Austin, TX USA
- Lydia M Contreras
- McKetta Department of Chemical Engineering; University of Texas at Austin; Austin, TX USA
15
Biswas AK, Zhang B, Wu X, Gao JX. CNCTDiscriminator: coding and noncoding transcript discriminator - an excursion through hypothesis learning and ensemble learning approaches. J Bioinform Comput Biol 2013; 11:1342002. [PMID: 24131051 DOI: 10.1142/s021972001342002x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Statistics of open reading frames (ORFs), base composition and the properties of predicted secondary structures have the potential to address the problem of discriminating coding from noncoding transcripts. In addition, RNA-seq, a next-generation sequencing platform, provides a wealth of data from which transcript expression profiles can be extracted, which prompted us to add a new set of dimensions to this classification task. In this paper, we propose CNCTDiscriminator, a coding and noncoding transcript discrimination system that integrates these four categories of transcript features. Feature integration was done using both hypothesis learning and feature-specific ensemble learning approaches. When applied to an independent benchmark dataset, the CNCTDiscriminator model trained with composition and ORF features (precision 83.86%, recall 82.01%) outperforms three other popular methods: CPC (precision 98.31%, recall 25.95%), CPAT (precision 97.74%, recall 52.50%) and PORTRAIT (precision 84.37%, recall 73.2%). The CNCTDiscriminator model trained using the ensemble approach shows comparable performance (precision 89.85%, recall 71.08%).
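The abstract says the composition+ORF model outperforms CPC, CPAT and PORTRAIT even though CPC and CPAT report higher precision. One way to make that comparison explicit is the F1 score, the harmonic mean of precision and recall; note that the abstract does not state that F1 was the criterion used. The values below are the precision/recall figures quoted above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) on the independent benchmark, as quoted in the abstract.
REPORTED = {
    "CNCTDiscriminator (composition + ORF)": (83.86, 82.01),
    "CNCTDiscriminator (ensemble)": (89.85, 71.08),
    "CPC": (98.31, 25.95),
    "CPAT": (97.74, 52.50),
    "PORTRAIT": (84.37, 73.2),
}

scores = {name: f1_score(p, r) for name, (p, r) in REPORTED.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:40s} F1 = {score:.1f}%")
```

With these figures, F1 is roughly 82.9% for the composition+ORF model, 79.4% for the ensemble model, 78.4% for PORTRAIT, 68.3% for CPAT and 41.1% for CPC, which is consistent with the ranking claimed in the abstract.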
Affiliation(s)
- Ashis Kumer Biswas
- Computer Science and Engineering, The University of Texas at Arlington, Arlington, Texas 76019, USA
16
Smith MA, Gesell T, Stadler PF, Mattick JS. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res 2013; 41:8220-36. [PMID: 23847102 PMCID: PMC3783177 DOI: 10.1093/nar/gkt596] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Received: 01/30/2013] [Revised: 05/29/2013] [Accepted: 06/16/2013] [Indexed: 12/14/2022]
Abstract
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consensus RNA structure prediction. The benchmarking data obtained from two refined structure prediction algorithms, RNAz and SISSIz, were then analyzed to fine-tune the parameters of an optimized workflow for genomic sliding window screens. When applied to consistency-based multiple genome alignments of 35 mammals, our approach confidently identifies >4 million evolutionarily constrained RNA structures using a conservative sensitivity threshold that entails historically low false discovery rates for such analyses (5-22%). These predictions comprise 13.6% of the human genome, 88% of which fall outside any known sequence-constrained element, suggesting that a large proportion of the mammalian genome is functional. As an example, our findings identify both known and novel conserved RNA structure motifs in the long noncoding RNA MALAT1. This study provides an extensive set of functional transcriptomic annotations that will assist researchers in uncovering the precise mechanisms underlying the developmental ontologies of higher eukaryotes.
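A genomic sliding-window screen cuts a whole-genome multiple alignment into overlapping column windows, each of which is then scored for conserved structure. The sketch below shows only the windowing step. The window of 120 columns and step of 40 are hypothetical defaults, since the abstract does not give the parameters Smith et al. tuned, and passing each window to RNAz or SISSIz is outside the sketch.

```python
def sliding_windows(alignment_length, window=120, step=40):
    """Yield half-open (start, end) column ranges covering an alignment.

    The last window is shifted left so it ends exactly at the alignment end.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if alignment_length <= window:
        yield (0, alignment_length)
        return
    start = 0
    while start + window < alignment_length:
        yield (start, start + window)
        start += step
    yield (alignment_length - window, alignment_length)


def window_slices(alignment, window=120, step=40):
    """Cut every row of a gapped multiple alignment into the same column windows."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    return [[row[s:e] for row in alignment] for s, e in sliding_windows(length, window, step)]


print(list(sliding_windows(300)))
```

Overlapping windows make it likely that a structured element shorter than the window falls entirely inside at least one window; the price is that overlapping hits must later be merged.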
Affiliation(s)
- Martin A. Smith
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| | - Tanja Gesell
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| | - Peter F. Stadler
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
- John S. Mattick
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
17
Abstract
A key step toward understanding a metagenomics data set is the identification of functional sequence elements within it, such as protein coding genes and structural RNAs. Relative to protein coding genes, structural RNAs are more difficult to identify because of their reduced alphabet size, lack of open reading frames, and short length. Infernal is a software package that implements “covariance models” (CMs) for RNA homology search, which harness both sequence and structural conservation when searching for RNA homologs. Thanks to the added statistical signal inherent in the secondary structure conservation of many RNA families, Infernal is more powerful than sequence-only based methods such as BLAST and profile HMMs. Together with the Rfam database of CMs, Infernal is a useful tool for identifying RNAs in metagenomics data sets.
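Part of the extra statistical signal a covariance model exploits comes from covariation: in a conserved structure, paired positions mutate together so that base pairing is preserved. As a minimal illustration of that signal only (this is not Infernal's CM scoring; the helper name `column_mutual_information` and the toy columns are hypothetical), the mutual information between two alignment columns can be computed like this:

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns.

    High values mean the residues at the two positions co-vary, which is
    the trace that compensatory base-pair changes leave in an alignment.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and equally long")
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Four sequences: positions i and j always form a Watson-Crick pair,
# but the identity of the pair changes (compensatory mutations).
col_i = ["G", "C", "A", "U"]
col_j = ["C", "G", "U", "A"]
print(column_mutual_information(col_i, col_j))  # 2.0
```

A perfectly conserved pair (all G-C, for example) scores 0 bits with this measure, which is why covariation supplies evidence that sequence conservation alone does not.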
18
Abstract
Heart function requires sophisticated regulatory networks to orchestrate organ development, physiological responses, and environmental adaptation. Until recently, it was thought that these regulatory networks are composed solely of protein-mediated transcriptional control and signaling systems; consequently, it was thought that cardiac disease involves perturbation of these systems. However, it is becoming evident that RNA, long considered to function primarily as the platform for protein production, may in fact play a major role in most, if not all, aspects of gene regulation, especially the epigenetic processes that underpin organogenesis. These include not only well-validated classes of regulatory RNAs, such as microRNAs, but also tens of thousands of long noncoding RNAs that are differentially expressed across the entire genome of humans and other animals. Here, we review this emerging landscape, summarizing what is known about their functions and their role in cardiac biology, and provide a toolkit to assist in exploring this previously hidden layer of gene regulation that may underpin heart adaptation and complex heart diseases.
Affiliation(s)
- Nicole Schonrock
- From the Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia (N.S., R.P.H.); St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia (N.S., R.P.H., J.S.M.); and Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia (J.S.M.)
- Richard P. Harvey
- From the Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia (N.S., R.P.H.); St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia (N.S., R.P.H., J.S.M.); and Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia (J.S.M.)
- John S. Mattick
- From the Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia (N.S., R.P.H.); St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia (N.S., R.P.H., J.S.M.); and Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia (J.S.M.)
19
Enfield KSS, Pikor LA, Martinez VD, Lam WL. Mechanistic Roles of Noncoding RNAs in Lung Cancer Biology and Their Clinical Implications. GENETICS RESEARCH INTERNATIONAL 2012; 2012:737416. [PMID: 22852089 PMCID: PMC3407615 DOI: 10.1155/2012/737416] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 09/16/2011] [Accepted: 03/08/2012] [Indexed: 01/07/2023]
Abstract
Lung cancer biology has traditionally focused on genomic and epigenomic deregulation of protein-coding genes to identify oncogenes and tumor suppressors as diagnostic and therapeutic targets. Another important layer of cancer biology has emerged in the form of noncoding RNAs (ncRNAs), which are major regulators of key cellular processes such as proliferation, RNA splicing, gene regulation, and apoptosis. In the past decade, microRNAs (miRNAs) have moved to the forefront of ncRNA cancer research, while the role of long noncoding RNAs (lncRNAs) is emerging. Here we review the mechanisms by which miRNAs and lncRNAs are deregulated in lung cancer, the technologies that can be applied to detect such alterations, and the clinical potential of these RNA species. An improved understanding of lung cancer biology will come from understanding the interplay between deregulated noncoding RNAs and the protein-coding genes they regulate, and how these interactions influence cellular networks and signalling pathways.
Affiliation(s)
- Katey S. S. Enfield
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z1L3
- Larissa A. Pikor
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z1L3
- Victor D. Martinez
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6T2B5
- Wan L. Lam
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z1L3
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6T2B5
20
Wang Y, Manzour A, Shareghi P, Shaw TI, Li YW, Malmberg RL, Cai L. Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds. BMC Bioinformatics 2012; 13 Suppl 5:S1. [PMID: 22537005 PMCID: PMC3358654 DOI: 10.1186/1471-2105-13-s5-s1] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
Background The computational identification of RNAs in genomic sequences requires detecting signals characteristic of RNA sequences. Shannon base pairing entropy indicates how certain an RNA secondary structure fold is and can aid the detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all alternative equilibrium structures. However, this entropy has yet to deliver the desired performance in distinguishing ncRNAs from random sequences. Novel methods that improve the performance of the entropy measure may lead to more effective ncRNA gene finding based on structure detection. Results This paper shows that the performance of base pairing entropy can be significantly improved with a constrained secondary structure ensemble in which canonical base pairs are assumed to form only within energetically stable stems. This constraint reduces the secondary structure space and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model show substantially narrowed Z-score gaps between ncRNAs, as well as drastic increases in Z-score for all 13 tested ncRNA sets, compared with shuffled sequences. Conclusions These results suggest that effective structure-based ncRNA gene finding methods can be developed by investigating the secondary structure ensembles of ncRNAs.
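As a hedged sketch of the quantity under discussion, assuming the common definition of positional Shannon entropy over pairing outcomes (pair with each partner j, or stay unpaired), the code below computes per-nucleotide and mean entropy from a base-pair probability matrix, plus a Z-score against scores from shuffled sequences. It does not implement the paper's stable-stem constraint; the probability matrix is assumed to come from a partition-function (McCaskill) computation, and all function names are hypothetical.

```python
import math


def positional_entropy(pair_probs):
    """Shannon entropy (bits) of each nucleotide's pairing outcomes.

    pair_probs[i][j] is the probability that nucleotides i and j pair
    (symmetric matrix, zero diagonal). With q_i = 1 - sum_j p_ij:
        S_i = -sum_j p_ij * log2(p_ij) - q_i * log2(q_i)
    """
    entropies = []
    for row in pair_probs:
        probs = [p for p in row if p > 0.0]
        unpaired = 1.0 - sum(row)
        if unpaired > 1e-12:  # ignore rounding noise around zero
            probs.append(unpaired)
        h = -sum(p * math.log2(p) for p in probs)
        entropies.append(max(0.0, h))  # also turns -0.0 into 0.0
    return entropies


def mean_entropy(pair_probs):
    """Sequence-level score: average positional entropy."""
    values = positional_entropy(pair_probs)
    return sum(values) / len(values)


def z_score(value, background):
    """Z-score of a score against scores of shuffled sequences."""
    mu = sum(background) / len(background)
    var = sum((b - mu) ** 2 for b in background) / (len(background) - 1)
    return (value - mu) / math.sqrt(var)


certain = [[0.0, 1.0], [1.0, 0.0]]    # a single fold, always paired
uncertain = [[0.0, 0.5], [0.5, 0.0]]  # paired half of the time
print(mean_entropy(certain), mean_entropy(uncertain))  # 0.0 1.0
```

Low entropy means the ensemble is dominated by one fold, so with this sign convention a well-defined ncRNA scores a negative Z-score against its shuffles; the paper may orient its Z-scores the other way.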
Affiliation(s)
- Yingfeng Wang
- Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA.
21
Izzo JA, Kim N, Elmetwaly S, Schlick T. RAG: an update to the RNA-As-Graphs resource. BMC Bioinformatics 2011; 12:219. [PMID: 21627789 PMCID: PMC3123240 DOI: 10.1186/1471-2105-12-219] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 11/30/2010] [Accepted: 05/31/2011] [Indexed: 02/08/2023]
Abstract
Background In 2004, we presented RNA-As-Graphs (RAG), a web resource for stimulating the search for novel RNAs, which classified, catalogued, and predicted RNA secondary structure motifs using clustering and build-up approaches. With the increased availability of secondary structures in recent years, we have updated the RAG resource and provide various improvements for analyzing RNA structures. Description Our RAG update includes a new supervised clustering algorithm that can suggest RNA motifs that may be "RNA-like". We use this utility to divide RNA motifs into three classes: existing, RNA-like, and non-RNA-like. Using the supervised clustering algorithm with existing RNAs as training data, this produces 126 tree graphs and 16,658 dual graphs as candidate RNA-like topologies. A comparison of this clustering approach with an earlier method shows considerable improvements. Additional RAG features include greatly expanded search capabilities, an interface that better exploits the benefits of a relational database, and improvements to several utilities, such as directed/labeled graphs and a subgraph search program. Conclusions The RAG updates presented here augment the database's intended function (stimulating the search for novel RNA functionality) by classifying available motifs, suggesting new motifs for design, and allowing more targeted searches for specific topologies. The updated RAG web resource offers users a graph-based tool for exploring available RNA motifs and suggesting new RNAs for design.
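In RAG's tree graphs, helical stems are represented as edges and loops as vertices. As a rough sketch of the first step of such a representation, assuming a simplified rule in which any break in base-pair stacking starts a new stem (RAG's own rules, for example for bulges or minimum stem length, may differ), the hypothetical helpers below parse dot-bracket notation and group stacked pairs into stems:

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for k, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {k}")
            pairs.append((stack.pop(), k))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def stems(dot_bracket):
    """Group stacked pairs (i, j), (i + 1, j - 1) into helical stems.

    Pairs are visited in order of their 5' position, so a pair stacked
    directly inside the previous one always comes right after it.
    """
    result = []
    for i, j in base_pairs(dot_bracket):
        if result and result[-1][-1] == (i - 1, j + 1):
            result[-1].append((i, j))
        else:
            result.append([(i, j)])
    return result


# Two stems separated by an internal loop.
print(len(stems("((..((...))..))")))  # 2
```

Each returned stem would become one edge in a tree-graph view; building the loop vertices and their connectivity is left out of this sketch.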
Affiliation(s)
- Joseph A Izzo
- Department of Chemistry, New York University, New York, NY 10003, USA
22
Chen Y, Indurthi DC, Jones SW, Papoutsakis ET. Small RNAs in the genus Clostridium. mBio 2011; 2:e00340-10. [PMID: 21264064 PMCID: PMC3025663 DOI: 10.1128/mbio.00340-10] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 12/22/2010] [Accepted: 01/03/2011] [Indexed: 11/20/2022]
Abstract
The genus Clostridium includes major human pathogens and species important to cellulose degradation, the carbon cycle, and biotechnology. Small RNAs (sRNAs) are emerging as crucial regulatory molecules in all organisms, but they have not been investigated in clostridia. Research on sRNAs in clostridia is hindered by the absence of a systematic method to identify sRNA candidates, relegating clostridial sRNA research to a hit-and-miss process. We therefore set out to develop a method to identify potential sRNAs in the genus Clostridium and so open up the field of sRNA research in clostridia. Using comparative genomics analyses combined with predictions of rho-independent terminators and promoters, we predicted sRNAs in 21 clostridial genomes: Clostridium acetobutylicum, C. beijerinckii, C. botulinum (eight strains), C. cellulolyticum, C. difficile, C. kluyveri (two strains), C. novyi, C. perfringens (three strains), C. phytofermentans, C. tetani, and C. thermocellum. Although more than one-third of predicted sRNAs have Shine-Dalgarno (SD) sequences, only one-sixth have a start codon downstream of the SD sequence; thus, most of the predicted sRNAs are noncoding RNAs. Quantitative reverse transcription-PCR (Q-RT-PCR) and Northern analysis were used to test for the presence of a randomly chosen set of sRNAs in C. acetobutylicum and several C. botulinum strains, confirming a large fraction of the tested sRNAs. We identified a conserved, novel sRNA which, together with the downstream gene encoding an ATP-binding cassette (ABC) transporter, responds to the antibiotic clindamycin. The number of predicted sRNAs correlated with the physiological function of the species (high for pathogens, low for cellulolytic species, and intermediate for solventogenic species), but not with 16S rRNA-based phylogeny.
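The coding-potential check described above (an SD sequence with or without a downstream start codon) can be sketched as a simple motif scan. This is a minimal illustration only: the SD core `GGAGG`, the accepted start codons, and the 4 to 12 nt spacer window are hypothetical placeholders, not the criteria Chen et al. used.

```python
import re

# Hypothetical parameters: an SD-like core motif, accepted start codons,
# and the allowed gap (nt) between the end of the motif and the codon.
SD_MOTIF = re.compile(r"GGAGG")
START_CODONS = ("ATG", "GTG", "TTG")
MIN_SPACER, MAX_SPACER = 4, 12


def sd_start_hits(seq):
    """Return (sd_start, codon_start) pairs: SD motifs followed by a
    start codon within the spacer window (0-based positions)."""
    seq = seq.upper().replace("U", "T")
    hits = []
    for m in SD_MOTIF.finditer(seq):
        for gap in range(MIN_SPACER, MAX_SPACER + 1):
            pos = m.end() + gap
            if seq[pos:pos + 3] in START_CODONS:
                hits.append((m.start(), pos))
    return hits


def classify(seq):
    """Label a candidate as 'no SD', 'SD only' or 'SD + start codon'."""
    if not SD_MOTIF.search(seq.upper().replace("U", "T")):
        return "no SD"
    return "SD + start codon" if sd_start_hits(seq) else "SD only"


print(classify("AAAAGGAGGAAAAAATGAAACCC"))  # SD + start codon
```

Under this kind of scheme, only candidates labelled "SD + start codon" would be flagged as possibly protein-coding; the others remain noncoding candidates.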
Affiliation(s)
- Yili Chen
- Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Dinesh C. Indurthi
- Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Shawn W. Jones
- Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
- Eleftherios T. Papoutsakis
- Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
23
Underwood JG, Uzilov AV, Katzman S, Onodera CS, Mainzer JE, Mathews DH, Lowe TM, Salama SR, Haussler D. FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat Methods 2010; 7:995-1001. [PMID: 21057495 PMCID: PMC3247016 DOI: 10.1038/nmeth.1529] [Citation(s) in RCA: 242] [Impact Index Per Article: 17.3] [Received: 08/16/2010] [Accepted: 10/13/2010] [Indexed: 01/07/2023]
Abstract
Previous efforts to determine structures of non-coding RNA (ncRNA) probed only one RNA at a time with enzymes and chemicals, using gel electrophoresis to identify reactive positions. To accelerate RNA structure inference, we have developed FragSeq, a high-throughput RNA structure probing method that applies high-throughput RNA sequencing to fragments generated by nuclease P1, which specifically cleaves single-stranded nucleic acids. In experiments probing the entire mouse nuclear transcriptome, we show that we can accurately and simultaneously map single-stranded RNA (ssRNA) regions in multiple ncRNAs of known structure. We carried out probing in two cell types to demonstrate reproducibility. We also identified and experimentally validated structured regions in ncRNAs that had never previously been probed.
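A minimal sketch of the general idea behind nuclease probing by sequencing, assuming reads are already mapped so that each read's 5' end (0-based) marks a cleavage site on the transcript: count cuts per position in a nuclease-treated library and in a control library, normalise by library size, and flag positions enriched after treatment as candidate single-stranded sites. This is a generic log2 enrichment score, not necessarily FragSeq's own scoring; the function names and the pseudocount are hypothetical.

```python
import math
from collections import Counter


def cut_counts(read_starts, length):
    """Count reads whose 5' end maps to each position of a transcript."""
    counts = Counter(read_starts)
    return [counts[i] for i in range(length)]


def cut_scores(treated, control, pseudocount=1.0):
    """log2 enrichment of normalised cut counts, treated vs control.

    Positive scores mark positions cut more often after nuclease
    treatment, i.e. candidate single-stranded positions.
    """
    t_total, c_total = sum(treated), sum(control)
    scores = []
    for t, c in zip(treated, control):
        t_frac = (t + pseudocount) / (t_total + pseudocount * len(treated))
        c_frac = (c + pseudocount) / (c_total + pseudocount * len(control))
        scores.append(math.log2(t_frac / c_frac))
    return scores


treated = cut_counts([2, 2, 2, 2, 2, 2], 4)  # nuclease cuts pile up at position 2
control = cut_counts([0, 1, 2, 3], 4)        # uniform background fragmentation
scores = cut_scores(treated, control)
print(scores.index(max(scores)))  # 2
```

The pseudocount keeps positions with zero reads from producing infinite ratios; a real analysis would also need replicate handling and a threshold for calling a position single-stranded.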
Affiliation(s)
- Jason G Underwood
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, California, USA
24
Kandoth C, Ercal F, Frank RL. A framework for automated enrichment of functionally significant inverted repeats in whole genomes. BMC Bioinformatics 2010; 11 Suppl 6:S20. [PMID: 20946604 PMCID: PMC3026368 DOI: 10.1186/1471-2105-11-s6-s20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/25/2023]
Abstract
BACKGROUND RNA transcripts from genomic sequences showing dyad symmetry typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures are often the precursors of non-coding RNA (ncRNA) sequences such as microRNA (miRNA) and small interfering RNA (siRNA), whose functional significance has recently become much more apparent. Genomic DNA contains hundreds of thousands of such inverted repeats (IRs) with varying degrees of symmetry. By collecting statistically significant information from a known set of ncRNAs, however, these IRs can be sorted into those likely to be functional. RESULTS A novel method was developed to scan genomic DNA for partially symmetric inverted repeats, and the resulting set was further refined to match miRNA precursors (pre-miRNAs) with respect to their density of symmetry, the statistical probability of the symmetry, the length of stems in the predicted hairpin secondary structure, and the GC content of the stems. This method was applied to the Arabidopsis thaliana genome and validated against the set of 190 known Arabidopsis pre-miRNAs in the miRBase database. A preliminary scan for IRs identified 186 of the known pre-miRNAs, but among 714,700 pre-miRNA candidates. This large set of IRs was refined to 483,908 candidates containing 183 of the known pre-miRNAs, and further still to 165,371 candidates containing 171 (i.e., 90% of the known pre-miRNAs were retained). CONCLUSIONS A set of 165,371 candidate functional miRNAs is still too large to warrant wet-lab analyses, such as Northern blotting, on all of them. Additional filters are therefore needed to reduce the number of candidates further while retaining most of the known miRNAs. These include detection of promoters and terminators, homology analyses, location of the candidate relative to coding regions, and better secondary structure prediction algorithms.
The software is designed to accommodate such additional filters easily, requiring only minimal experience with Perl.
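Two of the filtering criteria named above, density of symmetry and GC content, can be illustrated with a short sketch. It assumes a simplified definition in which each position is compared with its mirror-image partner across the whole candidate (so loop nucleotides count as mismatches), and the thresholds in `passes_filter` are hypothetical placeholders. The published tool is written in Perl and its exact definitions may differ.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def symmetry_density(seq):
    """Fraction of positions whose mirror-image partner is complementary.

    For an inverted repeat, position k should pair with len(seq) - 1 - k;
    only the first half is checked so each pair is counted once.
    """
    seq = seq.upper()
    half = len(seq) // 2
    if half == 0:
        return 0.0
    matches = sum(
        1 for k in range(half)
        if COMPLEMENT.get(seq[k]) == seq[len(seq) - 1 - k]
    )
    return matches / half


def gc_content(seq):
    """Fraction of G or C nucleotides."""
    seq = seq.upper()
    return sum(seq.count(b) for b in "GC") / len(seq) if seq else 0.0


def passes_filter(seq, min_symmetry=0.8, gc_range=(0.3, 0.7)):
    """Hypothetical candidate filter; the thresholds are placeholders."""
    return (symmetry_density(seq) >= min_symmetry
            and gc_range[0] <= gc_content(seq) <= gc_range[1])


print(symmetry_density("GAATTC"), passes_filter("GAATTC"))  # 1.0 True
```

Because each criterion is a separate predicate, further filters of the kind the authors list (promoters, homology, genomic location) could be chained in the same way.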
Affiliation(s)
- Cyriac Kandoth
- Department of Computer Science, Missouri University of Science and Technology, Rolla, MO 65409, USA.
25
Raasch P, Schmitz U, Patenge N, Vera J, Kreikemeyer B, Wolkenhauer O. Non-coding RNA detection methods combined to improve usability, reproducibility and precision. BMC Bioinformatics 2010; 11:491. [PMID: 20920260 PMCID: PMC2955705 DOI: 10.1186/1471-2105-11-491] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 05/10/2010] [Accepted: 09/29/2010] [Indexed: 11/10/2022]
Abstract
Background Non-coding RNAs are gaining attention as their diverse roles in many cellular processes are discovered. At the same time, the need for efficient computational prediction of ncRNAs increases with the pace of sequencing technology. Existing tools are based on various approaches and techniques, but none of them yet provides a reliable ncRNA detector. A natural approach is therefore to combine existing tools. Because standard input and output formats are lacking, combining and comparing existing tools is difficult. Also, for genomic scans the tools often need to be incorporated into detection workflows using custom scripts, which decreases transparency and reproducibility. Results We developed a Java-based framework to integrate existing tools and methods for ncRNA detection. This framework enables users to construct transparent detection workflows and to combine and compare different methods efficiently. We demonstrate the effectiveness of combining detection methods in case studies with the small genomes of Escherichia coli, Listeria monocytogenes and Streptococcus pyogenes. With the combined method, we gained 10% to 20% in precision at sensitivities from 30% to 80%. Further, we searched Streptococcus pyogenes for novel ncRNAs. Using multiple methods, integrated by our framework, we identified four highly probable candidates. We verified all four candidates experimentally using RT-PCR. Conclusions We have created an extensible framework for practical, transparent and reproducible combination and comparison of ncRNA detection methods. We have demonstrated the effectiveness of this approach in tests and by guiding experiments to find new ncRNAs. The software is freely available under the GNU General Public License (GPL), version 3 at http://www.sbi.uni-rostock.de/moses along with source code, screen shots, examples and tutorial material.
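One simple way to see why combining detectors can raise precision is majority voting over predicted loci. The sketch below is written in Python rather than the framework's Java and is not the framework's API: the voting rule, the helper names, and the toy locus IDs are hypothetical, and real genomic predictions would need interval-overlap matching rather than exact IDs.

```python
from collections import Counter


def combine_by_vote(predictions, min_votes=2):
    """Keep loci reported by at least `min_votes` tools.

    predictions: list of sets, one per tool, of hashable locus IDs.
    """
    votes = Counter(locus for tool in predictions for locus in set(tool))
    return {locus for locus, n in votes.items() if n >= min_votes}


def precision_sensitivity(predicted, reference):
    """Precision and sensitivity of a predicted set against known ncRNAs."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(reference) if reference else 0.0
    return precision, sensitivity


tools = [{"r1", "r2", "x1"}, {"r1", "r2", "x2"}, {"r1", "r3", "x3"}]
known = {"r1", "r2", "r3"}
for name, pred in (("tool A", tools[0]),
                   ("union", set().union(*tools)),
                   ("2 votes", combine_by_vote(tools))):
    p, s = precision_sensitivity(pred, known)
    print(f"{name}: precision={p:.2f} sensitivity={s:.2f}")
# tool A: precision=0.67 sensitivity=0.67
# union: precision=0.50 sensitivity=1.00
# 2 votes: precision=1.00 sensitivity=0.67
```

The toy output shows the usual trade-off: taking the union maximises sensitivity, while requiring agreement between tools discards tool-specific false positives and raises precision.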
Affiliation(s)
- Peter Raasch
- Systems Biology and Bioinformatics Group, University of Rostock, Rostock, Germany
26
Laing C, Schlick T. Computational approaches to 3D modeling of RNA. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2010; 22:283101. [PMID: 21399271 PMCID: PMC6286080 DOI: 10.1088/0953-8984/22/28/283101] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Indexed: 05/30/2023]
Abstract
Many exciting discoveries have recently revealed the versatility of RNA and its importance in a variety of cellular functions. Since the structural features of RNA are of major importance to its biological function, there is much interest in predicting RNA structure, either in free form or in interaction with various ligands, including proteins, metabolites and other molecules. In recent years, an increasing number of researchers have developed novel algorithms for predicting RNA secondary and tertiary structures. In this review, we describe current experimental and computational advances and discuss recent ideas that are transforming the traditional view of RNA folding. To evaluate the most recent RNA 3D folding algorithms, we provide a comparative study of the performance of available 3D structure prediction algorithms on an RNA data set of 43 structures of various lengths and motifs. We find that the algorithms vary widely in prediction quality across different RNA lengths and topologies; most predictions have very large root mean square deviations (RMSDs) from the experimental structure. We conclude by outlining some suggestions for future RNA folding research.
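The root mean square deviation used to compare a prediction with the experimental structure is RMSD = sqrt((1/N) * sum_i |x_i - y_i|^2) over N matched atoms. A minimal sketch, assuming the two coordinate sets are already matched atom by atom and optimally superimposed (the superposition step, for example the Kabsch algorithm, is omitted here; the function name is hypothetical):

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already optimally superimposed and that
    coords_a[k] and coords_b[k] refer to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Every atom displaced by 1 unit along z gives an RMSD of 1.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```

Because squared deviations are averaged, a few badly misplaced regions (such as a wrongly oriented helix) can dominate the value, which is one reason large RMSDs are common for long RNAs.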
Affiliation(s)
- Christian Laing
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
27
Matrajt M. Non-coding RNA in apicomplexan parasites. Mol Biochem Parasitol 2010; 174:1-7. [PMID: 20566348 DOI: 10.1016/j.molbiopara.2010.06.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 12/24/2009] [Revised: 05/29/2010] [Accepted: 06/01/2010] [Indexed: 11/28/2022]
Abstract
In recent years it has become evident that the transcriptome of most species has little protein-coding capacity and that the abundance of non-coding RNA was previously overlooked. Non-coding RNAs were initially thought to be transcriptional noise; however, a growing number of studies are showing that many of these RNAs have important regulatory functions. Here, we review the progress made in apicomplexan parasites in this rapidly growing field.
Affiliation(s)
- Mariana Matrajt
- Department of Microbiology and Molecular Genetics, University of Vermont, Stafford Hall, Room 306, 95 Carrigan Drive, Burlington, VT 05405, United States.
28
Soldà G, Makunin IV, Sezerman OU, Corradin A, Corti G, Guffanti A. An Ariadne's thread to the identification and annotation of noncoding RNAs in eukaryotes. Brief Bioinform 2009; 10:475-89. [PMID: 19383843 DOI: 10.1093/bib/bbp022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Non-protein coding RNAs (ncRNAs) have emerged as a vast and heterogeneous portion of eukaryotic transcriptomes. Several ncRNA families, either short (<200 nucleotides, nt) or long (>200 nt), have been described and implicated in a variety of biological processes, from translation to gene expression regulation and nuclear trafficking. Most probably, other families are still to be discovered. Computational methods for ncRNA research require different approaches from the ones normally used in the prediction of protein-coding genes. Indeed, primary sequence alone is often insufficient to infer ncRNA functionality, whereas secondary structure and local conservation of portions of the transcript could provide useful information for both the prediction and the functional annotation of ncRNAs. Here we present an overview of computational methods and bioinformatics resources currently available for studying ncRNA genes, introducing the common themes as well as the different approaches required for long and short ncRNA identification and annotation.
Affiliation(s)
- Giulia Soldà
- Department of Biology and Genetics for Medical Sciences, University of Milano, 20133 Milan, Italy.
29
Mello BP, Abrantes EF, Torres CH, Machado-Lima A, Fonseca RDS, Carraro DM, Brentani RR, Reis LFL, Brentani H. No-match ORESTES explored as tumor markers. Nucleic Acids Res 2009; 37:2607-17. [PMID: 19270067 PMCID: PMC2677862 DOI: 10.1093/nar/gkp074] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Sequencing technologies and new bioinformatics tools have led to the complete sequencing of various genomes. However, information regarding the human transcriptome and its annotation is still incomplete. The Human Cancer Genome Project, using the ORESTES (open reading frame EST sequences) methodology, contributed to this objective by generating data from about 1.2 million expressed sequence tags. Approximately 30% of these sequences did not align to ESTs in the public databases and were considered no-match ORESTES. Reasoning that a subset of these ESTs could represent new transcripts, we constructed a cDNA microarray. This platform was hybridized against 12 different normal or tumor tissues. We identified 3421 transcribed regions not associated with annotated transcripts, representing 83.3% of the platform. The total number of differentially expressed sequences was 1007. In addition, 28% of the analyzed sequences could represent noncoding RNAs. Our data reinforce the view that the human genome is pervasively transcribed, and point out molecular marker candidates for different cancers. To support these findings, we confirmed by real-time PCR the differential expression of three of eight potential tumor markers in prostate tissues. Lists of the 1007 differentially expressed sequences and of the 291 potential noncoding tumor markers are provided.
Affiliation(s)
- Barbara P Mello
- Hospital A. C. Camargo, Rua Prof. Antônio Prudente 211, São Paulo, SP, Brazil
30
Kleesiek J, Torda AE. RNA secondary structure prediction using a self-consistent mean field approach. J Comput Chem 2009; 31:1135-42. [DOI: 10.1002/jcc.21398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
31
Vestheim H, Jarman SN. Blocking primers to enhance PCR amplification of rare sequences in mixed samples - a case study on prey DNA in Antarctic krill stomachs. Front Zool 2008; 5:12. [PMID: 18638418 PMCID: PMC2517594 DOI: 10.1186/1742-9994-5-12] [Citation(s) in RCA: 224] [Impact Index Per Article: 14.0] [Received: 03/06/2008] [Accepted: 07/20/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identification of DNA sequence diversity is a powerful means of assessing the species present in environmental samples. The most common molecular strategies for estimating taxonomic composition depend on PCR with universal primers that amplify an orthologous DNA region from a range of species. The diversity of sequences within a sample that can be detected by universal primers is often compromised by high concentrations of some DNA templates. If the DNA within the sample contains a small number of sequences in relatively high concentrations, less concentrated sequences are often not amplified because the PCR favours the dominant DNA types. This is a particular problem in molecular diet studies, where predator DNA is often present in great excess over food-derived DNA. RESULTS We have developed a strategy in which a universal PCR simultaneously amplifies DNA from food items present in DNA purified from stomach samples, while the predator's own DNA is blocked from amplification by the addition of a modified predator-specific blocking primer. Three types of modified primers were tested: an annealing-inhibiting primer overlapping the 3' end of one of the universal primers; a second annealing-inhibiting primer that also carried an internal modification of five deoxyinosine (dI) molecules, making it a dual priming oligo; and an elongation-arrest primer located between the two universal primers. All blocking primers were modified with a C3 spacer. In artificial PCR mixtures, the annealing-inhibiting primers proved the most efficient, and this method reduced predator amplicons to undetectable levels even when the predator template was present in 1000-fold excess over the prey template. The prey template then showed strong PCR amplification where none was detectable without the blocking primer. Our method was applied to identifying the winter food of one of the most abundant animals in the world, the Antarctic krill, Euphausia superba.
Dietary item DNA was PCR amplified from a range of species in krill stomachs for which we had no prior sequence knowledge. CONCLUSION We present a simple, robust and cheap method that is easily adaptable to many situations where a rare DNA template is to be PCR amplified in the presence of a higher concentration template with identical PCR primer binding sites.
Affiliation(s)
- Hege Vestheim
- Department of Biology, University of Oslo, P.O. Box 1066, Blindern, 0316, Oslo, Norway.
32
Zhao Y, Li H, Hou Y, Cha L, Cao Y, Wang L, Ying X, Li W. Construction of two mathematical models for prediction of bacterial sRNA targets. Biochem Biophys Res Commun 2008; 372:346-50. [PMID: 18501192 DOI: 10.1016/j.bbrc.2008.05.046] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 04/13/2008] [Accepted: 05/03/2008] [Indexed: 10/22/2022]
Abstract
Accurate prediction of sRNA targets plays a key role in determining sRNA functions. Here we introduce two mathematical models, sRNATargetNB and sRNATargetSVM, for prediction of sRNA targets using the Naive Bayes method and support vector machines (SVM), respectively. The training dataset was composed of 46 positive samples (real sRNA-target interactions) and 86 negative samples (no interaction between sRNA and target). The leave-one-out cross-validation (LOOCV) classification accuracy was 91.67% for sRNATargetNB and 100.00% for sRNATargetSVM. To evaluate the performance of the models, an independent test dataset was used, which contained 22 positive samples and 1700 randomly generated negative samples. The classification accuracy, sensitivity, and specificity were 93.03%, 40.90%, and 93.71% for sRNATargetNB and 80.55%, 72.73%, and 80.65% for sRNATargetSVM, respectively. The presented models therefore provide support for experimental identification of sRNA targets. The related software and supplementary materials can be downloaded from http://www.biosun.org.cn/srnatarget/.
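The reported test-set figures follow from the standard confusion-matrix definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and accuracy = (TP+TN)/total. The confusion-matrix counts are not stated in the abstract; the counts below are back-calculated from the reported percentages for the 22 positive and 1700 negative test samples, so treat them as an inference rather than published data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Independent test set: 22 positives, 1700 negatives. Counts are
# back-calculated from the reported percentages (not given in the abstract).
nb = classification_metrics(tp=9, fn=13, tn=1593, fp=107)
svm = classification_metrics(tp=16, fn=6, tn=1371, fp=329)
for name, metrics in (("NB", nb), ("SVM", svm)):
    print(name, " ".join(f"{100 * m:.2f}%" for m in metrics))
# NB 93.03% 40.91% 93.71%
# SVM 80.55% 72.73% 80.65%
```

Every figure matches the abstract except the Naive Bayes sensitivity: 9/22 rounds to 40.91%, so the reported 40.90% appears to be truncated rather than rounded. The example also shows why accuracy is dominated by the 1700 negatives here: the SVM finds far more true targets yet has the lower accuracy.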
Affiliation(s)
- Yalin Zhao
- Center of Computational Biology, Beijing Institute of Basic Medical Sciences, Taiping Road 27#, Haidian District, Beijing 100850, China