1
Sichani AR, Sichani ZR, Yazdani B, Looha MA, Sirous H. A bioinformatics approach of specificity protein transcription factors in head and neck squamous cell carcinoma. Res Pharm Sci 2024; 19:287-302. [PMID: 39035812 PMCID: PMC11257197 DOI: 10.4103/rps.rps_171_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 10/14/2023] [Accepted: 11/13/2023] [Indexed: 07/23/2024]
Abstract
Background and purpose: The seventh most common type of cancer with increasing diagnosis rates around the world is head and neck squamous cell carcinoma (HNSCC). Specificity proteins (SPs) have been known for their role in the regulation of cellular division, growth, and apoptotic pathways in various cancers. In this work, we analyzed the expression levels of SPs in HNSCC to assess their diagnostic and prognostic biomarker potential. Experimental approach: Differential gene expression and correlation analysis methods were used to determine the top dysregulated genes in HNSCC. Functional enrichment and protein-protein interaction analyses were done with the DAVID database and Cytoscape software to understand their function and biological processes. Receiver operating characteristic (ROC), logistic regression, and Cox regression analyses were performed to check SP genes' diagnostic and prognostic potential. Findings/Results: SP1 (LogFC = -0.27, P = 0.0013) and SP2 (LogFC = -0.20, P = 0.0019) genes were upregulated in HNSCC samples, while SP8 (LogFC = 2.57, P < 0.001) and SP9 (LogFC = 2.57, P < 0.001) genes were downregulated in cancer samples. A moderate positive correlation was observed among the expression levels of SP1, SP2, and SP3 genes. The SP8 and SP9 genes, with AUC values of 0.79 and 0.75, demonstrated diagnostic potential, which increased to 0.84 when both genes were assessed by logistic regression test. Also, the SP1 gene held a marginally significant prognostic potential. Conclusion and implications: Our findings clarify the potential of SP transcription factors as candidate diagnostic and prognostic biomarkers for early screening and treatment of HNSCC.
Affiliation(s)
- Adel Rezvani Sichani
  - Department of Food Science and Technology, Shahreza Branch, Islamic Azad University, Shahreza. I.R. Iran
- Ziba Rezvani Sichani
  - Department of Biochemistry, Islamic Azad University, Falavarjan Branch, I.R. Iran
- Behnaz Yazdani
  - Bioscience Department, Faculty of Science and Technology (FCT), Universitat de Vic - Universitat Central de Catalunya (Uvic-UCC), 08500 Vic, Spain
- Mehdi Azizmohammad Looha
  - Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hajar Sirous
  - Bioinformatics Research Center, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, I.R. Iran
2
Biesiada M, Hu MY, Williams LD, Purzycka KJ, Petrov AS. rRNA expansion segment 7 in eukaryotes: from Signature Fold to tentacles. Nucleic Acids Res 2022; 50:10717-10732. [PMID: 36200812 PMCID: PMC9561286 DOI: 10.1093/nar/gkac844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/27/2021] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 11/14/2022]
Abstract
The ribosomal core is universally conserved across the tree of life. However, eukaryotic ribosomes contain diverse rRNA expansion segments (ESs) on their surfaces. Sites of ES insertions are predicted from sites of insertion of micro-ESs in archaea. Expansion segment 7 (ES7) is one of the most diverse regions of the ribosome, emanating from a short stem loop and ranging to over 750 nucleotides in mammals. We present secondary and full-atom 3D structures of ES7 from species spanning eukaryotic diversity. Our results are based on experimental 3D structures, the accretion model of ribosomal evolution, phylogenetic relationships, multiple sequence alignments, RNA folding algorithms and 3D modeling by RNAComposer. ES7 contains a distinct motif, the 'ES7 Signature Fold', which is generally invariant in 2D topology and 3D structure in all eukaryotic ribosomes. We establish a model in which ES7 developed over evolution through a series of elementary and recursive growth events. The data are sufficient to support an atomic-level accretion path for rRNA growth. The non-monophyletic distribution of some ES7 features across the phylogeny suggests acquisition via convergent processes. And finally, illustrating the power of our approach, we constructed the 2D and 3D structure of the entire LSU rRNA of Mus musculus.
Affiliation(s)
- Marcin Biesiada
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
- Michael Y Hu
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Loren Dean Williams
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Katarzyna J Purzycka
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
- Anton S Petrov
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
3
Pervouchine DD. Towards Long-Range RNA Structure Prediction in Eukaryotic Genes. Genes (Basel) 2018; 9:genes9060302. [PMID: 29914113 PMCID: PMC6027157 DOI: 10.3390/genes9060302] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/27/2018] [Revised: 06/13/2018] [Accepted: 06/13/2018] [Indexed: 01/03/2023]
Abstract
The ability to form an intramolecular structure plays a fundamental role in eukaryotic RNA biogenesis. Proximate regions in the primary transcripts fold into a local secondary structure, which is then hierarchically assembled into a tertiary structure that is stabilized by RNA-binding proteins and long-range intramolecular base pairings. While the local RNA structure can be predicted reasonably well for short sequences, long-range structure at the scale of eukaryotic genes remains problematic from the computational standpoint. The aim of this review is to list functional examples of long-range RNA structures, to summarize current comparative methods of structure prediction, and to highlight their advances and limitations in the context of long-range RNA structures. Most comparative methods implement the “first-align-then-fold” principle, i.e., they operate on multiple sequence alignments, while functional RNA structures often reside in non-conserved parts of the primary transcripts. The opposite “first-fold-then-align” approach is currently explored to a much lesser extent. Developing novel methods in both directions will improve the performance of comparative RNA structure analysis and help discover novel long-range structures, their higher-order organization, and RNA-RNA interactions across the transcriptome.
Affiliation(s)
- Dmitri D Pervouchine
  - Skolkovo Institute for Science and Technology, Ulitsa Nobelya 3, Moscow 121205, Russia.
  - The Faculty of Bioengineering and Bioinformatics, Moscow State University 1-73, Moscow 119899, Russia.
  - Faculty of Computer Science, Higher School of Economics, Kochnovskiy Proyezd 3, Moscow 125319, Russia.
4
Meyer IM. In silico methods for co-transcriptional RNA secondary structure prediction and for investigating alternative RNA structure expression. Methods 2017; 120:3-16. [PMID: 28433606 DOI: 10.1016/j.ymeth.2017.04.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/17/2017] [Revised: 03/16/2017] [Accepted: 04/14/2017] [Indexed: 01/26/2023]
Abstract
RNA transcripts are the primary products of active genes in any living organism, including many viruses. Their cellular destiny not only depends on primary sequence signals, but can also be determined by RNA structure. Recent experimental evidence shows that many transcripts can be assigned more than a single functional RNA structure throughout their cellular life and that structure formation happens co-transcriptionally, i.e. as the transcript is synthesised in the cell. Moreover, functional RNA structures are not limited to non-coding transcripts, but can also feature in coding transcripts. The picture that now emerges is that RNA structures constitute an additional layer of information that can be encoded in any RNA transcript (and on top of other layers of information such as protein-context) in order to exert a wide range of functional roles. Moreover, different encoded RNA structures can be expressed at different stages of a transcript's life in order to alter the transcript's behaviour depending on its actual cellular context. Similar to the concept of alternative splicing for protein-coding genes, where a single transcript can yield different proteins depending on cellular context, it is thus appropriate to propose the notion of alternative RNA structure expression for any given transcript. This review introduces several computational strategies that my group developed to detect different aspects of RNA structure expression in vivo. Two aspects are of particular interest to us: (1) RNA secondary structure features that emerge during co-transcriptional folding and (2) functional RNA structure features that are expressed at different times of a transcript's life and potentially mutually exclusive.
Affiliation(s)
- Irmtraud M Meyer
  - Laboratory of Bioinformatics of RNA Structure and Transcriptome Regulation, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, 13125 Berlin-Buch, Germany; Institute of Chemistry and Biochemistry, Free University, Thielallee 63, 14195 Berlin, Germany.
5
An RNA secondary structure prediction method based on minimum and suboptimal free energy structures. J Theor Biol 2015; 380:473-9. [PMID: 26100179 DOI: 10.1016/j.jtbi.2015.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2014] [Revised: 03/12/2015] [Accepted: 05/04/2015] [Indexed: 11/24/2022]
Abstract
The function of an RNA molecule is mainly determined by its tertiary structure, and its secondary structure is an important determinant of that tertiary structure. Comparative methods usually give better results than single-sequence methods. Based on minimum and suboptimal free energy structures, the paper presents a novel method for predicting the conserved secondary structure of a group of related RNAs. In the method, the information from known RNA structures is used as training data in an SVM (Support Vector Machine) classifier. Our method has been tested on the benchmark dataset given by Puton et al. The results show that the average sensitivity of our method is higher than that of other comparative methods such as CentroidAlifold, MXScarna, RNAalifold, and TurboFold.
6
Fiscon G, Paci P, Iannello G. MONSTER v1.1: a tool to extract and search for RNA non-branching structures. BMC Genomics 2015; 16:S1. [PMID: 26047478 PMCID: PMC4460781 DOI: 10.1186/1471-2164-16-s6-s1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
Background: Detection of RNA structure similarities is still one of the major computational problems in the discovery of RNA functions. A case in point is the study of the newly appreciated long non-coding RNAs (lncRNAs), emerging as new players involved in many cellular processes and molecular interactions. Among several mechanisms of action, some lncRNAs show specific substructures that are likely to be instrumental for their functioning. For instance, it has been reported in the literature that some lncRNAs have a guiding or scaffolding role by binding chromatin-modifying protein complexes. Thus, a functionally characterized lncRNA (reference) can be used to infer the function of others that are functionally unknown (target), based on shared structural motifs. Methods: In our previous work we presented a tool, MONSTER v1.0, able to identify structural motifs shared between two full-length RNAs. Our procedure is mainly composed of two ad-hoc developed algorithms: nbRSSP_extractor for characterizing the folding of an RNA sequence by means of a sequence-structure descriptor (i.e., an array of non-overlapping substructures located on the RNA sequence and coded by dot-bracket notation); and SSD_finder, to enable an effective search engine for groups of matches (i.e., chains) common to the reference and target RNA based on a dynamic programming approach with a new score function. Here, we present an updated version of the previous one (MONSTER v1.1) accounting for the peculiar feature of lncRNAs that are not expected to have a unique fold, but appear to fluctuate among a large number of equally-stable folds. In particular, we improved our SSD_finder algorithm in order to take into account all the alternative equally-stable structures. Results: We present an application of MONSTER v1.1 on lincRNAs, which are a specific class of lncRNAs located in genomic regions which do not overlap protein-coding genes. In particular, we provide reliable predictions of the shared chains between HOTAIR, ANRIL and COLDAIR. The latter are lincRNAs which interact with the same protein complexes of the Polycomb group and hence they are expected to share structural motifs. Software availability: the software package is provided as additional file 1 ("archive_updated.zip").
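The dot-bracket notation named in this abstract encodes each base pair as a matching pair of parentheses and each unpaired base as a dot. The following is a minimal sketch of how such a string can be decoded into base pairs with a stack; it is not MONSTER's nbRSSP_extractor, and the function name `dot_bracket_to_pairs` is a hypothetical one chosen for illustration.

```python
def dot_bracket_to_pairs(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based (i, j) base pairs encoded by a dot-bracket string.

    Illustrative helper only (hypothetical name); handles the plain
    '(', ')' and '.' alphabet, i.e. pseudoknot-free structures.
    """
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A small hairpin: two stacked pairs closing a two-base loop.
    print(dot_bracket_to_pairs("((..))."))
```

A stack is used because, in a pseudoknot-free structure, every closing bracket pairs with the most recently opened one.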
7
Bai Y, Dai X, Harrison A, Johnston C, Chen M. Toward a next-generation atlas of RNA secondary structure. Brief Bioinform 2015; 17:63-77. [PMID: 25922372 DOI: 10.1093/bib/bbv026] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/09/2015] [Indexed: 12/23/2022]
Abstract
RNA structure plays a crucial role in gene maturation, regulation and function. Determining the form and frequency of RNA folds is essential for a better understanding of how RNA exerts its functions. Low-throughput studies have focused on RNA primary sequences and expression levels, but with an emphasis on relatively small numbers of transcripts. However, with the recent advent of high-throughput technologies, it is realistic to begin analyzing RNA secondary structures on a genome-wide scale. Here, we review genome-wide RNA secondary structure profiles as well as advances in computational structure predictions. We further discuss the novel characteristics of RNA secondary structure across messenger RNAs. Probing RNA secondary structure by high-throughput sequencing will enable us to build atlases of RNA secondary structures, an important step in helping us to understand the versatility of RNA functions in diverse cellular processes.
8
Asai K, Hamada M. RNA structural alignments, part II: non-Sankoff approaches for structural alignments. Methods Mol Biol 2014; 1097:291-301. [PMID: 24639165 DOI: 10.1007/978-1-62703-709-9_14] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 02/05/2023]
Abstract
In structural alignments of RNA sequences, the computational cost of the Sankoff algorithm, which simultaneously optimizes the score of the common secondary structure and the score of the alignment, is too high for long sequences (O(L^6) time for two sequences of length L). In this chapter, we introduce methods that predict the structures and the alignment separately to avoid the heavy computations of the Sankoff algorithm. In those methods, neither of the two prediction processes is independent; each utilizes the information of the other. The first process typically includes prediction of base-pairing probabilities (BPPs) or of candidate stems, and the alignment process utilizes those results. At the same time, it is also important to reflect the information of the alignment in the structure prediction. This idea can be implemented as the probabilistic transformation (PCT) of BPPs using the potential alignment. As with all estimation problems, it is important to define the evaluation measure for the structural alignment. The principle of maximum expected accuracy (MEA) is applicable to the sum-of-pairs score (SPS) based on the reference alignment.
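To make the sum-of-pairs score concrete, here is a minimal sketch for a pairwise alignment. It assumes one common definition, the fraction of residue pairs aligned in the reference that are also aligned in the prediction; the chapter's exact formulation may differ, and the function names are hypothetical.

```python
def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> set[tuple[int, int]]:
    """Return the set of (i, j) residue index pairs placed in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sum_of_pairs_score(predicted: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference-aligned residue pairs recovered by the prediction."""
    ref = aligned_pairs(*reference)
    if not ref:
        raise ValueError("reference alignment contains no aligned residue pairs")
    pred = aligned_pairs(*predicted)
    return len(pred & ref) / len(ref)


if __name__ == "__main__":
    reference = ("ACGU", "ACGU")
    predicted = ("ACG-U", "AC-GU")  # misaligns the third residue of each sequence
    print(sum_of_pairs_score(predicted, reference))
```

Working on residue indices rather than alignment columns lets the two alignments have different lengths and gap placements.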
Affiliation(s)
- Kiyoshi Asai
  - Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan
9
Jabbari H, Condon A. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures. BMC Bioinformatics 2014; 15:147. [PMID: 24884954 PMCID: PMC4064103 DOI: 10.1186/1471-2105-15-147] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/07/2014] [Accepted: 05/08/2014] [Indexed: 12/12/2022]
Abstract
Background: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php.
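For readers unfamiliar with the term, a secondary structure is pseudoknotted when two of its base pairs cross, that is, pairs (i, j) and (k, l) with i < k < j < l. The sketch below only tests a list of base pairs for that crossing condition; it is not Iterative HFold and performs no energy minimization, and the function name `has_pseudoknot` is a hypothetical one.

```python
from itertools import combinations


def has_pseudoknot(pairs: list[tuple[int, int]]) -> bool:
    """Return True if any two base pairs cross (i < k < j < l).

    Illustrative check only (hypothetical name); quadratic in the number
    of pairs, which is adequate for small examples.
    """
    # Normalise each pair so the smaller index comes first, then sort so
    # that for any two pairs taken in order the first opens no later.
    normalised = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for (i, j), (k, l) in combinations(normalised, 2):
        if i < k < j < l:
            return True
    return False


if __name__ == "__main__":
    nested = [(0, 5), (1, 4)]    # properly nested hairpin stem
    crossing = [(0, 5), (3, 8)]  # second pair opens inside and closes outside the first
    print(has_pseudoknot(nested), has_pseudoknot(crossing))
```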
Affiliation(s)
- Hosna Jabbari
  - Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, Canada.
10
Lai D, Meyer IM. e-RNA: a collection of web servers for comparative RNA structure prediction and visualisation. Nucleic Acids Res 2014; 42:W373-6. [PMID: 24810851 PMCID: PMC4086097 DOI: 10.1093/nar/gku292] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
Abstract
e-RNA offers a free and open-access collection of five published RNA sequence analysis tools, each solving specific problems not readily addressed by other available tools. Given multiple sequence alignments, Transat detects all conserved helices, including those expected in a final structure, but also transient, alternative and pseudo-knotted helices. RNA-Decoder uses unique evolutionary models to detect conserved RNA secondary structure in alignments which may be partly protein-coding. SimulFold simultaneously co-estimates the potentially pseudo-knotted conserved structure, alignment and phylogenetic tree for a set of homologous input sequences. CoFold predicts the minimum-free energy structure for an input sequence while taking the effects of co-transcriptional folding into account, thereby greatly improving the prediction accuracy for long sequences. R-chie is a program to visualise RNA secondary structures as arc diagrams, allowing for easy comparison and analysis of conserved base-pairs and quantitative features. The web site server dispatches user jobs to a cluster, where up to 100 jobs can be processed in parallel. Upon job completion, users can retrieve their results via a bookmarked or emailed link. e-RNA is located at http://www.e-rna.org.
Affiliation(s)
- Daniel Lai
  - Centre for High-Throughput Biology, Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver V6T 1Z4, Canada
- Irmtraud M Meyer
  - Centre for High-Throughput Biology, Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver V6T 1Z4, Canada
11
Abstract
De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
Affiliation(s)
- Walter L Ruzzo
  - Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
12
Lai D, Proctor JR, Meyer IM. On the importance of cotranscriptional RNA structure formation. RNA (NEW YORK, N.Y.) 2013; 19:1461-1473. [PMID: 24131802 PMCID: PMC3851714 DOI: 10.1261/rna.037390.112] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.0] [Indexed: 05/29/2023]
Abstract
The expression of genes, both coding and noncoding, can be significantly influenced by RNA structural features of their corresponding transcripts. There is by now mounting experimental and some theoretical evidence that structure formation in vivo starts during transcription and that this cotranscriptional folding determines the functional RNA structural features that are being formed. Several decades of research in bioinformatics have resulted in a wide range of computational methods for predicting RNA secondary structures. Almost all state-of-the-art methods in terms of prediction accuracy, however, completely ignore the process of structure formation and focus exclusively on the final RNA structure. This review hopes to bridge this gap. We summarize the existing evidence for cotranscriptional folding and then review the different, currently used strategies for RNA secondary-structure prediction. Finally, we propose a range of ideas on how state-of-the-art methods could be potentially improved by explicitly capturing the process of cotranscriptional structure formation.
13
Zhu JYA, Steif A, Proctor JR, Meyer IM. Transient RNA structure features are evolutionarily conserved and can be computationally predicted. Nucleic Acids Res 2013; 41:6273-85. [PMID: 23625966 PMCID: PMC3695514 DOI: 10.1093/nar/gkt319] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/06/2023]
Abstract
Functional RNA structures tend to be conserved during evolution. This finding is, for example, exploited by comparative methods for RNA secondary structure prediction that currently provide the state of the art in terms of prediction accuracy. We here provide strong evidence that homologous RNA genes not only fold into similar final RNA structures, but that their folding pathways also share common transient structural features that have been evolutionarily conserved. For this, we compile and investigate a non-redundant data set of 32 sequences with known transient and final RNA secondary structures and devise a dedicated computational analysis pipeline.
Affiliation(s)
- Jing Yun A Zhu
  - Centre for High-Throughput Biology, University of British Columbia, 2125 East Mall, Vancouver, British Columbia V6T 1Z4, Canada
14
Abstract
Bacterial small RNAs were once regarded as potent regulators of gene expression and are now being considered as essential for their diversified roles. Many small RNAs are now reported to have a wide array of regulatory functions, ranging from environmental sensing to pathogenesis. Traditionally, noncoding transcripts were rarely detected by means of genetic screens. However, the availability of approximately 2200 prokaryotic genome sequences in public databases facilitates the efficient computational search of those molecules, followed by experimental validation. In principle, the following four major computational methods were applied for the prediction of sRNA locations from bacterial genome sequences: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) ‘Orphan’ transcriptional signals and (4) ab initio methods regardless of sequence or structure similarity. Most of these tools were applied to locate putative genomic sRNA locations, followed by experimental validation of those transcripts. Therefore, computational screening has simplified the sRNA identification process in bacteria. In this review, a plethora of small RNA prediction methods and tools that have been reported in the past decade are discussed comprehensively and assessed based on their attributes, compatibility, and prediction accuracy.
Affiliation(s)
- Jayavel Sridhar
  - UGC-Networking Resource Centre in Biological Sciences, School of Biological Sciences, Madurai Kamaraj University, Madurai, TN, India
15
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res 2013; 41:4307-23. [PMID: 23435231 PMCID: PMC3627593 DOI: 10.1093/nar/gkt101] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 12/26/2022]
Abstract
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks.
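Benchmarks of this kind typically compare predicted base pairs against a reference structure. As a minimal sketch, assuming the standard definitions of sensitivity (recovered reference pairs over all reference pairs) and positive predictive value (correct predicted pairs over all predicted pairs), the calculation looks as follows; the exact measures used by CompaRNA are not specified in this abstract, and the function name is hypothetical.

```python
def pair_accuracy(
    predicted: list[tuple[int, int]],
    reference: list[tuple[int, int]],
) -> tuple[float, float]:
    """Return (sensitivity, positive predictive value) for predicted base pairs.

    Illustrative helper (hypothetical name). A pair counts as correct only
    if both positions match exactly; empty inputs yield 0.0 by convention.
    """
    # Store each pair with the smaller index first so (i, j) == (j, i).
    pred = {(min(i, j), max(i, j)) for i, j in predicted}
    ref = {(min(i, j), max(i, j)) for i, j in reference}
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    reference = [(0, 9), (1, 8), (2, 7)]
    predicted = [(0, 9), (1, 8), (3, 6), (4, 5)]
    sensitivity, ppv = pair_accuracy(predicted, reference)
    print(f"sensitivity={sensitivity:.2f} ppv={ppv:.2f}")
```

Reporting both values matters because a method can raise sensitivity simply by predicting more pairs, at the cost of a lower positive predictive value.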
Affiliation(s)
- Tomasz Puton
  - Bioinformatics Laboratory, Institute for Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland
16
Cros MJ, de Monte A, Mariette J, Bardou P, Grenier-Boley B, Gautheret D, Touzet H, Gaspin C. RNAspace.org: An integrated environment for the prediction, annotation, and analysis of ncRNA. RNA (NEW YORK, N.Y.) 2011; 17:1947-56. [PMID: 21947200 PMCID: PMC3198588 DOI: 10.1261/rna.2844911] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/30/2011] [Accepted: 08/07/2011] [Indexed: 05/22/2023]
Abstract
The annotation of noncoding RNA genes remains a major bottleneck in genome sequencing projects. Most genome sequences released today still come with sets of tRNAs and rRNAs as the only annotated RNA elements, ignoring hundreds of other RNA families. We have developed a web environment that is dedicated to noncoding RNA (ncRNA) prediction, annotation, and analysis and allows users to run a variety of tools in an integrated and flexible manner. This environment offers complementary ncRNA gene finders and a set of tools for the comparison, visualization, editing, and export of ncRNA candidates. Predictions can be filtered according to a large set of characteristics. Based on this environment, we created a public website located at http://RNAspace.org. It accepts genomic sequences up to 5 Mb, which permits online annotation of a complete bacterial genome or a small eukaryotic chromosome. The project is hosted as a SourceForge project (http://rnaspace.sourceforge.net/).
Affiliation(s)
- Antoine de Monte
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Jérôme Mariette
  - INRA, Plateforme Bioinformatique, F-31320, UR 875, Castanet-Tolosan, France
- Benjamin Grenier-Boley
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Hélène Touzet
  - LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, 59655 Villeneuve d'Ascq cedex, France
- Christine Gaspin
  - INRA, UBIA, UR 875, F-31320 Castanet-Tolosan, France
  - INRA, Plateforme Bioinformatique, F-31320, UR 875, Castanet-Tolosan, France
17
|
Zou Q, Lin C, Liu XY, Han YP, Li WB, Guo MZ. Novel representation of RNA secondary structure used to improve prediction algorithms. Genetics and Molecular Research 2011; 10:1986-98. [PMID: 21948761 DOI: 10.4238/vol10-3gmr1181] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/03/2022]
Abstract
We propose a novel representation of RNA secondary structure for a quick comparison of different structures. Secondary structure was viewed as a set of stems and each stem was represented by two values according to its position. Using this representation, we improved the comparative sequence analysis method results and the minimum free-energy model. In the comparative sequence analysis method, a novel algorithm independent of multiple sequence alignment was developed to improve performance. When dealing with a single RNA sequence, the minimum free-energy model is improved by combining it with RNA class information. Secondary structure prediction experiments were done on tRNA and RNase P RNA; sensitivity and specificity were both improved. Furthermore, software programs were developed for non-commercial use.
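To make the idea of a stem-based representation concrete, here is a minimal Python sketch that groups the base pairs of a dot-bracket string into stems and summarises each stem by two position values. The abstract does not say which two values the authors use, so the choice below (the 0-based positions of the outermost base pair of each stem) is purely an assumption for illustration, and the function names are hypothetical.

```python
def pairs_from_dot_bracket(structure):
    """Return the base pairs (i, j), 0-based, of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def stems(structure):
    """Group directly stacked pairs (i, j), (i+1, j-1), ... into stems."""
    result, current = [], []
    for i, j in pairs_from_dot_bracket(structure):
        if current and (i, j) == (current[-1][0] + 1, current[-1][1] - 1):
            current.append((i, j))      # continues the current stem
        else:
            if current:
                result.append(current)  # close the previous stem
            current = [(i, j)]
    if current:
        result.append(current)
    return result


def stem_values(structure):
    """Two values per stem: positions of its outermost pair (an assumed encoding)."""
    return [stem[0] for stem in stems(structure)]


print(stem_values("((..))..((...))"))
```

Two structures can then be compared by comparing their short lists of stem values instead of their full pair lists, which is the kind of quick comparison the abstract alludes to.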
Affiliation(s)
- Q Zou
- School of Information Science and Technology, Xiamen University, Xiamen, China
|
18
|
Achawanantakun R, Sun Y, Takyar SS. ncRNA consensus secondary structure derivation using grammar strings. J Bioinform Comput Biol 2011; 9:317-37. [PMID: 21523935 DOI: 10.1142/s0219720011005501] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/02/2011] [Revised: 02/28/2011] [Accepted: 03/01/2011] [Indexed: 11/18/2022]
Abstract
Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
Affiliation(s)
- Rujira Achawanantakun
- Computer Science and Engineering Department, Michigan State University, East Lansing, Michigan 48824, USA
|
19
|
Abstract
RNA localisation is an important mode of delivering proteins to their site of function. Cis-acting signals within the RNAs, which can be thought of as zip-codes, determine the site of localisation. There are few examples of fully characterised RNA signals, but the signals are thought to be defined through a combination of primary, secondary, and tertiary structures. In this chapter, we describe a selection of computational methods for predicting RNA secondary structure, identifying localisation signals, and searching for similar localisation signals on a genome-wide scale. The chapter is aimed at the biologist rather than presenting the details of each of the individual methods.
|
20
|
Richard AL, Withey JH, Beyhan S, Yildiz F, DiRita VJ. The Vibrio cholerae virulence regulatory cascade controls glucose uptake through activation of TarA, a small regulatory RNA. Mol Microbiol 2010; 78:1171-81. [PMID: 21091503 DOI: 10.1111/j.1365-2958.2010.07397.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/16/2023]
Abstract
Vibrio cholerae causes the severe diarrhoeal disease cholera. A cascade of regulators controls expression of virulence determinants in V. cholerae at both transcriptional and post-transcriptional levels. ToxT is the direct transcription activator of the major virulence genes in V. cholerae. Here we describe TarA, a highly conserved, small regulatory RNA, whose transcription is activated by ToxT from toxboxes present upstream of the ToxT-activated gene tcpI. TarA regulates ptsG, encoding a major glucose transporter in V. cholerae. Cells overexpressing TarA exhibit decreased steady-state levels of ptsG mRNA and grow poorly in glucose-minimal media. A mutant lacking the ubiquitous regulatory protein Hfq expresses diminished TarA levels, indicating that TarA likely interacts with Hfq to regulate gene expression. RNAhybrid analysis of TarA and the putative ptsG mRNA leader suggests potential productive base-pairing between these two RNA molecules. A V. cholerae mutant lacking TarA is compromised for infant mouse colonization in competition with wild type, suggesting a role in the in vivo fitness of V. cholerae. Although somewhat functionally analogous to SgrS of Escherichia coli, TarA does not encode a regulatory peptide, and its expression is activated by the virulence gene pathway in V. cholerae and not by glycolytic intermediates.
Affiliation(s)
- Aimee L Richard
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
21
|
Laing C, Schlick T. Computational approaches to 3D modeling of RNA. Journal of Physics: Condensed Matter 2010; 22:283101. [PMID: 21399271 PMCID: PMC6286080 DOI: 10.1088/0953-8984/22/28/283101] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 05/30/2023]
Abstract
Many exciting discoveries have recently revealed the versatility of RNA and its importance in a variety of functions within the cell. Since the structural features of RNA are of major importance to their biological function, there is much interest in predicting RNA structure, either in free form or in interaction with various ligands, including proteins, metabolites and other molecules. In recent years, an increasing number of researchers have developed novel RNA algorithms for predicting RNA secondary and tertiary structures. In this review, we describe current experimental and computational advances and discuss recent ideas that are transforming the traditional view of RNA folding. To evaluate the performance of the most recent RNA 3D folding algorithms, we provide a comparative study in order to test the performance of available 3D structure prediction algorithms for an RNA data set of 43 structures of various lengths and motifs. We find that the algorithms vary widely in terms of prediction quality across different RNA lengths and topologies; most predictions have very large root mean square deviations from the experimental structure. We conclude by outlining some suggestions for future RNA folding research.
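The quality measure mentioned in this abstract, the root mean square deviation (RMSD) between a predicted and an experimental structure, can be sketched in a few lines of Python. This sketch assumes that the two structures have already been superimposed and that atoms are listed in the same order; the optimal superposition step commonly performed before computing RMSD is not shown, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms correspond
    by index; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, experimental))
```

A value near zero means the prediction almost coincides with the reference; the "very large" deviations reported in the review correspond to predictions whose atoms lie far from their experimental positions on average.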
Affiliation(s)
- Christian Laing
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
|
22
|
Wiebe NJP, Meyer IM. TRANSAT: a method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures. PLoS Comput Biol 2010; 6:e1000823. [PMID: 20589081 PMCID: PMC2891591 DOI: 10.1371/journal.pcbi.1000823] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/28/2009] [Accepted: 05/19/2010] [Indexed: 12/20/2022] Open
Abstract
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.
Many non-coding genes exert their function via an RNA structure which starts emerging while the RNA sequence is being transcribed from the genome. The resulting folding pathway is known to depend on a variety of features such as the transcription speed, the concentration of various ions and the binding of proteins and other molecules. Not all of these influences can be adequately captured by the existing computational methods which try to replicate what happens in vivo. So far, it has been challenging to experimentally investigate co-transcriptional folding pathways in vivo and only little data from in vitro experiments exists. In order to investigate if functionally similar RNA sequences from different organisms fold in a similar way, we have developed a new computational method, called Transat, which does not require the detailed computational modeling of the cellular environment. We show in a comprehensive analysis that our method is capable of detecting known structural features and provide evidence that structural features of the in vivo folding pathways have been conserved for several biologically interesting classes of RNA sequences.
Affiliation(s)
- Nicholas J. P. Wiebe
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Irmtraud M. Meyer
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
|
23
|
Abstract
We give an overview of RNA structure predictions in this chapter. We discuss here the main approaches to RNA structure prediction: combinatorial approaches, comparative approaches, and kinetic approaches. The main algorithms and mathematical concepts such as transformational grammars will be briefly introduced.
Affiliation(s)
- István Miklós
- Rényi Institute, Hungarian Academy of Sciences, Budapest, Hungary.
|
24
|
Simultaneous alignment and folding of 28S rRNA sequences uncovers phylogenetic signal in structure variation. Mol Phylogenet Evol 2009; 53:758-71. [DOI: 10.1016/j.ympev.2009.07.033] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 01/22/2009] [Revised: 07/22/2009] [Accepted: 07/28/2009] [Indexed: 11/21/2022]
|
25
|
|
26
|
Fan D, Bitterman PB, Larsson O. Regulatory element identification in subsets of transcripts: comparison and integration of current computational methods. RNA (New York, N.Y.) 2009; 15:1469-82. [PMID: 19553345 PMCID: PMC2714745 DOI: 10.1261/rna.1617009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/25/2009] [Accepted: 05/20/2009] [Indexed: 05/20/2023]
Abstract
Regulatory elements in mRNA play an often pivotal role in post-transcriptional regulation of gene expression. However, a systematic approach to efficiently identify putative regulatory elements from sets of post-transcriptionally coregulated genes is lacking, hampering studies of coregulation mechanisms. Although there are several analytical methods that can be used to detect conserved mRNA regulatory elements in a set of transcripts, there has been no systematic study of how well any of these methods perform individually or as a group. We therefore compared how well three algorithms, each based on a different principle (enumeration, optimization, or structure/sequence profiles), can identify elements in unaligned untranslated sequence regions. Two algorithms were originally designed to detect transcription factor binding sites, Weeder and BioProspector; and one was designed to detect RNA elements conserved in structure, RNAProfile. Three types of elements were examined: (1) elements conserved in both primary sequence and secondary structure; (2) elements conserved only in primary sequence; and (3) microRNA targets. Our results indicate that all methods can uniquely identify certain known RNA elements, and therefore, integrating the output from all algorithms leads to the most complete identification of elements. We therefore developed an approach to integrate results and guide selection of candidate elements from several algorithms presented as a web service (https://dbw.msi.umn.edu:8443/recit). These findings together with the approach for integration can be used to identify candidate elements from genome-wide post-transcriptional profiling data sets.
Affiliation(s)
- Danhua Fan
- Department of Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
27
|
Spirollari J, Wang JTL, Zhang K, Bellofatto V, Park Y, Shapiro BA. Predicting consensus structures for RNA alignments via pseudo-energy minimization. Bioinform Biol Insights 2009; 3:51-69. [PMID: 20140072 PMCID: PMC2808183 DOI: 10.4137/bbi.s2578] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022] Open
Abstract
Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.
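The two output formats named in this abstract describe the same information in different layouts. As a rough illustration, the Python sketch below converts a sequence plus a dot-bracket string into the rows of a connectivity table. The column layout used here (index, base, previous index, next index, pairing partner, natural numbering, with 0 meaning "none") follows the commonly used CT convention and is an assumption; this is not RSpredict's own code, and the function name is hypothetical.

```python
def dot_bracket_to_ct_rows(sequence, structure):
    """Convert a pseudoknot-free dot-bracket string into CT-style rows.

    Each row is (index, base, prev_index, next_index, partner, index),
    all 1-based, with 0 for "no neighbour" or "unpaired" (assumed layout).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    partner = [0] * len(sequence)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j + 1, i + 1  # store 1-based partners
    n = len(sequence)
    rows = []
    for i, base in enumerate(sequence):
        idx = i + 1
        next_idx = idx + 1 if idx < n else 0
        rows.append((idx, base, idx - 1, next_idx, partner[i], idx))
    return rows


for row in dot_bracket_to_ct_rows("GAAAAC", "(....)"):
    print("\t".join(str(field) for field in row))
```

Dot-bracket strings are compact and easy to read by eye, whereas the table form makes each nucleotide's partner directly addressable, which is convenient for downstream programs.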
Affiliation(s)
- Junilda Spirollari
- Bioinformatics Program, Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, U.S.A
|
28
|
Do CB, Foo CS, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics 2008; 24:i68-76. [PMID: 18586747 PMCID: PMC2718655 DOI: 10.1093/bioinformatics/btn177] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS: In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY: Source code for RAF is available at: http://contra.stanford.edu/contrafold/.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA 94305, USA.
|
29
|
Fontaine A, de Monte A, Touzet H. MAGNOLIA: multiple alignment of protein-coding and structural RNA sequences. Nucleic Acids Res 2008; 36:W14-8. [PMID: 18515348 PMCID: PMC2447753 DOI: 10.1093/nar/gkn321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/31/2008] [Revised: 04/26/2008] [Accepted: 05/07/2008] [Indexed: 11/25/2022] Open
Abstract
MAGNOLIA is a new software tool for the multiple alignment of nucleic acid sequences, which are recognized to be hard to align. The idea is that the multiple alignment process should be improved by taking into account the putative function of the sequences. In this perspective, MAGNOLIA is especially designed for sequences that are intended to be either protein-coding or structural RNAs. It extracts information from the similarities and differences in the data, and searches for a specific evolutionary pattern between sequences before aligning them. The alignment step then incorporates this information to achieve higher accuracy. The website is available at http://bioinfo.lifl.fr/magnolia.
Affiliation(s)
- Hélène Touzet
- LIFL (UMR CNRS 8022 Université Lille 1) – INRIA Lille-Nord Europe
|
30
|
Murphy KL, Zhang X, Gainetdinov RR, Beaulieu JM, Caron MG. A regulatory domain in the N terminus of tryptophan hydroxylase 2 controls enzyme expression. J Biol Chem 2008; 283:13216-24. [PMID: 18339632 PMCID: PMC2442358 DOI: 10.1074/jbc.m706749200] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 08/14/2007] [Revised: 03/12/2008] [Indexed: 01/10/2023] Open
Abstract
Serotonin is involved in a variety of physiological processes in the central nervous system and the periphery. As the rate-limiting enzyme in serotonin synthesis, tryptophan hydroxylase plays an important role in modulating these processes. Of the two variants of tryptophan hydroxylase, tryptophan hydroxylase 2 (TPH2) is expressed predominantly in the central nervous system, whereas tryptophan hydroxylase 1 (TPH1) is expressed mostly in peripheral tissues. Although the two enzymes share considerable sequence homology, the regulatory domain of TPH2 contains an additional 41 amino acids at the N terminus that TPH1 lacks. Here we show that the extended TPH2 N-terminal domain contains a unique sequence involved in the regulation of enzyme expression. When expressed in cultured mammalian cells, TPH2 is synthesized less efficiently and is also less stable than TPH1. Removal of the unique portion of the N terminus of TPH2 results in expression of the enzyme at a level similar to that of TPH1, whereas protein chimeras containing this fragment are expressed at lower levels than their wild-type counterparts. We identify a region centered on amino acids 10-20 that mediates the bulk of this effect. We also demonstrate that phosphorylation of serine 19, a protein kinase A consensus site located in this N-terminal domain, results in increased TPH2 stability and consequent increases in enzyme output in cell culture systems. Because this domain is unique to TPH2, these data provide evidence for selective regulation of brain serotonin synthesis.
Affiliation(s)
- Karen L Murphy
- Department of Neurobiology and Cell Biology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
31
|
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
|
32
|
Namy O, Zhou Y, Gundllapalli S, Polycarpo CR, Denise A, Rousset JP, Söll D, Ambrogelly A. Adding pyrrolysine to the Escherichia coli genetic code. FEBS Lett 2007; 581:5282-8. [PMID: 17967457 DOI: 10.1016/j.febslet.2007.10.022] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 09/23/2007] [Accepted: 10/10/2007] [Indexed: 11/24/2022]
Abstract
Pyrrolysyl-tRNA synthetase and its cognate suppressor tRNA(Pyl) mediate pyrrolysine (Pyl) insertion at in-frame UAG codons. The presence of an RNA hairpin structure named Pyl insertion structure (PYLIS) downstream of the suppression site has been shown to stimulate the insertion of Pyl in archaea. We study here the impact of the presence of PYLIS on the level of Pyl and the Pyl analog N-epsilon-cyclopentyloxycarbonyl-l-lysine (Cyc) incorporation using a quantitative lacZ-luc tandem reporter system in an Escherichia coli context. We show that PYLIS has no effect on the level of either Pyl or Cyc incorporation. Exogenously supplying our reporter system with d-ornithine significantly increases suppression efficiency, indicating that d-ornithine is a direct precursor to Pyl.
Affiliation(s)
- Olivier Namy
- Institut de Genetique et Microbiologie, Université Paris-Sud, CNRS UMR8621, Orsay F-91405, France
|
33
|
Abstract
The tmRNA-SmpB system releases ribosomes stalled on truncated mRNAs and tags the nascent polypeptides to target them for proteolysis. In many species, mutations that disrupt tmRNA activity cause defects in growth or development. In Caulobacter crescentus cells lacking tmRNA activity there is a delay in the initiation of DNA replication, which disrupts the cell cycle. To understand the molecular basis for this phenotype, 73 C. crescentus proteins were identified that are tagged by tmRNA under normal growth conditions. Among these substrates, proteins involved in DNA replication, recombination, and repair were overrepresented, suggesting that misregulation of these factors in the absence of tmRNA activity might be responsible for the delay in initiation of DNA replication. Analysis of the tagging sites within these substrates revealed a conserved nucleotide motif 5' of the tagging site, which is required for wild-type tmRNA tagging.
|
34
|
Machado-Lima A, del Portillo HA, Durham AM. Computational methods in noncoding RNA research. J Math Biol 2007; 56:15-49. [PMID: 17786447 DOI: 10.1007/s00285-007-0122-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 01/01/2007] [Indexed: 11/26/2022]
Abstract
Non-protein-coding RNAs (ncRNAs) are a research hotspot in bioinformatics. Recent discoveries have revealed new ncRNA families performing a variety of roles, from gene expression regulation to catalytic activities. It is also believed that other families are still to be unveiled. Computational methods developed for protein-coding genes often fail when searching for ncRNAs. The functionality of noncoding RNAs is often heavily dependent on their secondary structure, which makes gene discovery very different from that of protein-coding RNA genes. This motivated the development of specific methods for ncRNA research. This article reviews the main approaches used to identify ncRNAs and predict secondary structure.
Affiliation(s)
- Ariane Machado-Lima
- Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil.
|
35
|
Yen ZC, Meyer IM, Karalic S, Brown CJ. A cross-species comparison of X-chromosome inactivation in Eutheria. Genomics 2007; 90:453-63. [PMID: 17728098 DOI: 10.1016/j.ygeno.2007.07.002] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.8] [Received: 11/08/2006] [Revised: 05/25/2007] [Accepted: 07/04/2007] [Indexed: 10/22/2022]
Abstract
Mammalian X-chromosome inactivation achieves dosage compensation between the sexes by the silencing of one X chromosome in females. In Eutheria, X inactivation is initiated by the large noncoding RNA Xist; however, it is unknown how this RNA results in silencing of the chromosome or why, at least in humans, many genes escape silencing in somatic cells. We have sequenced the coast mole Xist gene and compared the Xist RNA sequence among seven eutherians to provide insight into the structure of the RNA and origins of the gene. Using DNA methylation of promoter sequences to assess whether genes are silenced in females we report the inactivation status of seven X-linked genes in humans and mice as well as two additional eutherians, the mole and the cow, providing evidence that escape from inactivation is common among Eutheria.
Affiliation(s)
- Ziny C Yen
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
|
36
|
Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput Biol 2007; 3:1896-908. [PMID: 17937495 PMCID: PMC2014794 DOI: 10.1371/journal.pcbi.0030193] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Received: 04/17/2007] [Accepted: 08/20/2007] [Indexed: 11/19/2022] Open
Abstract
It has become clear that noncoding RNAs (ncRNA) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk.
FOLDALIGN is an algorithm for making pairwise structural alignments of RNA sequences. It uses a lightweight energy model and sequence similarity to simultaneously fold and align the sequences. The algorithm can make local and global alignments. The power of structural alignment methods is that they can align sequences where the primary sequences have diverged too much for normal alignment methods to be useful. The structures predicted by structural alignment methods are usually better than the structures predicted by single-sequence folding methods since they can take comparative information into account. The main problem for most structural alignment methods is that they are too computationally expensive. In this paper we introduce the dynamical pruning heuristic that makes the FOLDALIGN method significantly faster without lowering the predictive performance. The memory requirements are also significantly lowered, allowing for the analysis of longer sequences. A user-friendly (still command-line based, though) implementation of the algorithm is available at the Web site: http://foldalign.ku.dk.
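The pruning idea in this abstract, discarding partial alignments whose score does not exceed a length-dependent minimum, can be illustrated on a much simpler problem than structural alignment. The Python sketch below is not the FOLDALIGN algorithm: it swaps in a toy search for the best ungapped matching segment between two sequences, and the threshold function, its parameters, and the scoring values are all hypothetical choices made for illustration.

```python
def min_score(length, slope=0.5, offset=-2.0):
    """Hypothetical length-dependent minimum score a partial alignment must exceed."""
    return slope * length + offset


def best_ungapped(a, b, match=1, mismatch=-1, prune=True):
    """Best-scoring ungapped segment pair, optionally with score-based pruning.

    Returns (best_score, cells_evaluated) so the saving from pruning is visible.
    """
    best, evaluated = 0, 0
    for i in range(len(a)):
        for j in range(len(b)):
            score = 0
            for k in range(min(len(a) - i, len(b) - j)):
                score += match if a[i + k] == b[j + k] else mismatch
                evaluated += 1
                best = max(best, score)
                # Discard this partial alignment if it does not exceed the threshold.
                if prune and score <= min_score(k + 1):
                    break
    return best, evaluated


seq = "ACGUACGU"
print(best_ungapped(seq, seq, prune=True))
print(best_ungapped(seq, seq, prune=False))
```

Because the threshold grows with length, weak partial alignments are abandoned early while strong ones continue to be extended. As with any such heuristic, pruning can in principle discard a path that would have recovered later, which is why the abstract reports empirical checks that performance is maintained.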
|
37
|
Meyer IM, Miklós I. SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework. PLoS Comput Biol 2007; 3:e149. [PMID: 17696604 PMCID: PMC1941756 DOI: 10.1371/journal.pcbi.0030149] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2006] [Accepted: 06/14/2007] [Indexed: 11/19/2022] Open
Abstract
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.
Affiliation(s)
- Irmtraud M Meyer
- UBC Bioinformatics Centre, University of British Columbia, Vancouver, British Columbia, Canada.
|
38
|
Xu X, Ji Y, Stormo GD. RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment. Bioinformatics 2007; 23:1883-91. [PMID: 17537756 DOI: 10.1093/bioinformatics/btm272] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION Non-coding RNA genes and RNA structural regulatory motifs play important roles in gene regulation and other cellular functions. They are often characterized by specific secondary structures that are critical to their functions and are often conserved in phylogenetically or functionally related sequences. Predicting common RNA secondary structures in multiple unaligned sequences remains a challenge in bioinformatics research. METHODS AND RESULTS We present a new sampling based algorithm to predict common RNA secondary structures in multiple unaligned sequences. Our algorithm finds the common structure between two sequences by probabilistically sampling aligned stems based on stem conservation calculated from intrasequence base pairing probabilities and intersequence base alignment probabilities. It iteratively updates these probabilities based on sampled structures and subsequently recalculates stem conservation using the updated probabilities. The iterative process terminates upon convergence of the sampled structures. We extend the algorithm to multiple sequences by a consistency-based method, which iteratively incorporates and reinforces consistent structure information from pairwise comparisons into consensus structures. The algorithm has no limitation on predicting pseudoknots. In extensive testing on real sequence data, our algorithm outperformed other leading RNA structure prediction methods in both sensitivity and specificity with a reasonably fast speed. It also generated better structural alignments than other programs in sequences of a wide range of identities, which more accurately represent the RNA secondary structure conservations. AVAILABILITY The algorithm is implemented in a C program, RNA Sampler, which is available at http://ural.wustl.edu/software.html
Affiliation(s)
- Xing Xu
- Department of Genetics, Washington University, School of Medicine, St. Louis, MO 63110, USA.
|
39
|
Abstract
RNA genes are ubiquitous in the cell and are involved in a number of biochemical processes. Because there is a close relationship between function and structure, software tools that predict the secondary structure of noncoding RNAs from the base sequence are very helpful. In this article, we focus our attention on the inference of conserved secondary structure for a group of homologous RNA sequences. We present the caRNAc software, which enables the analysis of families of homologous sequences without prior alignment. The method relies both on comparative analysis and thermodynamic information.
Affiliation(s)
- Hélène Touzet
- LIFL-batiment M3, Cité Scientifique, Université des Sciences et Technologies de Lille
|
40
|
Abstract
Knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and structure is the main characteristic property shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in great detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which allows covariation of paired bases to be rewarded. Applied to a set of bacterial trp-operon leaders using shape abstraction, the algorithm was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at .
Affiliation(s)
- Björn Voss
- Experimental Bioinformatics, Institute of Biology II, Freiburg University, Schänzlestrasse 1, 79104 Freiburg, Germany.
|
41
|
Abstract
As one of the earliest problems in computational biology, the RNA secondary structure prediction problem (sometimes referred to as "RNA folding") has attracted attention again, thanks to the recent discoveries of many novel non-coding RNA molecules. The two common approaches to this problem are de novo prediction of RNA secondary structure based on energy minimization and the consensus folding approach (computing the common secondary structure for a set of unaligned RNA sequences). Consensus folding algorithms work well when the correct seed alignment is part of the input to the problem. However, seed alignment itself is a challenging problem for diverged RNA families. In this paper, we propose a novel framework to predict the common secondary structure for unaligned RNA sequences. By matching putative stacks in RNA sequences, we make use of both primary sequence information and thermodynamic stability for prediction at the same time. We show that our method can predict the correct common RNA secondary structures even when we are given only a limited number of unaligned RNA sequences, and it outperforms current algorithms in sensitivity and accuracy.
Affiliation(s)
- Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, 92093, USA
|
42
|
Abstract
MOTIVATION The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal. RESULTS CMfinder is a new tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm using covariance models for motif description, featuring novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way. Extensive tests show that our method works well on datasets with either low or high sequence similarity, is robust to inclusion of lengthy extraneous flanking sequence and/or completely unrelated sequences, and is reasonably fast and scalable. In testing on 19 known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy--79% compared with at most 60% for alternative methods. More importantly, the resulting probabilistic model can be directly used for homology search, allowing iterative refinement of structural models based on additional homologs. We have used this approach to obtain highly accurate covariance models of known RNA motifs based on small numbers of related sequences, which identified homologs in deeply-diverged species.
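The mutual information signal mentioned in this abstract can be made concrete with a small example. The sketch below computes the standard mutual information, in bits, between two columns of a multiple sequence alignment. It is an illustration of the general measure only, not CMfinder's scoring scheme; the function name `column_mutual_information` is hypothetical, and gap handling and any blending with folding energy are deliberately left out.

```python
from collections import Counter
from math import log2
from typing import Sequence


def column_mutual_information(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Mutual information (bits) between two alignment columns of equal depth."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)                 # single-column nucleotide counts
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))    # joint counts of (x, y) pairs
    mi = 0.0
    for (x, y), count in freq_ij.items():
        p_xy = count / n
        p_x = freq_i[x] / n
        p_y = freq_j[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Perfectly covarying columns (G-C, C-G, A-U, U-A) carry 2 bits.
    print(column_mutual_information("GCAU", "CGUA"))
    # Fully conserved columns carry no covariation signal.
    print(column_mutual_information("GGGG", "CCCC"))
```

The second case shows why a purely covariation-based score is weak on highly conserved families: a conserved base pair contributes zero mutual information, which is one motivation for combining it with folding-energy evidence.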
Affiliation(s)
- Zizhen Yao
- Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, USA.
|
43
|
|
44
|
Xayaphoummine A, Bucher T, Isambert H. Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res 2005; 33:W605-10. [PMID: 15980546 PMCID: PMC1160208 DOI: 10.1093/nar/gki447] [Citation(s) in RCA: 209] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
The Kinefold web server provides a web interface for stochastic folding simulations of nucleic acids on second to minute molecular time scales. Renaturation or co-transcriptional folding paths are simulated at the level of helix formation and dissociation in agreement with the seminal experimental results. Pseudoknots and topologically ‘entangled’ helices (i.e. knots) are efficiently predicted taking into account simple geometrical and topological constraints. To encourage interactivity, simulations launched as immediate jobs are automatically stopped after a few seconds and return adapted recommendations. Users can then choose to continue incomplete simulations using the batch queuing system or go back and modify suggested options in their initial query. Detailed output provides (i) a series of low free energy structures, (ii) an online animated folding path and (iii) a programmable trajectory plot focusing on a few helices of interest to each user. The service can be accessed at .
Affiliation(s)
- A. Xayaphoummine
- Laboratoire de Dynamique des Fluides Complexes, CNRS-ULP, Institut de Physique, 3 rue de l'Université, 67000 Strasbourg, France
- T. Bucher
- Laboratoire de Dynamique des Fluides Complexes, CNRS-ULP, Institut de Physique, 3 rue de l'Université, 67000 Strasbourg, France
- H. Isambert
- Laboratoire de Dynamique des Fluides Complexes, CNRS-ULP, Institut de Physique, 3 rue de l'Université, 67000 Strasbourg, France
- Physico-chimie Curie, CNRS UMR168, Institut Curie, Section de Recherche, 11 rue P. & M. Curie, 75005 Paris, France
- To whom correspondence should be addressed. Tel: +33 1 42 34 64 74;
|
45
|
Gardner PP, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 2004; 5:140. [PMID: 15458580 PMCID: PMC526219 DOI: 10.1186/1471-2105-5-140] [Citation(s) in RCA: 260] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2004] [Accepted: 09/30/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed, although it is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. RESULTS Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. CONCLUSIONS We conclude that comparative data can enhance structure prediction but structure-prediction algorithms vary widely in terms of both sensitivity and selectivity across different lengths and homologies. Furthermore, we outline some directions for future research.
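As a rough illustration of the two measures named in this abstract, the sketch below compares a predicted structure with a reference structure, both given in dot-bracket notation. It assumes the strict definitions sensitivity = TP / (TP + FN) and selectivity = TP / (TP + FP) over exact base pairs; the benchmark itself may treat some predicted pairs differently, so this should be read as a simplified stand-in. The function names are hypothetical, and pseudoknot brackets are not handled.

```python
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack = []
    pairs = set()
    for index, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(index)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def sensitivity_selectivity(reference: str, predicted: str) -> Tuple[float, float]:
    """Return (sensitivity, selectivity) of predicted pairs against the reference."""
    ref_pairs = base_pairs(reference)
    pred_pairs = base_pairs(predicted)
    true_positives = len(ref_pairs & pred_pairs)
    # Sensitivity: fraction of reference pairs that were recovered.
    sensitivity = true_positives / len(ref_pairs) if ref_pairs else 0.0
    # Selectivity: fraction of predicted pairs that are in the reference.
    selectivity = true_positives / len(pred_pairs) if pred_pairs else 0.0
    return sensitivity, selectivity


if __name__ == "__main__":
    print(sensitivity_selectivity("((((....))))", "(((......)))"))
```

In the usage example the prediction misses one of four reference pairs but predicts nothing wrong, so the two numbers differ. Reporting both is what lets a benchmark separate under-prediction from over-prediction.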
Affiliation(s)
- Paul P Gardner
- Department of Evolutionary Biology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen Ø, Denmark
- Robert Giegerich
- Faculty of Technology, University of Bielefeld, PO Box 10 01 31, 33501 Bielefeld, Germany
|