1
|
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence after a remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts often exert their regulatory functions into RNA-RNA complexes via base pairing with complementary sequences in other RNAs. An interplay between RNAs is essential, as it possesses various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlighted the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Collapse
Affiliation(s)
- Francis Yew Fu Tieng
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
| | | | - Nur Alyaa Afifah Md Shahri
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
| | - Zeti-Azura Mohamed-Hussein
- Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
- Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
| | - Learn-Han Lee
- Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
- Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
| | - Nurul-Syakima Ab Mutalib
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
| |
Collapse
|
2
|
Network-Based Structural Alignment of RNA Sequences Using TOPAS. Methods Mol Biol 2023; 2586:147-162. [PMID: 36705903 DOI: 10.1007/978-1-0716-2768-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
TOPAS (TOPological network-based Alignment of Structural RNAs) is a network-based alignment algorithm that predicts structurally sound pairwise alignment of RNAs. In order to take advantage of recent advances in comparative network analysis for efficient structurally sound RNA alignment, TOPAS constructs topological network representations for RNAs, which consist of sequential edges connecting nucleotide bases as well as structural edges reflecting the underlying folding structure. Structural edges are weighted by the estimated base-pairing probabilities. Next, the constructed networks are aligned using probabilistic network alignment techniques, which yield a structurally sound RNA alignment that considers both the sequence similarity and the structural similarity between the given RNAs. Compared to traditional Sankoff-style algorithms, this network-based alignment scheme leads to a significant reduction in the overall computational cost while yielding favorable alignment results. Another important benefit is its capability to handle arbitrary folding structures, which can potentially lead to more accurate alignment for RNAs with pseudoknots.
Collapse
|
3
|
Wang H, Lu X, Zheng H, Wang W, Zhang G, Wang S, Lin P, Zhuang Y, Chen C, Chen Q, Qu J, Xu L. RNAsmc: A integrated tool for comparing RNA secondary structure and evaluating allosteric effects. Comput Struct Biotechnol J 2023; 21:965-973. [PMID: 36733704 PMCID: PMC9876829 DOI: 10.1016/j.csbj.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 01/06/2023] [Accepted: 01/07/2023] [Indexed: 01/11/2023] Open
Abstract
RNA structure plays a crucial role in gene regulation, in RNA stability and the essential biological processes. RNA secondary structure (RSS) motifs are the basic building blocks for investigating the biological mechanisms of structure. Here, we present a strategy for structural motif-based dynamic alignment, namely, RNA secondary-structural motif-comparing (RNAsmc), to identify structural motifs and quantitatively evaluate their underlying molecular functions. RNAsmc also has strong robustness to sequence length, folding protocol and RNA structural profile by chemical probing. Notably, it is also applicable to quantify structural variation in special RNA editing events (SNVs or SNPs, fragment insertion or deletion, etc.). The findings indicate that RNAsmc can uncover the heterogeneity of RNA secondary structure and score for similarities among components, which provides an impetus to cluster RNA families and evaluate allosteric effects. We find that RNAsmc exhibits remarkable detection efficiency for experimentally-derived RiboSNitches. Finally, the pipeline was assembled into an R software package to serve as an automated toolkit to explore, align, and cluster RSS. It is freely available for download at https://CRAN.R-project.org/package=RNAsmc.
Collapse
Affiliation(s)
- Hong Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
| | - Xiaoyan Lu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Hewei Zheng
- Wekemo Tech Group Co., Ltd. Shenzhen 518000, China
| | - Wencan Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Wenzhou Realdata Medical Research Co., Ltd, Wenzhou 325027, China
| | - Guosi Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Siyu Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Peng Lin
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Youyuan Zhuang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Chong Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Qi Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Jia Qu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| | - Liangde Xu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou 325027, China
- Corresponding authors at: National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
| |
Collapse
|
4
|
Sun S, Yang J, Zhang Z. RNALigands: a database and web server for RNA-ligand interactions. RNA (NEW YORK, N.Y.) 2022; 28:115-122. [PMID: 34732566 PMCID: PMC8906548 DOI: 10.1261/rna.078889.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2021] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
RNA molecules can fold into complex and stable 3D structures, allowing them to carry out important genetic, structural, and regulatory roles inside the cell. These complex structures often contain 3D pockets made up of secondary structural motifs that can be potentially targeted by small molecule ligands. Indeed, many RNA structures in PDB contain bound small molecules, and high-throughput experimental studies have generated a large number of interacting RNA and ligand pairs. There is considerable interest in developing small molecule lead compounds targeting viral RNAs or those RNAs implicated in neurological diseases or cancer. We hypothesize that RNAs that have similar secondary structural motifs may bind to similar small molecule ligands. Toward this goal, we established a database collecting RNA secondary structural motifs and bound small molecule ligands. We further developed a computational pipeline, which takes as input an RNA sequence, predicts its secondary structure, extracts structural motifs, and searches the database for similar secondary structure motifs and interacting small molecule. We demonstrated the utility of the server by querying α-synuclein mRNA 5' UTR sequence and finding potential matches which were validated as correct. The server is publicly available at http://RNALigands.ccbr.utoronto.ca The source code can also be downloaded at https://github.com/SaisaiSun/RNALigands.
Collapse
Affiliation(s)
- Saisai Sun
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, Shanxi, China
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Zhaolei Zhang
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| |
Collapse
|
5
|
Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved the prediction accuracy of significant RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because in the detection of conserved base-pairings among RNA homologs, predicting an RNA consensus secondary structure is more convenient than predicting an RNA structural alignment. RESULTS We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment. ConsAlifold considers RNA structural alignments. ConsAlifold achieves moderate running time and the best prediction accuracy of RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Masaki Tagashira
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan.,Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
| | - Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan.,Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
| |
Collapse
|
6
|
A structurally conserved RNA element within SARS-CoV-2 ORF1a RNA and S mRNA regulates translation in response to viral S protein-induced signaling in human lung cells. J Virol 2021; 96:e0167821. [PMID: 34757848 PMCID: PMC8791291 DOI: 10.1128/jvi.01678-21] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The positive-sense, single-stranded RNA genome SARS-CoV-2 harbors functionally important cis-acting elements governing critical aspects of viral gene expression. However, insights on how these elements sense various signals from the host cell and regulate viral protein synthesis are lacking. Here, we identified two novel cis-regulatory elements in SARS-CoV-2 ORF1a and S RNAs and describe their role in translational control of SARS-CoV-2. These elements are sequence-unrelated but form conserved hairpin structures (validated by NMR) resembling Gamma Activated Inhibitor of Translation (GAIT) elements that are found in a cohort of human mRNAs directing translational suppression in myeloid cells in response to IFN-γ. Our studies show that treatment of human lung cells with receptor-binding S1 subunit, S protein pseudotyped lentivirus, and S protein-containing virus-like particles triggers a signaling pathway involving DAP-kinase1 that leads to phosphorylation and release of the ribosomal protein L13a from the large ribosomal subunit. Released L13a forms a Virus Activated Inhibitor of Translation (VAIT) complex that binds to ORF1a and S VAIT elements, causing translational silencing. Translational silencing requires extracellular S protein (and its interaction with host ACE2 receptor), but not its intracellular synthesis. RNA-protein interaction analyses and in vitro translation experiments showed that GAIT and VAIT elements do not compete with each other, highlighting differences between the two pathways. Sequence alignments of SARS-CoV-2 genomes showed a high level of conservation of VAIT elements, suggesting their functional importance. This VAIT-mediated translational control mechanism of SARS-CoV-2 may provide novel targets for small molecule intervention and/or facilitate development of more effective mRNA vaccines. Importance Specific RNA elements in the genomes of RNA viruses play important roles in host-virus interaction. For SARS-CoV-2, the mechanistic insights on how these RNA elements could sense the signals from the host cell are lacking. Here we report a novel relationship between the GAIT-like SARS-CoV-2 RNA element (called VAITs) and the signal generated from the host cell. We show that for SARS-CoV-2, the interaction of spike protein with ACE2 not only serves the purpose for viral entry into the host cell, but also transduces signals that culminate into the phosphorylation and the release of L13a from the large ribosomal subunit. We also show that this event leads to the translational arrest of ORF1a and S mRNAs in a manner dependent on the structure of the RNA elements. Translational control of viral mRNA by a host-cell generated signal triggered by viral protein is a new paradigm in the host-virus relationship.
Collapse
|
7
|
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward the understanding of the function of the microbial community. However, the majority of the current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure that complicates ncRNA homology detection. Meanwhile, de novo assembly of the metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent readsinto differentncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoMcan improve the performance and robustness over traditional approaches on the classification and quantification of a wide class of ncRNA families.
Collapse
Affiliation(s)
- Ben Liu
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
| | - Sirisha Thippabhotla
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
| | - Jun Zhang
- Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States.,Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
| | - Cuncong Zhong
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States.,Bioengineering Program, The University of Kansas, Lawrence, KS, United States.,Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
| |
Collapse
|
8
|
Miladi M, Raden M, Will S, Backofen R. Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains. Algorithms Mol Biol 2020; 15:19. [PMID: 33292340 PMCID: PMC7666477 DOI: 10.1186/s13015-020-00179-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Accepted: 10/16/2020] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of [Formula: see text] in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. RESULTS Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). CONCLUSIONS Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA.
Collapse
Affiliation(s)
- Milad Miladi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
| | - Martin Raden
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
| | - Sebastian Will
- Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Bioinformatics group (AMIBIO), Laboratoire d’Informatique de l’École Polytechnique (LIX), Institut Polytechnique de Paris (IPP), Batiment Turing, 1 rue d’Estienne d’Orve, Palaiseau, France
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, Freiburg, Germany
| |
Collapse
|
9
|
Chen CC, Jeong H, Qian X, Yoon BJ. TOPAS: network-based structural alignment of RNA sequences. Bioinformatics 2020; 35:2941-2948. [PMID: 30629122 DOI: 10.1093/bioinformatics/btz001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 12/07/2018] [Accepted: 01/04/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION For many RNA families, the secondary structure is known to be better conserved among the member RNAs compared to the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N6), which is too costly to be used for large-scale analysis. RESULTS In order to address this shortcoming, in this work, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy. AVAILABILITY AND IMPLEMENTATION Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
Collapse
Affiliation(s)
- Chun-Chi Chen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Hyundoo Jeong
- Department of Electronic Engineering, Chosun University, Gwangju, Republic of Korea
| | - Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA.,TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, Texas A&M University, College Station, TX, USA
| |
Collapse
|
10
|
Basu A, Dvorina N, Baldwin WM, Mazumder B. High-fat diet-induced GAIT element-mediated translational silencing of mRNAs encoding inflammatory proteins in macrophage protects against atherosclerosis. FASEB J 2020; 34:6888-6906. [PMID: 32232901 DOI: 10.1096/fj.201903119r] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Revised: 02/28/2020] [Accepted: 03/16/2020] [Indexed: 11/11/2022]
Abstract
Previously, we identified a mechanism of inflammation control directed by ribosomal protein L13a and "GAIT" (Gamma Activated Inhibitor of Translation) elements in target mRNAs and showed that its elimination in myeloid cell-specific L13a knockout mice (L13a KO) increased atherosclerosis susceptibility and severity. Here, we investigated the mechanistic basis of this endogenous defense against atherosclerosis. We compared molecular and cellular aspects of atherosclerosis in high-fat diet (HFD)-fed L13a KO and intact (control) mice. HFD treatment of control mice induced release of L13a from 60S ribosome, formation of RNA-binding complex, and subsequent GAIT element-mediated translational silencing. Atherosclerotic plaques from HFD-treated KO mice showed increased infiltration of M1 type inflammatory macrophages. Macrophages from KO mice showed increased phagocytic activity and elevated expression of LDL receptor and pro-inflammatory mediators. NanoString analysis of the plaques from KO mice showed upregulation of a number of mRNAs encoding inflammatory proteins. Bioinformatics analysis suggests the presence of the potential GAIT elements in the 3'UTRs of several of these mRNAs. Macrophage induces L13a/GAIT-dependent translational silencing of inflammatory genes in response to HFD as an endogenous defense against atherosclerosis in ApoE-/- model.
Collapse
Affiliation(s)
- Abhijit Basu
- Department of Biology, Geology and Environmental Sciences, Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, USA
| | - Nina Dvorina
- Department of Inflammation and Immunity, Lerner College of Medicine, Cleveland Clinic, Cleveland, OH, USA
| | - William M Baldwin
- Department of Inflammation and Immunity, Lerner College of Medicine, Cleveland Clinic, Cleveland, OH, USA
| | - Barsanjit Mazumder
- Department of Biology, Geology and Environmental Sciences, Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, USA
| |
Collapse
|
11
|
Tourasse NJ, Darfeuille F. Structural Alignment and Covariation Analysis of RNA Sequences. Bio Protoc 2020; 10:e3511. [PMID: 33654736 PMCID: PMC7842705 DOI: 10.21769/bioprotoc.3511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2019] [Revised: 12/29/2019] [Accepted: 12/29/2019] [Indexed: 11/02/2022] Open
Abstract
RNA molecules adopt defined structural conformations that are essential to exert their function. During the course of evolution, the structure of a given RNA can be maintained via compensatory base-pair changes that occur among covarying nucleotides in paired regions. Therefore, for comparative, structural, and evolutionary studies of RNA molecules, numerous computational tools have been developed to incorporate structural information into sequence alignments and a number of tools have been developed to study covariation. The bioinformatic protocol presented here explains how to use some of these tools to generate a secondary-structure-aware multiple alignment of RNA sequences and to annotate the alignment to examine the conservation and covariation of structural elements among the sequences.
Collapse
Affiliation(s)
- Nicolas J. Tourasse
- ARNA Laboratory, INSERM U1212, CNRS UMR5320, University of Bordeaux, Bordeaux, France
| | - Fabien Darfeuille
- ARNA Laboratory, INSERM U1212, CNRS UMR5320, University of Bordeaux, Bordeaux, France
| |
Collapse
|
12
|
Bayegan AH, Clote P. RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment. PLoS One 2020; 15:e0227177. [PMID: 31978147 PMCID: PMC6980424 DOI: 10.1371/journal.pone.0227177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2018] [Accepted: 12/13/2019] [Indexed: 11/19/2022] Open
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n3) time and O(n2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
Collapse
Affiliation(s)
- Amir H. Bayegan
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
| | - Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
- * E-mail:
| |
Collapse
|
13
|
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022] Open
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
Collapse
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
| |
Collapse
|
14
|
Abstract
Many years of research in RNA biology have soundly established the importance of RNA-based regulation far beyond most early traditional presumptions. Importantly, the advances in "wet" laboratory techniques have produced unprecedented amounts of data that require efficient and precise computational analysis schemes and algorithms. Hence, many in silico methods that attempt topological and functional classification of novel putative RNA-based regulators are available. In this review, we technically outline thermodynamics-based standard RNA secondary structure and RNA-RNA interaction prediction approaches that have proven valuable to the RNA research community in the past and present. For these, we highlight their usability with a special focus on prokaryotic organisms and also briefly mention recent advances in whole-genome interactomics and how this may influence the field of predictive RNA research.
Collapse
|
15
|
Basu A, Jain N, Tolbert BS, Komar AA, Mazumder B. Conserved structures formed by heterogeneous RNA sequences drive silencing of an inflammation responsive post-transcriptional operon. Nucleic Acids Res 2018; 45:12987-13003. [PMID: 29069516 PMCID: PMC5727460 DOI: 10.1093/nar/gkx979] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2017] [Accepted: 10/09/2017] [Indexed: 11/21/2022] Open
Abstract
RNA–protein interactions with physiological outcomes usually rely on conserved sequences within the RNA element. By contrast, activity of the diverse gamma-interferon-activated inhibitor of translation (GAIT)-elements relies on the conserved RNA folding motifs rather than the conserved sequence motifs. These elements drive the translational silencing of a group of chemokine (CC/CXC) and chemokine receptor (CCR) mRNAs, thereby helping to resolve physiological inflammation. Despite sequence dissimilarity, these RNA elements adopt common secondary structures (as revealed by 2D-1H NMR spectroscopy), providing a basis for their interaction with the RNA-binding GAIT complex. However, many of these elements (e.g. those derived from CCL22, CXCL13, CCR4 and ceruloplasmin (Cp) mRNAs) have substantially different affinities for GAIT complex binding. Toeprinting analysis shows that different positions within the overall conserved GAIT element structure contribute to differential affinities of the GAIT protein complex towards the elements. Thus, heterogeneity of GAIT elements may provide hierarchical fine-tuning of the resolution of inflammation.
Collapse
Affiliation(s)
- Abhijit Basu
- Center for Gene Regulation in Health & Disease, Department of Biology, Geology and Environmental Sciences, Cleveland State University, 2121 Euclid Avenue, Cleveland, OH 44115, USA
| | - Niyati Jain
- Department of Chemistry, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Blanton S Tolbert
- Department of Chemistry, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Anton A Komar
- Center for Gene Regulation in Health & Disease, Department of Biology, Geology and Environmental Sciences, Cleveland State University, 2121 Euclid Avenue, Cleveland, OH 44115, USA
| | - Barsanjit Mazumder
- Center for Gene Regulation in Health & Disease, Department of Biology, Geology and Environmental Sciences, Cleveland State University, 2121 Euclid Avenue, Cleveland, OH 44115, USA
| |
Collapse
|
16
|
Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. QUANTITATIVE BIOLOGY 2017; 5:3-24. [PMID: 28717530 PMCID: PMC5510538 DOI: 10.1007/s40484-017-0093-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2016] [Revised: 12/08/2016] [Accepted: 12/15/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data. RESULTS We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis as well as highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. CONCLUSIONS To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
Collapse
Affiliation(s)
| | | | - Sharon Aviran
- Department of Biomedical Engineering and Genome Center, University of California at Davis, Davis, CA 95616, USA
| |
Collapse
|
17
|
Li Y, Shi X, Liang Y, Xie J, Zhang Y, Ma Q. RNA-TVcurve: a Web server for RNA secondary structure comparison based on a multi-scale similarity of its triple vector curve representation. BMC Bioinformatics 2017; 18:51. [PMID: 28109252 PMCID: PMC5251234 DOI: 10.1186/s12859-017-1481-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2016] [Accepted: 01/10/2017] [Indexed: 01/10/2023] Open
Abstract
Background RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step to understand and interpret their functional relationship. The majority of functional RNAs show conserved secondary structures, rather than sequence conservation. Those algorithms relying on sequence-based features usually have limitations in their prediction performance. Hence, integrating RNA structure features is very critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. The alignment-free algorithms of RNA comparison usually have lower time complexity than alignment-based algorithms. Results An alignment-free RNA comparison algorithm was proposed, in which novel numerical representations RNA-TVcurve (triple vector curve representation) of RNA sequence and corresponding secondary structure features are provided. Then a multi-scale similarity score of two given RNAs was designed based on wavelet decomposition of their numerical representation. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of numerical representation of RNA secondary structure; 2) detection of single-point mutation based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The inputs of the web server require RNA primary sequences, while corresponding secondary structures are optional. For the primary sequences alone, the web server can compute the secondary structures using free energy minimization algorithm in terms of RNAfold tool from Vienna RNA package. Conclusion RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and multiple RNAs structure comparison. The comparison results with two popular RNA comparison tools, RNApdist and RNAdistance, showcased that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All the relevant results were shown in an intuitive graphical manner, and can be freely downloaded from this server. RNA-TVcurve, along with test examples and detailed documents, are available at: http://ml.jlu.edu.cn/tvcurve/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1481-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Ying Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.,Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
| | - Xiaohu Shi
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.,Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China
| | - Yanchun Liang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.,Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China.,Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai, 519041, China
| | - Juan Xie
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57007, USA.,Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, 57007, USA.,BioSNTR, Brookings, SD, USA
| | - Yu Zhang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China. .,Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, 130012, China.
| | - Qin Ma
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, 57007, USA. .,Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, 57007, USA. .,BioSNTR, Brookings, SD, USA.
| |
Collapse
|
18
|
Nitsche A, Stadler PF. Evolutionary clues in lncRNAs. WILEY INTERDISCIPLINARY REVIEWS-RNA 2016; 8. [PMID: 27436689 DOI: 10.1002/wrna.1376] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2016] [Revised: 06/06/2016] [Accepted: 06/09/2016] [Indexed: 12/13/2022]
Abstract
The diversity of long non-coding RNAs (lncRNAs) in the human transcriptome is in stark contrast to the sparse exploration of their functions concomitant with their conservation and evolution. The pervasive transcription of the largely non-coding human genome makes the evolutionary age and conservation patterns of lncRNAs to a topic of interest. Yet it is a fairly unexplored field and not that easy to determine as for protein-coding genes. Although there are a few experimentally studied cases, which are conserved at the sequence level, most lncRNAs exhibit weak or untraceable primary sequence conservation. Recent studies shed light on the interspecies conservation of secondary structures among lncRNA homologs by using diverse computational methods. This highlights the importance of structure on functionality of lncRNAs as opposed to the poor impact of primary sequence changes. Further clues in the evolution of lncRNAs are given by selective constraints on non-coding gene structures (e.g., promoters or splice sites) as well as the conservation of prevalent spatio-temporal expression patterns. However, a rapid evolutionary turnover is observable throughout the heterogeneous group of lncRNAs. This still gives rise to questions about its functional meaning. WIREs RNA 2017, 8:e1376. doi: 10.1002/wrna.1376 For further resources related to this article, please visit the WIREs website.
Collapse
Affiliation(s)
- Anne Nitsche
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany.,Institute de Biologie Moléculaire et Cellulaire, Université de Strasbourg, Cedex, France
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany.,Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany.,Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.,Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology - IZI, Leipzig, Germany.,Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.,Department of Theoretical Chemistry, University of Vienna, Wien, Austria.,Santa Fe Institute, Santa Fe, NM, USA
| |
Collapse
|
19
|
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022] Open
Abstract
This review provides an overview on the development of Multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The three first sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview on available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
Collapse
|
20
|
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific-structure dependent biological functions. Hence RNA secondary structure prediction is one of the most well studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches as it exploits information of multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary structure based patterns from a set of related sequences. The input sequences for example could correspond to 3(') or 5(') untranslated regions of a set of orthologous genes and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNA, genomic segments housing ncRNA genes from the same family and so on. Here, we give a detailed review of the various computational techniques proposed in literature attempting to solve this general motif discovery problem. We also give empirical comparisons of some of the current state of the art methods and point out future directions of research.
Collapse
Affiliation(s)
- Avinash Achar
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Pål Sætrom
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
| |
Collapse
|
21
|
Chen Q, Luo H, Zhang C, Chen YPP. Bioinformatics in protein kinases regulatory network and drug discovery. Math Biosci 2015; 262:147-56. [PMID: 25656386 DOI: 10.1016/j.mbs.2015.01.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2014] [Revised: 01/16/2015] [Accepted: 01/22/2015] [Indexed: 10/24/2022]
Abstract
Protein kinases have been implicated in a number of diseases, where kinases participate many aspects that control cell growth, movement and death. The deregulated kinase activities and the knowledge of these disorders are of great clinical interest of drug discovery. The most critical issue is the development of safe and efficient disease diagnosis and treatment for less cost and in less time. It is critical to develop innovative approaches that aim at the root cause of a disease, not just its symptoms. Bioinformatics including genetic, genomic, mathematics and computational technologies, has become the most promising option for effective drug discovery, and has showed its potential in early stage of drug-target identification and target validation. It is essential that these aspects are understood and integrated into new methods used in drug discovery for diseases arisen from deregulated kinase activity. This article reviews bioinformatics techniques for protein kinase data management and analysis, kinase pathways and drug targets and describes their potential application in pharma ceutical industry.
Collapse
Affiliation(s)
- Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, 530004, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, China.
| | - Haiqiong Luo
- School of Public Health, Guangxi Medical University, Nanning, 530021, China.
| | - Chengqi Zhang
- Centre for Quantum Computation & Intelligent Systems, University of Technology, Sydney P.O. Box 123, Broadway, NSW 2007, Australia.
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Computer Engineering, La Trobe University, Vic 3086, Australia.
| |
Collapse
|
22
|
Otto C, Möhl M, Heyne S, Amit M, Landau GM, Backofen R, Will S. ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs. BMC Bioinformatics 2014; 15:404. [PMID: 25551362 PMCID: PMC4302096 DOI: 10.1186/s12859-014-0404-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2014] [Accepted: 12/01/2014] [Indexed: 01/26/2023] Open
Abstract
Background Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. Results The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P. Conclusions ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications. Electronic supplementary material The online version of this article (doi:10.1186/s12859-014-0404-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Christina Otto
- Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany.
| | - Mathias Möhl
- Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany.
| | - Steffen Heyne
- Max Planck Institute of Immunobiology and Epigenetics, Stuebeweg 51, Freiburg, 79108, Germany.
| | - Mika Amit
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel.
| | - Gad M Landau
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel. .,Department of Computer Science and Engineering, NYU-Poly, Brooklyn, NY, USA.
| | - Rolf Backofen
- Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. .,Center for Biological Signaling Studies (BIOSS), University of Freiburg, Freiburg, Germany. .,Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany. .,Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870, Denmark.
| | - Sebastian Will
- Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. .,Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany.
| |
Collapse
|
23
|
Badr G, Al-Turaiki I, Turcotte M, Mathkour H. IncMD: incremental trie-based structural motif discovery algorithm. J Bioinform Comput Biol 2014; 12:1450027. [PMID: 25362841 DOI: 10.1142/s0219720014500279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangements even if the underlying sequences are different. Over the past few years, hundreds of algorithms have been published for the discovery of sequential motifs, while less work has been done for the structural motifs case. Current structural motif discovery algorithms are limited in terms of accuracy and scalability. In this paper, we present an incremental and scalable algorithm for discovering RNA secondary structure motifs, namely IncMD. We consider the structural motif discovery as a frequent pattern mining problem and tackle it using a modified a priori algorithm. IncMD uses data structures, trie-based linked lists of prefixes (LLP), to accelerate the search and retrieval of patterns, support counting, and candidate generation. We modify the candidate generation step in order to adapt it to the RNA secondary structure representation. IncMD constructs the frequent patterns incrementally from RNA secondary structure basic elements, using nesting and joining operations. The notion of a motif group is introduced in order to simulate an alignment of motifs that only differ in the number of unpaired bases. In addition, we use a cluster beam approach to select motifs that will survive to the next iterations of the search. Results indicate that IncMD can perform better than some of the available structural motif discovery algorithms in terms of sensitivity (Sn), positive predictive value (PPV), and specificity (Sp). The empirical results also show that the algorithm is scalable and runs faster than all of the compared algorithms.
Collapse
Affiliation(s)
- Ghada Badr
- College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia , IRI - The City of Scientific Research and Technological Applications, University and Research District, P. O. 21934, New Borg Alarab, Alexandria, Egypt
| | | | | | | |
Collapse
|
24
|
Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment. Methods Mol Biol 2014; 1079:59-73. [PMID: 24170395 DOI: 10.1007/978-1-62703-646-7_4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies-based on simulation, consistency, protein structure, and phylogeny-and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark depending on the context of application-with a keen awareness of the assumptions underlying each benchmarking strategy.
Collapse
|
25
|
Abstract
De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNA ncRNA s, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
Collapse
Affiliation(s)
- Walter L Ruzzo
- Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | | |
Collapse
|
26
|
Andersen ES. The art of editing RNA structural alignments. Methods Mol Biol 2014; 1097:379-394. [PMID: 24639168 DOI: 10.1007/978-1-62703-709-9_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is rewarded by great insight into the evolution of structure and function of your favorite RNA molecule. In this chapter I will review the methods and considerations that go into constructing RNA structural alignments at the secondary and tertiary structure level; introduce software, databases, and algorithms that have proven useful in semiautomating the work process; and suggest future directions towards full automatization.
Collapse
|
27
|
Lai D, Proctor JR, Meyer IM. On the importance of cotranscriptional RNA structure formation. RNA (NEW YORK, N.Y.) 2013; 19:1461-1473. [PMID: 24131802 PMCID: PMC3851714 DOI: 10.1261/rna.037390.112] [Citation(s) in RCA: 116] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
The expression of genes, both coding and noncoding, can be significantly influenced by RNA structural features of their corresponding transcripts. There is by now mounting experimental and some theoretical evidence that structure formation in vivo starts during transcription and that this cotranscriptional folding determines the functional RNA structural features that are being formed. Several decades of research in bioinformatics have resulted in a wide range of computational methods for predicting RNA secondary structures. Almost all state-of-the-art methods in terms of prediction accuracy, however, completely ignore the process of structure formation and focus exclusively on the final RNA structure. This review hopes to bridge this gap. We summarize the existing evidence for cotranscriptional folding and then review the different, currently used strategies for RNA secondary-structure prediction. Finally, we propose a range of ideas on how state-of-the-art methods could be potentially improved by explicitly capturing the process of cotranscriptional structure formation.
Collapse
|
28
|
Bussotti G, Notredame C, Enright AJ. Detecting and comparing non-coding RNAs in the high-throughput era. Int J Mol Sci 2013; 14:15423-58. [PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2013] [Revised: 07/16/2013] [Accepted: 07/17/2013] [Indexed: 02/07/2023] Open
Abstract
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Collapse
Affiliation(s)
- Giovanni Bussotti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; E-Mail:
| | - Cedric Notredame
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Aiguader, 88, 08003 Barcelona, Spain; E-Mail:
| | - Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; E-Mail:
| |
Collapse
|
29
|
Badr G, Al-Turaiki I, Mathkour H. Classification and assessment tools for structural motif discovery algorithms. BMC Bioinformatics 2013; 14 Suppl 9:S4. [PMID: 23902564 PMCID: PMC3698030 DOI: 10.1186/1471-2105-14-s9-s4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. METHODS In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. RESULTS Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. CONCLUSIONS We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
Collapse
Affiliation(s)
- Ghada Badr
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
- IRI - The City of Scientific Research and Technological Applications, University and Research District, P.O. 21934 New Borg Alarab, Alexandria, Egypt
| | - Isra Al-Turaiki
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
| | - Hassan Mathkour
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
| |
Collapse
|
30
|
Will S, Siebauer MF, Heyne S, Engelhardt J, Stadler PF, Reiche K, Backofen R. LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search. Algorithms Mol Biol 2013; 8:14. [PMID: 23601347 PMCID: PMC3716875 DOI: 10.1186/1748-7188-8-14] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2013] [Accepted: 03/28/2013] [Indexed: 12/15/2022] Open
Abstract
Background The search for distant homologs has become an import issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to-date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? Results Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA’s algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query. We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. Conclusions Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. Availability Source code of the free software LocARNAscan 1.0 and supplementary data are available at
http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.
Collapse
|
31
|
Abstract
MOTIVATION To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method, which is swift [O(n(2))], sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS Examples are given of database searching, the detection of structural similarities, which would not be found using sequence based methods, and comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION Source code (C and perl) and binaries for linux are freely available at www.zbh.uni-hamburg.de/fries.
Collapse
Affiliation(s)
- Tim Wiegels
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany.
| | | | | |
Collapse
|
32
|
Lei J, Techa-Angkoon P, Sun Y. Chain-RNA: a comparative ncRNA search tool based on the two-dimensional chain algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:274-285. [PMID: 23929857 DOI: 10.1109/tcbb.2012.137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Noncoding RNA (ncRNA) identification is highly important to modern biology. The state-of-the-art method for ncRNA identification is based on comparative genomics, in which evolutionary conservations of sequences and secondary structures provide important evidence for ncRNA search. For ncRNAs with low sequence conservation but high structural similarity, conventional local alignment tools such as BLAST yield low sensitivity. Thus, there is a need for ncRNA search methods that can incorporate both sequence and structural similarities. We introduce chain-RNA, a pairwise structural alignment tool that can effectively locate cross-species conserved RNA elements with low sequence similarity. In chain-RNA, stem-loop structures are extracted from dot plots generated by an efficient local-folding algorithm. Then, we formulate stem alignment as an extended 2D chain problem and employ existing chain algorithms. Chain-RNA is tested on a data set containing annotated ncRNA homologs and is applied to novel ncRNA search in a transcriptomic data set. The experimental results show that chain-RNA has better tradeoff between sensitivity and false positive rate in ncRNA prediction than conventional sequence similarity search tools and is more time efficient than structural alignment tools. The source codes of chain-RNA can be downloaded at http://sourceforge.net/projects/chain-rna/ or at http://www.cse.msu.edu/~leijikai/chain-rna/.
Collapse
Affiliation(s)
- Jikai Lei
- Michigan State University, East Lansing, MI 48824, USA
| | | | | |
Collapse
|
33
|
Havgaard J, Kaur S, Gorodkin J. Comparative ncRNA gene and structure prediction using Foldalign and FoldalignM. ACTA ACUST UNITED AC 2012; Chapter 12:12.11.1-12.11.15. [PMID: 22948726 DOI: 10.1002/0471250953.bi1211s39] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
This unit describes how to use Foldalign and FoldalignM to make structural alignments of non-protein-coding-RNA (ncRNA). These tools can be used to find new ncRNAs, to find the structure of novel ncRNAs, and to improve alignments for known ncRNAs.
Collapse
Affiliation(s)
- Jakob Havgaard
- Center for Non-Coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Simranjeet Kaur
- Center for Non-Coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Jan Gorodkin
- Center for Non-Coding RNA in Technology and Health, Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
34
|
Mendes ND, Heyne S, Freitas AT, Sagot MF, Backofen R. Navigating the unexplored seascape of pre-miRNA candidates in single-genome approaches. Bioinformatics 2012; 28:3034-41. [PMID: 23052038 PMCID: PMC3516144 DOI: 10.1093/bioinformatics/bts574] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Motivation: The computational search for novel microRNA (miRNA) precursors often involves some sort of structural analysis with the aim of identifying which type of structures are prone to being recognized and processed by the cellular miRNA-maturation machinery. A natural way to tackle this problem is to perform clustering over the candidate structures along with known miRNA precursor structures. Mixed clusters allow then the identification of candidates that are similar to known precursors. Given the large number of pre-miRNA candidates that can be identified in single-genome approaches, even after applying several filters for precursor robustness and stability, a conventional structural clustering approach is unfeasible. Results: We propose a method to represent candidate structures in a feature space, which summarizes key sequence/structure characteristics of each candidate. We demonstrate that proximity in this feature space is related to sequence/structure similarity, and we select candidates that have a high similarity to known precursors. Additional filtering steps are then applied to further reduce the number of candidates to those with greater transcriptional potential. Our method is compared with another single-genome method (TripletSVM) in two datasets, showing better performance in one and comparable performance in the other, for larger training sets. Additionally, we show that our approach allows for a better interpretation of the results. Availability and Implementation: The MinDist method is implemented using Perl scripts and is freely available at http://www.cravela.org/?mindist=1. Contact:backofen@informatik.uni-freiburg.de Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Nuno D Mendes
- Équipe BAOBAB, Laboratoire de Biométrie et Biologie Évolutive (UMR 5558), CNRS, University of Lyon 1, Lyon 69622, Villeurbanne Cedex, France
| | | | | | | | | |
Collapse
|
35
|
Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, Stadler BMR, Stadler PF, Tanzer A, Washietl S, Witwer C. Evolutionary patterns of non-coding RNAs. Theory Biosci 2012; 123:301-69. [PMID: 18202870 DOI: 10.1016/j.thbio.2005.01.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2004] [Accepted: 01/24/2005] [Indexed: 01/04/2023]
Abstract
A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a hostgene (e.g. when transcribed from introns of a mRNA) can change on evolutionary time scales. The let-7 family demonstrates, that even the mode of transcription (as intron or as exon) can change among paralogous ncRNA.
Collapse
|
36
|
Will S, Joshi T, Hofacker IL, Stadler PF, Backofen R. LocARNA-P: accurate boundary prediction and improved detection of structural RNAs. RNA (NEW YORK, N.Y.) 2012; 18:900-14. [PMID: 22450757 PMCID: PMC3334699 DOI: 10.1261/rna.029041.111] [Citation(s) in RCA: 240] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2011] [Accepted: 01/18/2012] [Indexed: 05/18/2023]
Abstract
Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used ncRNA gene finder RNAz by a factor of 3 from a median deviation of 47 to 13 nt. Post-processing RNAz predictions, LocARNA-P's STAR score allows much stronger discrimination between true- and false-positive predictions than RNAz's own evaluation. The improved accuracy, in this scenario increased from AUC 0.71 to AUC 0.87, significantly reduces the cost of successive analysis steps. The ready-to-use software tool LocARNA-P produces structure-based multiple RNA alignments with associated columnwise STARs and predicts ncRNA boundaries. We provide additional results, a web server for LocARNA/LocARNA-P, and the software package, including documentation and a pipeline for refining screens for structural ncRNA, at http://www.bioinf.uni-freiburg.de/Supplements/LocARNA-P/.
Collapse
Affiliation(s)
- Sebastian Will
- Chair for Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, D-79110 Freiburg, Germany
- Computation and Biology Group, CSAIL and Mathematics Department, MIT, Cambridge, Massachusetts 02139, USA
| | - Tejal Joshi
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
| | - Ivo L. Hofacker
- Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
| | - Peter F. Stadler
- Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, D-04103 Leipzig, Germany
- Santa Fe Institute, Santa Fe, New Mexico 87501, USA
| | - Rolf Backofen
- Chair for Bioinformatics, Institute of Computer Science, Albert-Ludwigs-Universität, D-79110 Freiburg, Germany
- Center for Biological Signaling Studies (BIOSS), University of Freiburg, D-79104 Freiburg, Germany
| |
Collapse
|
37
|
Sun Y, Aljawad O, Lei J, Liu A. Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy. BMC Bioinformatics 2012; 13 Suppl 3:S12. [PMID: 22536896 PMCID: PMC3311100 DOI: 10.1186/1471-2105-13-s3-s12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND NCRNAs (noncoding RNAs) play important roles in many biological processes. Existing genome-scale ncRNA search tools identify ncRNAs in local sequence alignments generated by conventional sequence comparison methods. However, some types of ncRNA lack strong sequence conservation and tend to be missed or mis-aligned by conventional sequence comparison. RESULTS In this paper, we propose an ncRNA identification framework that is complementary to existing sequence comparison tools. By integrating a filtration step based on Hamming distance and ncRNA alignment programs such as FOLDALIGN or PLAST-ncRNA, the proposed ncRNA search framework can identify ncRNAs that lack strong sequence conservation. In addition, as the ratio of transition and transversion mutation is often used as a discriminative feature for functional ncRNA identification, we incorporate this feature into the filtration step using a coding strategy. We apply Hamming distance seeds to ncRNA search in the intergenic regions of human and mouse genomes and between the Burkholderia cenocepacia J2315 genome and the Ralstonia solanacearum genome. The experimental results demonstrate that a carefully designed Hamming distance seed can achieve better sensitivity in searching for poorly conserved ncRNAs than conventional sequence comparison tools. CONCLUSIONS Hamming distance seeds provide better sensitivity as a filtration strategy for genome-wide ncRNA homology search than the existing seeding strategies used in BLAST-like tools. By combining Hamming distance seeds matching and ncRNA alignment, we are able to find ncRNAs with sequence similarities below 60%.
Collapse
Affiliation(s)
- Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Osama Aljawad
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Jikai Lei
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Alex Liu
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| |
Collapse
|
38
|
Martin JS, Halvorsen M, Davis-Neulander L, Ritz J, Gopinath C, Beauregard A, Laederach A. Structural effects of linkage disequilibrium on the transcriptome. RNA (NEW YORK, N.Y.) 2012; 18:77-87. [PMID: 22109839 PMCID: PMC3261746 DOI: 10.1261/rna.029900.111] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
A majority of SNPs (single nucleotide polymorphisms) map to noncoding and intergenic regions of the genome. Noncoding SNPs are often identified in genome-wide association studies (GWAS) as strongly associated with human disease. Two such disease-associated SNPs in the 5' UTR of the human FTL (Ferritin Light Chain) gene are predicted to alter the ensemble of structures adopted by the mRNA. High-accuracy single nucleotide resolution chemical mapping reveals that these SNPs result in substantial changes in the structural ensemble in agreement with the computational prediction. Furthermore six rescue mutations are correctly predicted to restore the mRNA to its wild-type ensemble. Our data confirm that the FTL 5' UTR is a "RiboSNitch," an RNA that changes structure if a particular disease-associated SNP is present. The structural change observed is analogous to that of a bacterial Riboswitch in that it likely regulates translation. These data further suggest that specific pairs of SNPs in high linkage disequilibrium (LD) will form RNA structure-stabilizing haplotypes (SSHs). We identified 484 SNP pairs that form SSHs in UTRs of the human genome, and in eight of the 10 SSH-containing transcripts, SNP pairs stabilize RNA protein binding sites. The ubiquitous nature of SSHs in the transcriptome suggests that certain haplotypes are conserved to avoid RiboSNitch formation.
Collapse
Affiliation(s)
- Joshua S. Martin
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Matthew Halvorsen
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Lauren Davis-Neulander
- Developmental Genetics and Bioinformatics, Wadsworth Center, Albany, New York 12208, USA
| | - Justin Ritz
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Chetna Gopinath
- Biomedical Sciences Department, University at Albany, Albany, New York 12208, USA
| | - Arthur Beauregard
- Biomedical Sciences Department, University at Albany, Albany, New York 12208, USA
| | - Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Corresponding author.E-mail .
| |
Collapse
|
39
|
Abstract
MOTIVATION Homology search for RNAs can use secondary structure information to increase power by modeling base pairs, as in covariance models, but the resulting computational costs are high. Typical acceleration strategies rely on at least one filtering stage using sequence-only search. RESULTS Here we present the multi-segment CYK (MSCYK) filter, which implements a heuristic of ungapped structural alignment for RNA homology search. Compared to gapped alignment, this approximation has lower computation time requirements (O(N⁴) reduced to O(N³), and space requirements (O(N³) reduced to O(N²). A vector-parallel implementation of this method gives up to 100-fold speed-up; vector-parallel implementations of standard gapped alignment at two levels of precision give 3- and 6-fold speed-ups. These approaches are combined to create a filtering pipeline that scores RNA secondary structure at all stages, with results that are synergistic with existing methods.
Collapse
Affiliation(s)
- Diana L Kolbe
- Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
| | | |
Collapse
|
40
|
Lerman YV, Kennedy SD, Shankar N, Parisien M, Major F, Turner DH. NMR structure of a 4 x 4 nucleotide RNA internal loop from an R2 retrotransposon: identification of a three purine-purine sheared pair motif and comparison to MC-SYM predictions. RNA (NEW YORK, N.Y.) 2011; 17:1664-77. [PMID: 21778280 PMCID: PMC3162332 DOI: 10.1261/rna.2641911] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/24/2011] [Accepted: 05/08/2011] [Indexed: 05/31/2023]
Abstract
The NMR solution structure is reported of a duplex, 5'GUGAAGCCCGU/3'UCACAGGAGGC, containing a 4 × 4 nucleotide internal loop from an R2 retrotransposon RNA. The loop contains three sheared purine-purine pairs and reveals a structural element found in other RNAs, which we refer to as the 3RRs motif. Optical melting measurements of the thermodynamics of the duplex indicate that the internal loop is 1.6 kcal/mol more stable at 37°C than predicted. The results identify the 3RRs motif as a common structural element that can facilitate prediction of 3D structure. Known examples include internal loops having the pairings: 5'GAA/3'AGG, 5'GAG/3'AGG, 5'GAA/3'AAG, and 5'AAG/3'AGG. The structural information is compared with predictions made with the MC-Sym program.
Collapse
Affiliation(s)
- Yelena V. Lerman
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
| | - Scott D. Kennedy
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
| | - Neelaabh Shankar
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
| | - Marc Parisien
- Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec H3C CJ7, Canada
| | - Francois Major
- Department of Computer Science and Operations Research, University of Montreal, Montreal, Quebec H3C CJ7, Canada
| | - Douglas H. Turner
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
| |
Collapse
|
41
|
Zakov S, Tsur D, Ziv-Ukelson M. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach. Algorithms Mol Biol 2011; 6:20. [PMID: 21851589 PMCID: PMC3741081 DOI: 10.1186/1748-7188-6-20] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2010] [Accepted: 08/18/2011] [Indexed: 01/11/2023] Open
Abstract
Background RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. Results We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. Conclusions The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.
Collapse
|
42
|
Meyer F, Kurtz S, Backofen R, Will S, Beckstette M. Structator: fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics 2011; 12:214. [PMID: 21619640 PMCID: PMC3154205 DOI: 10.1186/1471-2105-12-214] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2010] [Accepted: 05/27/2011] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires to match sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS We present a novel method and readily applicable software for time efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
Collapse
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
| | | | | | | | | |
Collapse
|
43
|
Backofen R, Tsur D, Zakov S, Ziv-Ukelson M. Sparse RNA folding: Time and space efficient algorithms. ACTA ACUST UNITED AC 2011. [DOI: 10.1016/j.jda.2010.09.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
44
|
Sahraeian SME, Yoon BJ. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach. BMC Bioinformatics 2011; 12 Suppl 1:S38. [PMID: 21342569 PMCID: PMC3044294 DOI: 10.1186/1471-2105-12-s1-s38] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Collapse
|
45
|
Patel V, Wang JTL, Setia S, Verma A, Warden CD, Zhang K. On comparing two structured RNA multiple alignments. J Bioinform Comput Biol 2010; 8:967-80. [PMID: 21121021 DOI: 10.1142/s021972001000504x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2010] [Accepted: 08/02/2010] [Indexed: 11/18/2022]
Abstract
We present a method, called BlockMatch, for aligning two blocks, where a block is an RNA multiple sequence alignment with the consensus secondary structure of the alignment in Stockholm format. The method employs a quadratic-time dynamic programming algorithm for aligning columns and column pairs of the multiple alignments in the blocks. Unlike many other tools that can perform pairwise alignment of either single sequences or structures only, BlockMatch takes into account the characteristics of all the sequences in the blocks along with their consensus structures during the alignment process, thus being able to achieve a high-quality alignment result. We apply BlockMatch to phylogeny reconstruction on a set of 5S rRNA sequences taken from fifteen bacteria species. Experimental results showed that the phylogenetic tree generated by our method is more accurate than the tree constructed based on the widely used ClustalW tool. The BlockMatch algorithm is implemented into a web server, accessible at http://bioinformatics.njit.edu/blockmatch. A jar file of the program is also available for download from the web server.
Collapse
Affiliation(s)
- Vandanaben Patel
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
| | | | | | | | | | | |
Collapse
|
46
|
Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol 2010; 2:a003665. [PMID: 20685845 DOI: 10.1101/cshperspect.a003665] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Collapse
Affiliation(s)
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
| | | | | |
Collapse
|
47
|
Ziv-Ukelson M, Gat-Viks I, Wexler Y, Shamir R. A Faster Algorithm for Simultaneous Alignment and Folding of RNA. J Comput Biol 2010; 17:1051-65. [DOI: 10.1089/cmb.2009.0197] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Affiliation(s)
- Michal Ziv-Ukelson
- Department of Computer Science, Ben-Gurion University, Beer Sheva, Israel
| | - Irit Gat-Viks
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Ydo Wexler
- Microsoft Research, Microsoft Corporation, Redmond, WA
| | - Ron Shamir
- School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| |
Collapse
|
48
|
Wiebe NJP, Meyer IM. TRANSAT-- method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures. PLoS Comput Biol 2010; 6:e1000823. [PMID: 20589081 PMCID: PMC2891591 DOI: 10.1371/journal.pcbi.1000823] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2009] [Accepted: 05/19/2010] [Indexed: 12/20/2022] Open
Abstract
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequence not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. This has the disadvantage of not being able to predict features as function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment. Many non-coding genes exert their function via an RNA structure which starts emerging while the RNA sequence is being transcribed from the genome. The resulting folding pathway is known to depend on a variety of features such as the transcription speed, the concentration of various ions and the binding of proteins and other molecules. Not all of these influences can be adequately captured by the existing computational methods which try to replicate what happens in vivo. So far, it has been challenging to experimentally investigate co-transcriptional folding pathways in vivo and only little data from in vitro experiments exists. In order to investigate if functionally similar RNA sequences from different organisms fold in a similar way, we have developed a new computational method, called Transat, which does not require the detailed computational modeling of the cellular environment. We show in a comprehensive analysis that our method is capable of detecting known structural features and provide evidence that structural features of the in vivo folding pathways have been conserved for several biologically interesting classes of RNA sequences.
Collapse
Affiliation(s)
- Nicholas J. P. Wiebe
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Irmtraud M. Meyer
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- * E-mail:
| |
Collapse
|
49
|
Bremges A, Schirmer S, Giegerich R. Fine-tuning structural RNA alignments in the twilight zone. BMC Bioinformatics 2010; 11:222. [PMID: 20433706 PMCID: PMC2876130 DOI: 10.1186/1471-2105-11-222] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2009] [Accepted: 04/30/2010] [Indexed: 11/25/2022] Open
Abstract
Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc.. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
Collapse
Affiliation(s)
- Andreas Bremges
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
| | | | | |
Collapse
|
50
|
Reuter JS, Mathews DH. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 2010; 11:129. [PMID: 20230624 PMCID: PMC2984261 DOI: 10.1186/1471-2105-11-129] [Citation(s) in RCA: 1271] [Impact Index Per Article: 90.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2009] [Accepted: 03/15/2010] [Indexed: 11/16/2022] Open
Abstract
Background To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. Results RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. Conclusion The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.
Collapse
Affiliation(s)
- Jessica S Reuter
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | | |
Collapse
|