1
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Universidad Nacional de Colombia, Bogotá, Colombia
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Santa Fe Institute, Santa Fe, NM, USA
2
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
3
Pervouchine DD. IRBIS: a systematic search for conserved complementarity. RNA (NEW YORK, N.Y.) 2014; 20:1519-31. [PMID: 25142064 PMCID: PMC4174434 DOI: 10.1261/rna.045088.114] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/11/2014] [Accepted: 06/26/2014] [Indexed: 05/28/2023]
Abstract
IRBIS is a computational pipeline for detecting conserved complementary regions in unaligned orthologous sequences. Unlike other methods, it follows the "first-fold-then-align" principle in which all possible combinations of complementary k-mers are searched for simultaneous conservation. The novel trimming procedure reduces the size of the search space and improves the performance to the point where large-scale analyses of intra- and intermolecular RNA-RNA interactions become possible. In this article, I provide a rigorous description of the method, benchmarking on simulated and real data, and a set of stringent predictions of intramolecular RNA structure in placental mammals, drosophilids, and nematodes. I discuss two particular cases of long-range RNA structures that are likely to have a causal effect on single- and multiple-exon skipping, one in the mammalian gene Dystonin and the other in the insect gene Ca-α1D. In Dystonin, one of the two complementary boxes contains a binding site of the Rbfox protein similar to one recently described in the Enah gene. I also report that snoRNAs and long noncoding RNAs (lncRNAs) have a high capacity for base-pairing with introns of protein-coding genes, suggesting possible involvement of these transcripts in splicing regulation. I also find that conserved sequences that are equally likely to occur on both strands of DNA (e.g., transcription factor binding sites) contribute strongly to the false-discovery rate and, therefore, would confound every such analysis. IRBIS is open-source software available at http://genome.crg.es/~dmitri/irbis/.
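The "first-fold-then-align" idea in the abstract above can be illustrated with a toy sketch. This is not the IRBIS implementation: the function names, the `min_gap` parameter, and the restriction to Watson-Crick pairs (no G-U wobble) are assumptions made for illustration, and the trimming procedure is omitted. For each unaligned ortholog it collects k-mers whose reverse complement occurs further downstream, then keeps the k-mers for which this holds in every ortholog.

```python
from collections import defaultdict

# Watson-Crick complements only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an RNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_index(seq, k):
    """Map each k-mer in seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def complementary_kmers(seq, k, min_gap=3):
    """k-mers whose reverse complement starts at least min_gap bases
    after the k-mer ends (hypothetical loop-size constraint)."""
    index = kmer_index(seq, k)
    found = set()
    for kmer, starts in index.items():
        # .get avoids inserting keys into the defaultdict while iterating.
        partner_starts = index.get(reverse_complement(kmer), [])
        if any(j >= i + k + min_gap for i in starts for j in partner_starts):
            found.add(kmer)
    return found


def conserved_complementary_kmers(orthologs, k, min_gap=3):
    """k-mers that have a downstream complementary partner in every ortholog."""
    per_sequence = [complementary_kmers(s, k, min_gap) for s in orthologs]
    return set.intersection(*per_sequence)


if __name__ == "__main__":
    orthologs = ["GGACAAAAAGUCC", "CCGGACUUUUUUGUCCAA"]
    print(conserved_complementary_kmers(orthologs, 4))
```

Because the sequences are never aligned, the two complementary boxes may sit at different positions and distances in each ortholog; only the k-mer content has to agree.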
Affiliation(s)
- Dmitri D Pervouchine
- Centre for Genomic Regulation and UPF, Barcelona 08003, Spain
- Faculty of Bioengineering and Bioinformatics, Moscow State University, 119992 Moscow, Russia
4
Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol 2014; 11:470-83. [PMID: 24755880 PMCID: PMC4152356 DOI: 10.4161/rna.28647] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/31/2014] [Revised: 03/17/2014] [Accepted: 03/25/2014] [Indexed: 02/02/2023] Open
Abstract
The genomes of most prokaryotes give rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes, and the need to characterize their non-coding components as well, depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Fabian Amman
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Fabrizio Costa
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Sven Findeiß
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group; University of Vienna; Währingerstraße 29; A-1090 Wien, Austria
- Andreas S Richter
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Max Planck Institute of Immunobiology and Epigenetics; Stübeweg 51; D-79108 Freiburg, Germany
- Peter F Stadler
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences; Inselstraße 22; D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology – IZI; Perlickstraße 1; D-04103 Leipzig, Germany
- Santa Fe Institute; Santa Fe, NM USA
5
Abstract
We describe different tools and approaches for RNA-RNA interaction prediction. Recognition of ncRNA targets is predominantly governed by two principles, namely the stability of the duplex between the two interacting RNAs and the internal structure of both mRNA and ncRNA. Thus, approaches can be divided into major categories depending on how they consider inter- and intramolecular structure. The first class completely neglects the internal structure and measures only the stability of the duplex. The second class of approaches abstracts from specific intramolecular structures and uses an ensemble-based approach to calculate the effect of internal structure on a putative binding site, thus measuring the accessibility of the binding sites. Since accessibility-based approaches can handle only one continuous interaction site, two additional types of approaches were introduced which predict a joint structure for the interacting RNAs. Since this problem is NP-complete, the approaches can handle only a restricted class of joint structures. The first are co-folding approaches, which predict a joint structure that is nested when both sequences are concatenated. The last and most complex class of approaches imposes only the restriction that zipper-like structures are discarded. Finally, we will discuss the use of conservation information in RNA-target prediction.
Affiliation(s)
- Rolf Backofen
- Lehrstuhl für Bioinformatik, Albert-Ludwigs-Universität, Freiburg, Germany
6
Bindewald E, Shapiro BA. Computational detection of abundant long-range nucleotide covariation in Drosophila genomes. RNA (NEW YORK, N.Y.) 2013; 19:1171-82. [PMID: 23887147 PMCID: PMC3753924 DOI: 10.1261/rna.037630.112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/07/2012] [Accepted: 06/08/2013] [Indexed: 06/02/2023]
Abstract
Functionally important nucleotide base-pairing often manifests itself in sequence alignments in the form of compensatory base changes (covariation). We developed a novel index-based computational method (CovaRNA) to detect long-range covariation on a genomic scale, as well as another computational method (CovStat) for determining the statistical significance of observed covariation patterns in alignment pairs. Here we present an all-versus-all search for nucleotide covariation in Drosophila genomic alignments. The search is genome wide, with the restriction that only alignments that correspond to euchromatic regions and contain at least 10 Drosophila species are considered (59% of the euchromatic genome of Drosophila melanogaster). We find that long-range covariations are especially prevalent between exons of mRNAs as well as noncoding RNAs; the majority of the observed covariations appear not as reverse-complementary changes but as synchronized mutations, which could be due to interactions with common interaction partners or due to the involvement of genomic elements that are antisense to annotated transcripts. The involved genes are enriched for functions related to regionalization as well as neural and developmental processes. These results are computational evidence that RNA-RNA long-range interactions are a widespread phenomenon that is of fundamental importance to a variety of cellular processes.
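The notion of a compensatory base change mentioned in the abstract above can be made concrete with a small sketch. It is not CovaRNA or CovStat: the function names are hypothetical, only the reverse-complementary case is covered (the abstract reports that most observed covariations were not of this kind), and no statistical significance is computed. Two alignment columns are flagged when every sequence keeps a canonical pair there and at least two sequences differ at both positions.

```python
# Canonical RNA base pairs, including the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def column_pairs(alignment, i, j):
    """Nucleotide pairs found at columns i and j, one per aligned sequence."""
    return [(seq[i], seq[j]) for seq in alignment]


def has_compensatory_change(alignment, i, j):
    """True if columns i and j stay canonically paired in every sequence
    and at least two sequences differ at BOTH positions."""
    pairs = set(column_pairs(alignment, i, j))
    if not pairs <= CANONICAL:
        return False
    return any(a != c and b != d for (a, b) in pairs for (c, d) in pairs)


if __name__ == "__main__":
    alignment = ["GAAAC", "CAAAG", "AAAAU"]
    print(has_compensatory_change(alignment, 0, 4))  # G-C, C-G and A-U pairs
    print(has_compensatory_change(alignment, 1, 3))  # A-A is not a pair
```

A change at only one position (for example G-C to G-U) keeps the pair intact but is not counted as compensatory here, since it gives no evidence that the two columns mutated together.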
Affiliation(s)
- Eckart Bindewald
- Basic Science Program, SAIC-Frederick, Incorporated, Center for Cancer Research Nanobiology Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, USA
- Bruce A. Shapiro
- Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, Maryland 21702, USA
7
Richter AS, Backofen R. Accessibility and conservation: general features of bacterial small RNA-mRNA interactions? RNA Biol 2012; 9:954-65. [PMID: 22767260 PMCID: PMC3495738 DOI: 10.4161/rna.20294] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 01/26/2023] Open
Abstract
Bacterial small RNAs (sRNAs) are a class of structural RNAs that often regulate mRNA targets via post-transcriptional base pair interactions. We determined features that discriminate functional from non-functional interactions and assessed the influence of these features on genome-wide target predictions. For this purpose, we compiled a set of 71 experimentally verified sRNA–target pairs from Escherichia coli and Salmonella enterica. Furthermore, we collected full-length 5′ untranslated regions by using genome-wide experimentally verified transcription start sites.
Only interaction sites in sRNAs, but not in targets, show significant sequence conservation. In addition to this observation, we found that the base pairing between sRNAs and their targets is not conserved in general across more distantly related species. A closer inspection of RybB and RyhB sRNAs and their targets revealed that the base pairing complementarity is only conserved in a small subset of the targets. In contrast to conservation, accessibility of functional interaction sites is significantly higher in both sRNAs and targets in comparison to non-functional sites. Based on the above observations, we successfully used the following constraints to improve the specificity of genome-wide target predictions: the region of interaction initiation must be located in (1) highly accessible regions in both interaction partners or (2) unstructured conserved sRNA regions derived from reliability profiles of multiple sRNA alignments.
Aligned sequences of homologous sRNAs, functional and non-functional targets, and a supplementary document with supplementary tables, figures and references are available at www.bioinf.uni-freiburg.de/Supplements/srna-interact-feat/.
Affiliation(s)
- Andreas S Richter
- University of Freiburg, Department of Computer Science, Georges-Köhler-Allee 106, Freiburg 79110, Germany
8
Poolsap U, Kato Y, Sato K, Akutsu T. Using binding profiles to predict binding sites of target RNAs. J Bioinform Comput Biol 2012; 9:697-713. [PMID: 22084009 DOI: 10.1142/s0219720011005628] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/22/2010] [Revised: 03/25/2011] [Accepted: 05/24/2011] [Indexed: 11/18/2022]
Abstract
Prediction of RNA-RNA interaction is a key to elucidating possible functions of small non-coding RNAs, and a number of computational methods have been proposed to analyze interacting RNA secondary structures. In this article, we focus on predicting binding sites of target RNAs that are expected to interact with regulatory antisense RNAs in a general form of interaction. For this purpose, we propose bistaRNA, a novel method for predicting multiple binding sites of target RNAs. bistaRNA employs binding profiles that represent scores for hybridized structures, which reduces the computational cost of interaction prediction. bistaRNA considers an ensemble of equilibrium interacting structures and seeks to maximize expected accuracy using dynamic programming. Experimental results on real interaction data confirm the good accuracy and fast computation time of bistaRNA compared with several competing methods. Moreover, we aim to find new targets given specific antisense RNAs, which provides interesting insights into antisense RNA regulation. bistaRNA is implemented in C++. The program and Supplementary Material are available at http://rna.naist.jp/program/bistarna/.
Affiliation(s)
- Unyanee Poolsap
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
9
Seemann SE, Menzel P, Backofen R, Gorodkin J. The PETfold and PETcofold web servers for intra- and intermolecular structures of multiple RNA sequences. Nucleic Acids Res 2011; 39:W107-11. [PMID: 21609960 PMCID: PMC3125731 DOI: 10.1093/nar/gkr248] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023] Open
Abstract
The function of non-coding RNA genes largely depends on their secondary structure and their interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA–RNA interaction is essential for understanding the biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user-friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold/
Affiliation(s)
- Stefan E Seemann
- Center for Non-coding RNA in Technology and Health, Division of Genetics and Bioinformatics, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
10
Tafer H, Amman F, Eggenhofer F, Stadler PF, Hofacker IL. Fast accessibility-based prediction of RNA–RNA interactions. Bioinformatics 2011; 27:1934-40. [DOI: 10.1093/bioinformatics/btr281] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 12/18/2022] Open
11
Li AX, Marz M, Qin J, Reidys CM. RNA-RNA interaction prediction based on multiple sequence alignments. Bioinformatics 2010; 27:456-63. [PMID: 21134894 DOI: 10.1093/bioinformatics/btq659] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, O(N^6) time and O(N^4) space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge about related sequences, so relevant evolutionary information is often left out of the structure determination. Therefore, it is of considerable practical interest to introduce a method that takes into consideration both thermodynamic stability and sequence/structure covariation. RESULTS: We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (i) the partition function, (ii) base pairing probabilities, (iii) hybrid probabilities and (iv) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory but offers much better sensitivity and specificity, provided alignments of suitable quality are given. ripalign additionally allows structure constraints to be incorporated as input parameters. AVAILABILITY: The algorithm described here is implemented in C as part of the rip package.
Affiliation(s)
- Andrew X Li
- Tianjin Key Laboratory of Combinatorics, Nankai University Tianjin 300071, People's Republic of China
12
Seemann SE, Richter AS, Gesell T, Backofen R, Gorodkin J. PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences. Bioinformatics 2010; 27:211-9. [PMID: 21088024 PMCID: PMC3018821 DOI: 10.1093/bioinformatics/btq634] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/06/2023]
Abstract
Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA–RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. Results: PETcofold's ability to predict RNA–RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA–RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. Availability: The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold. Contact: gorodkin@rth.dk; backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stefan E Seemann
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg C, Denmark