1
Zhou Y, Pedrielli G, Zhang F, Wu T. Predicting RNA sequence-structure likelihood via structure-aware deep learning. BMC Bioinformatics 2024; 25:316. [PMID: 39350066 PMCID: PMC11443715 DOI: 10.1186/s12859-024-05916-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
BACKGROUND The active functionalities of RNA are recognized to be heavily dependent on structure and sequence. Therefore, a model that can accurately evaluate a design, given RNA sequence-structure pairs, would be a valuable tool for many researchers. Machine learning methods have been explored to develop such tools, showing promising results. However, two key issues remain. First, the performance of machine learning models is affected by the features used to characterize RNA, and there is currently no consensus on which features are the most effective for characterizing RNA sequence-structure pairs. Second, most existing machine learning methods extract features describing the entire RNA molecule. We argue that it is essential to define additional features that characterize nucleotides and specific sections of RNA structure to enhance the overall efficacy of the RNA design process.
RESULTS We develop two deep learning models for evaluating RNA sequence-secondary structure pairs. The first model, NU-ResNet, uses a convolutional neural network architecture that addresses the aforementioned problems by explicitly encoding RNA sequence-structure information into a 3D matrix. Building upon NU-ResNet, our second model, NUMO-ResNet, incorporates additional information derived from the characterizations of RNA, specifically the 2D folding motifs. In this work, we introduce an automated method to extract these motifs based on fundamental secondary structure descriptions. We evaluate the performance of both models on an independent testing dataset. Our proposed models outperform the models from the literature on this independent testing dataset. To assess the robustness of our models, we conduct 10-fold cross-validation. To evaluate the generalization ability of NU-ResNet and NUMO-ResNet across different RNA families, we train and test our proposed models on different RNA families. Our proposed models show superior performance compared to the models from the literature when tested across different independent RNA families.
CONCLUSIONS In this study, we propose two deep learning models, NU-ResNet and NUMO-ResNet, to evaluate RNA sequence-secondary structure pairs. These two models expand the field of data-driven approaches for learning RNA. Furthermore, these two models provide a new method to encode RNA sequence-secondary structure pairs.
Affiliation(s)
- You Zhou
  - School of Computing and Augmented Intelligence, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
  - ASU-Mayo Center for Innovative Imaging, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
- Giulia Pedrielli
  - School of Computing and Augmented Intelligence, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
  - ASU-Mayo Center for Innovative Imaging, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
- Fei Zhang
  - Department of Chemistry, Rutgers University, 73 Warren St, Newark, NJ, 07102, USA
- Teresa Wu
  - School of Computing and Augmented Intelligence, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
  - ASU-Mayo Center for Innovative Imaging, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA
2
Yao HT, Marchand B, Berkemer SJ, Ponty Y, Will S. Infrared: a declarative tree decomposition-powered framework for bioinformatics. Algorithms Mol Biol 2024; 19:13. [PMID: 38493130 PMCID: PMC10943887 DOI: 10.1186/s13015-024-00258-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 02/13/2024] [Indexed: 03/18/2024]
Abstract
MOTIVATION Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.
METHODS We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space, consisting of putative solutions whose evaluation is performed through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables, and exponential only in the treewidth of the feature network. For sparse feature networks, associated with low to moderate treewidths, these algorithms make it possible to find optimal solutions, or generate controlled samples, with practical empirical efficiency.
RESULTS Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework, together with our novel results, underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to those of specialized algorithms and implementations.
AVAILABILITY Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
Affiliation(s)
- Hua-Ting Yao
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
  - School of Computer Science, McGill University, Montreal, Canada
- Bertrand Marchand
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sarah J Berkemer
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Yann Ponty
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sebastian Will
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
3
Yao HT, Ponty Y, Will S. Developing Complex RNA Design Applications in the Infrared Framework. Methods Mol Biol 2024; 2726:285-313. [PMID: 38780736 DOI: 10.1007/978-1-0716-3519-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Applications in biotechnology and biomedical research call for effective strategies to design novel RNAs with very specific properties. Such advanced design tasks require support by computational tools but at the same time put high demands on their flexibility and expressivity to model the application-specific requirements. To address such demands, we present the computational framework Infrared. It supports developing advanced customized design tools, which generate RNA sequences with specific properties, often in a few lines of Python code. This text guides the reader in tutorial format through the development of complex design applications. Thanks to the declarative, compositional approach of Infrared, we can describe this development as a step-by-step extension of an elementary design task. Thus, we start with generating sequences that are compatible with a single RNA structure and go all the way to RNA design targeting complex positive and negative design objectives with respect to single or even multiple target structures. Finally, we present a "real-world" application of computational design to create an RNA device for biotechnology: we use Infrared to generate design candidates of an artificial "AND" riboswitch, which activates gene expression in the simultaneous presence of two different small metabolites. In these applications, we exploit the fact that the system can generate, in an efficient (fixed-parameter tractable) way, multiple diverse designs that satisfy a number of constraints and have high quality with respect to an objective (by sampling from a Boltzmann distribution).
Affiliation(s)
- Hua-Ting Yao
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  - School of Computer Science, McGill University, Montreal, Canada
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Yann Ponty
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sebastian Will
  - LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
4
Zhou T, Dai N, Li S, Ward M, Mathews DH, Huang L. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics 2023; 39:i563-i571. [PMID: 37387188 DOI: 10.1093/bioinformatics/btad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION RNA design is the search for a sequence or set of sequences that will fold into a desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the minimum free energy (MFE) criterion can be found by each run of design. These drawbacks limit their use cases.
RESULTS We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure-level and ensemble-level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization-based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority in designing long sequences for structures adapted from the database of 16S Ribosomal RNAs.
AVAILABILITY AND IMPLEMENTATION Our source code and data used in this article are available at https://github.com/shanry/SAMFEO.
Affiliation(s)
- Tianshuo Zhou
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Ning Dai
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Sizhen Li
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
- David H Mathews
  - Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Liang Huang
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
5
Advanced Design of Structural RNAs Using RNARedPrint. Methods Mol Biol 2021. [PMID: 33835434 DOI: 10.1007/978-1-0716-1307-8_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
RNA design addresses the need to build novel RNAs, e.g., for biotechnological applications in synthetic biology, equipped with desired functional properties. This chapter describes how to use the software RNARedPrint for the de novo rational design of RNA sequences adopting one or several desired secondary structures. Depending on the application, these structures could represent alternate configurations or kinetic pathways. The software makes such design convenient and sufficiently fast for practical routine, where it even overcomes notorious problems in the application of RNA design, e.g., it maintains realistic GC content.
6
Huang FW, Barrett CL, Reidys CM. The energy-spectrum of bicompatible sequences. Algorithms Mol Biol 2021; 16:7. [PMID: 34074304 PMCID: PMC8167974 DOI: 10.1186/s13015-021-00187-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Accepted: 05/24/2021] [Indexed: 12/04/2022]
Abstract
Background Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a given RNA structure, play an important role in the context of neutral evolution. Sequences that are simultaneously compatible with two given structures (bicompatible sequences), are beacons in phenotypic transitions, induced by erroneously replicating populations of RNA sequences. RNA riboswitches, which are capable of expressing two distinct secondary structures without changing the underlying sequence, are one example of bicompatible sequences in living organisms.
Results We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The sequence sampler employs a dynamic programming routine whose time complexity is polynomial when assuming the maximum number of exposed vertices, κ, is a constant. The parameter κ depends on the two structures and can be very large. We introduce a novel topological framework encapsulating the relations between loops that sheds light on the understanding of κ. Based on this framework, we give an algorithm to sample sequences with minimum κ on a particular topologically classified case as well as giving hints to the solution in the other cases. As a result, we utilize our sequence sampler to study some established riboswitches.
Conclusion Our analysis of riboswitch sequences shows that a pair of structures needs to satisfy key properties in order to facilitate phenotypic transitions and that pairs of random structures are unlikely to do so. Our analysis observes a distinct signature of riboswitch sequences, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure. Our free software is available at: https://github.com/FenixHuang667/Bifold.
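The notion of a bicompatible sequence used in this abstract can be illustrated with a minimal Python sketch. It only checks base-pairing constraints for pseudoknot-free dot-bracket structures and assumes that the allowed pairs are Watson-Crick plus G-U wobble; it is not the Bifold sampler, and the function names and example structures are hypothetical.

```python
# Canonical Watson-Crick and G-U wobble pairs (an assumption of this sketch).
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(structure):
    """Return the base pairs (i, j) of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced opening bracket")
    return pairs


def is_compatible(sequence, structure):
    """True if every base pair of the structure joins two pairable nucleotides."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in CANONICAL_PAIRS
               for i, j in base_pairs(structure))


def is_bicompatible(sequence, structure_a, structure_b):
    """True if the sequence is compatible with both structures at once."""
    return (is_compatible(sequence, structure_a)
            and is_compatible(sequence, structure_b))


# Hypothetical toy structures of length 10.
STRUCTURE_A = "((......))"
STRUCTURE_B = "..((...))."
print(is_bicompatible("GAGCAAAGUC", STRUCTURE_A, STRUCTURE_B))  # True
print(is_bicompatible("GACCAAAGUC", STRUCTURE_A, STRUCTURE_B))  # False
```

In the second call, position 2 (C) cannot pair with position 8 (U) as required by the second structure, so the sequence is compatible with the first structure only.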
7
Retwitzer MD, Reinharz V, Churkin A, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv 2.0: a webserver and software with motif control for fragment-based design of RNAs. Bioinformatics 2020; 36:2920-2922. [PMID: 31971575 DOI: 10.1093/bioinformatics/btaa039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2019] [Revised: 11/25/2019] [Accepted: 01/15/2020] [Indexed: 11/12/2022]
Abstract
SUMMARY RNA design has conceptually evolved from the inverse RNA folding problem. In the classical inverse RNA problem, the user inputs an RNA secondary structure and receives an output RNA sequence that folds into it. Although modern RNA design methods are based on the same principle, finer control over the resulting sequences is sought. As an important example, a substantial number of non-coding RNA families show high conservation in specific regions while being more flexible in others, and this information should be utilized in the design. By using the additional information, RNA design tools can help solve problems of practical interest in the growing fields of synthetic biology and nanotechnology. incaRNAfbinv 2.0 utilizes a fragment-based approach, enabling control of specific RNA secondary structure motifs. The new version allows significantly more control over the general RNA shape, and also allows specific restrictions to be expressed for each motif separately, in addition to other advanced features.
AVAILABILITY AND IMPLEMENTATION incaRNAfbinv 2.0 is available through a standalone package and a web-server at https://www.cs.bgu.ac.il/incaRNAfbinv. Source code, command-line and GUI wrappers can be found at https://github.com/matandro/RNAsfbinv.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matan Drory Retwitzer
  - Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montreal, H2X 3Y7, Canada
  - Institute for Basic Science, Daejeon 34126, South Korea
- Alexander Churkin
  - Software Engineering Department, Sami Shamoon College of Engineering, Beer-Sheva 84100, Israel
- Yann Ponty
  - Laboratoire d'Informatique de l'École Polytechnique (LIX CNRS UMR 7161), Ecole Polytechnique, Palaiseau 91120, France
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montréal H3A 0E9, Canada
- Danny Barash
  - Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
8
Su C, Weir JD, Zhang F, Yan H, Wu T. ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics 2019; 20:373. [PMID: 31269893 PMCID: PMC6610807 DOI: 10.1186/s12859-019-2948-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/23/2018] [Accepted: 06/12/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA molecules play many crucial roles in living systems. The spatial complexity that exists in RNA structures determines their cellular functions. Therefore, understanding RNA folding conformations, in particular RNA secondary structures, is critical for elucidating biological functions. Existing literature has focused on RNA design as either an RNA structure prediction problem or an RNA inverse folding problem where free energy has played a key role.
RESULTS In this research, we propose a Positive-Unlabeled data-driven framework termed ENTRNA. In addition to free energy and commonly studied sequence and structural features, we propose a new feature, Sequence Segment Entropy (SSE), to measure the diversity of RNA sequences. ENTRNA is trained and cross-validated using 1024 pseudoknot-free RNAs and 1060 pseudoknotted RNAs from the RNASTRAND database, respectively. To test the robustness of ENTRNA, the models are further blind tested on 206 pseudoknot-free and 93 pseudoknotted RNAs from the PDB database. For pseudoknot-free RNAs, ENTRNA has 86.5% sensitivity on the training dataset and 80.6% sensitivity on the testing dataset. For pseudoknotted RNAs, ENTRNA shows 81.5% sensitivity on the training dataset and 71.0% on the testing dataset. To test the applicability of ENTRNA to long, structurally complex RNA, we collect 5 laboratory synthetic RNAs ranging from 1618 to 1790 nucleotides. ENTRNA is able to predict the foldability of 4 of these RNAs.
CONCLUSION In this article, we reformulate the RNA design problem as a foldability prediction problem, which is to predict the likelihood of the co-existence of a sequence-structure pair. This new construct has the potential for both RNA structure prediction and the inverse folding problem. In addition, this new construct enables us to explore data-driven approaches in RNA research.
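As a worked illustration of the sensitivity figures quoted for the blind test sets, the short Python sketch below applies the standard definition, true positives divided by all actual positives. The true-positive counts (166 and 66) are not stated in the abstract; they are inferred here from the reported percentages and test-set sizes and should be read as an assumption.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly foldable (positive) cases that the model recovers."""
    return true_positives / (true_positives + false_negatives)


# Counts inferred from the reported percentages and test-set sizes
# (an assumption; the abstract reports only the percentages).
pseudoknot_free = sensitivity(true_positives=166, false_negatives=206 - 166)
pseudoknotted = sensitivity(true_positives=66, false_negatives=93 - 66)

print(f"{pseudoknot_free:.1%}")  # 80.6%
print(f"{pseudoknotted:.1%}")    # 71.0%
```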
Affiliation(s)
- Congzhe Su
  - School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA
- Jeffery D. Weir
  - Department of Operational Sciences, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, Dayton, OH 45433 USA
- Fei Zhang
  - Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Hao Yan
  - Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Teresa Wu
  - School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA
9
Evolving methods for rational de novo design of functional RNA molecules. Methods 2019; 161:54-63. [PMID: 31059832 DOI: 10.1016/j.ymeth.2019.04.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/30/2019] [Revised: 04/26/2019] [Accepted: 04/29/2019] [Indexed: 12/16/2022]
Abstract
Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements.
10
Hammer S, Wang W, Will S, Ponty Y. Fixed-parameter tractable sampling for RNA design with multiple target structures. BMC Bioinformatics 2019; 20:209. [PMID: 31023239 PMCID: PMC6482512 DOI: 10.1186/s12859-019-2784-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/25/2018] [Accepted: 03/28/2019] [Indexed: 01/09/2023]
Abstract
Background The design of multi-stable RNA molecules has important applications in biology, medicine, and biotechnology. Synthetic design approaches profit strongly from effective in-silico methods, which substantially reduce the need for costly wet-lab experiments.
Results We devise a novel approach to a central ingredient of most in-silico design methods: the generation of sequences that fold well into multiple target structures. Based on constraint networks, our approach supports generic Boltzmann-weighted sampling, which enables the positive design of RNA sequences with specific free energies (for each of multiple, possibly pseudoknotted, target structures) and GC-content. Moreover, we study general properties of our approach empirically and generate biologically relevant multi-target Boltzmann-weighted designs for an established design benchmark. Our results demonstrate the efficacy and feasibility of the method in practice as well as the benefits of Boltzmann sampling over the previously best multi-target sampling strategy, even for the case of negative design of multi-stable RNAs. Besides empirical studies, we finally justify the algorithmic details by a fundamental theoretical result about multi-stable RNA design, namely the #P-hardness of the counting of designs.
Conclusion RNARedPrint introduces a novel, flexible, and effective approach to multi-target RNA design, which promises broad applicability and extensibility. Our free software is available at: https://github.com/yannponty/RNARedPrint
Supplementary data are available online (10.1186/s12859-019-2784-7).
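The idea of Boltzmann-weighted sampling over sequences compatible with several target structures can be sketched in a few lines of Python. This is a brute-force toy under stated assumptions, not the RNARedPrint algorithm: it enumerates all sequences of a short length instead of using constraint networks and dynamic programming, and `PAIR_ENERGY` holds hypothetical per-pair energies rather than a real nearest-neighbor energy model. A sequence is drawn with probability proportional to exp(-E/kT), where E is the sum of its toy energies over all targets.

```python
import itertools
import math
import random

# Toy pair energies in arbitrary units (hypothetical values for illustration only).
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}


def base_pairs(structure):
    """Return the base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs


def toy_energy(sequence, pairs):
    """Sum the toy pair energies, or return None if some pair cannot form."""
    total = 0.0
    for i, j in pairs:
        energy = PAIR_ENERGY.get(sequence[i] + sequence[j])
        if energy is None:
            return None
        total += energy
    return total


def boltzmann_sample(structures, n_samples, kT=1.0, seed=0):
    """Draw sequences with probability proportional to exp(-total energy / kT)."""
    pair_lists = [base_pairs(s) for s in structures]
    candidates, weights = [], []
    for letters in itertools.product("ACGU", repeat=len(structures[0])):
        energies = [toy_energy(letters, pairs) for pairs in pair_lists]
        if None in energies:
            continue  # incompatible with at least one target structure
        candidates.append("".join(letters))
        weights.append(math.exp(-sum(energies) / kT))
    return random.Random(seed).choices(candidates, weights=weights, k=n_samples)


# Two hypothetical target structures of length 8.
TARGETS = ["((...)).", ".((...))"]
designs = boltzmann_sample(TARGETS, n_samples=5)
print(designs)
```

Exhaustive enumeration grows as 4^n and is only usable for very short sequences; it serves here to make the target distribution explicit.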
Affiliation(s)
- Stefan Hammer
  - Dept. Computer Science, and Interdisciplinary Center for Bioinformatics, Univ. Leipzig, Härtelstr. 16-18, Leipzig, D-04107, Germany
  - Dept. Theoretical Chemistry, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
  - Bioinformatics and Computational Biology Research Group, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
- Wei Wang
  - CNRS UMR 7161 LIX, Ecole Polytechnique, Bat. Alan Turing, Palaiseau, 91120, France
- Sebastian Will
  - Dept. Theoretical Chemistry, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
  - Bioinformatics and Computational Biology Research Group, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
- Yann Ponty
  - CNRS UMR 7161 LIX, Ecole Polytechnique, Bat. Alan Turing, Palaiseau, 91120, France
11
Hammer S, Tschiatschek B, Flamm C, Hofacker IL, Findeiß S. RNAblueprint: flexible multiple target nucleic acid sequence design. Bioinformatics 2018; 33:2850-2858. [PMID: 28449031 PMCID: PMC5870862 DOI: 10.1093/bioinformatics/btx263] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/21/2016] [Accepted: 04/21/2017] [Indexed: 01/06/2023]
Abstract
Motivation Realizing the value of synthetic biology in biotechnology and medicine requires the design of molecules with specialized functions. Due to its close structure-to-function relationship, and the availability of good structure prediction methods and energy models, RNA is perfectly suited to be synthetically engineered with predefined properties. However, currently available RNA design tools cannot be easily adapted to accommodate new design specifications. Furthermore, complicated sampling and optimization methods are often developed to suit a specific RNA design goal, adding to their inflexibility.
Results We developed a C++ library implementing a graph coloring approach to stochastically sample sequences compatible with structural and sequence constraints from the typically very large solution space. The approach allows the solution space to be specified and explored in a well-defined way. Our library also guarantees uniform sampling, which makes optimization runs performant by not only avoiding re-evaluation of already found solutions, but also by raising the probability of finding better solutions for long optimization runs. We show that our software can be combined with any other software package to allow diverse RNA design applications. Scripting interfaces allow the easy adaptation of existing code to accommodate new scenarios, making the whole design process very flexible. We implemented example design approaches written in Python to demonstrate these advantages.
Availability and implementation RNAblueprint, Python implementations and benchmark datasets are available on GitHub: https://github.com/ViennaRNA.
Supplementary information Supplementary data are available at Bioinformatics online.
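The principle behind uniform sampling of sequences compatible with several structures can be sketched in Python as follows. The base pairs of all target structures form a dependency graph; positions in different connected components are independent, so picking a valid assignment uniformly within each component yields a uniform sample over the whole solution space. This is a simplified illustration, not the RNAblueprint library or its API: it enumerates assignments per component by brute force, which is only practical for small components, and all function names are hypothetical.

```python
import itertools
import random

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(structure):
    """Return the base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs


def components(n, edges):
    """Connected components of the dependency graph on positions 0..n-1."""
    adjacency = {v: set() for v in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacency[v] - seen:
                seen.add(w)
                stack.append(w)
        result.append(sorted(component))
    return result


def sample_uniform(structures, rng):
    """Sample uniformly among sequences compatible with all structures."""
    n = len(structures[0])
    edges = {pair for s in structures for pair in base_pairs(s)}
    sequence = [None] * n
    for component in components(n, edges):
        index = {v: k for k, v in enumerate(component)}
        local_edges = [(i, j) for i, j in edges if i in index]
        # Brute-force list of valid assignments for this component only.
        valid = [letters
                 for letters in itertools.product("ACGU", repeat=len(component))
                 if all(letters[index[i]] + letters[index[j]] in CANONICAL
                        for i, j in local_edges)]
        if not valid:
            raise ValueError("no sequence is compatible with all structures")
        choice = rng.choice(valid)
        for v in component:
            sequence[v] = choice[index[v]]
    return "".join(sequence)


# Two hypothetical target structures of length 8.
TARGETS = ["((...)).", ".((...))"]
print(sample_uniform(TARGETS, random.Random(1)))
```

For these two targets the dependency graph has the components {0, 2, 6}, {1, 5, 7}, {3} and {4}, so the per-component enumeration covers at most 64 assignments at a time instead of all 4^8 sequences.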
Affiliation(s)
- Stefan Hammer
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
- Birgit Tschiatschek
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
- Christoph Flamm
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Research Network Chemistry Meets Microbiology, University of Vienna, 1090 Vienna, Austria
- Ivo L Hofacker
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
  - Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Copenhagen DK-1870, Denmark
- Sven Findeiß
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
12
Lotfi M, Zare-Mirakabad F, Montaseri S. RNA design using simulated SHAPE data. Genes Genet Syst 2018; 92:257-265. [PMID: 28757510 DOI: 10.1266/ggs.16-00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.
Affiliation(s)
- Mohadeseh Lotfi
- Faculty of Mathematics and Computer Science, Amirkabir University of Technology
- Soheila Montaseri
- School of Mathematics, Statistics and Computer Science, College of Science, Enghelab Avenue, University of Tehran
13
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/26/2022]
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
Affiliation(s)
- Alexander Churkin
- Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
- Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
- School of Computer Science, McGill University, Montréal QC, Canada
- Yann Ponty
- Laboratoire d’informatique, École Polytechnique, Palaiseau, France
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
14
Yang X, Yoshizoe K, Taneda A, Tsuda K. RNA inverse folding using Monte Carlo tree search. BMC Bioinformatics 2017; 18:468. [PMID: 29110632 PMCID: PMC5674771 DOI: 10.1186/s12859-017-1882-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/08/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022]
Abstract
Background: Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. Result: We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm has an ability to control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. Conclusion: MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA.
Affiliation(s)
- Xiufeng Yang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan
- Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, 036-8561, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan; Center for Materials Research by Information Integration, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan; RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan.
15
Huang F, Reidys C, Rezazadegan R. Fatgraph models of RNA structure. Computational and Mathematical Biophysics 2017. [DOI: 10.1515/mlbmb-2017-0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this review paper we discuss fatgraphs as a conceptual framework for RNA structures. We discuss various notions of coarse-grained RNA structures and relate them to fatgraphs. We motivate and discuss the main intuition behind the fatgraph model and showcase its applicability to canonical as well as noncanonical base pairs. Recent discoveries regarding novel recursions of pseudoknotted (pk) configurations as well as their translation into context-free grammars for pk-structures are discussed. This is shown to allow for extending the concept of partition functions of sequences w.r.t. a fixed structure having non-crossing arcs to pk-structures. We discuss minimum free energy folding of pk-structures and combine these above results outlining how to obtain an inverse folding algorithm for pk-structures.
Affiliation(s)
- Fenix Huang
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, United States of America
- Christian Reidys
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, United States of America
- Reza Rezazadegan
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, United States of America
16
Tremblay-Savard O, Reinharz V, Waldispühl J. Reconstruction of ancestral RNA sequences under multiple structural constraints. BMC Genomics 2016; 17:862. [PMID: 28185557 PMCID: PMC5123390 DOI: 10.1186/s12864-016-3105-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Secondary structures form the scaffold of multiple sequence alignment of non-coding RNA (ncRNA) families. An accurate reconstruction of ancestral ncRNAs must use this structural signal. However, the inference of ancestors of a single ncRNA family with a single consensus structure may bias the results towards sequences with high affinity to this structure, which are far from the true ancestors. METHODS In this paper, we introduce achARNement, a maximum parsimony approach that, given two alignments of homologous ncRNA families with consensus secondary structures and a phylogenetic tree, simultaneously calculates ancestral RNA sequences for these two families. RESULTS We test our methodology on simulated data sets, and show that achARNement outperforms classical maximum parsimony approaches in terms of accuracy, but also reduces by several orders of magnitude the number of candidate sequences. To conclude this study, we apply our algorithms on the Glm clan and the FinP-traJ clan from the Rfam database. CONCLUSIONS Our results show that our methods reconstruct small sets of high-quality candidate ancestors with better agreement to the two target structures than with classical approaches. Our program is freely available at: http://csb.cs.mcgill.ca/acharnement .
Affiliation(s)
- Olivier Tremblay-Savard
- School of Computer Science, McGill University, Montreal, H3A 0E9, Canada; Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, Canada
- Vladimir Reinharz
- School of Computer Science, McGill University, Montreal, H3A 0E9, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montreal, H3A 0E9, Canada.
17
Garcia-Martin JA, Bayegan AH, Dotu I, Clote P. RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory. BMC Bioinformatics 2016; 17:424. [PMID: 27756204 PMCID: PMC5069997 DOI: 10.1186/s12859-016-1280-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/24/2016] [Accepted: 09/26/2016] [Indexed: 12/01/2022]
Abstract
Background: RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s0, i.e. whose minimum free energy secondary structure is identical to the target s0. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s0. Results: We introduce the program RNAdualPF, which computes the dual partition function Z∗, defined as the sum of Boltzmann factors exp(−E(a,s0)/RT) of all RNA nucleotide sequences a compatible with target structure s0. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s0, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, single-stranded nucleotides). Moreover, since we also compute the dual partition function Z∗(k) over all sequences having GC-content k, the user can require that all sampled sequences have a precise, specified GC-content. Using Z∗, we compute the dual expected energy 〈E∗〉, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable. We show that C. elegans precursor microRNA (pre-miRNA) is significantly non-robust with respect to mutations, by comparing the robustness of each wild type pre-miRNA sequence with 2000 [resp. 500] sequences of the same GC-content generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity. Conclusion: We describe RNAdualPF, which rapidly computes the dual partition function Z∗ and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GC-content. Using different inverse folding software, another group had earlier shown that pre-miRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used. C/C++-software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF.
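To make the definition of the dual partition function concrete, the following is a minimal brute-force sketch that sums Boltzmann factors exp(−E(a,s0)/RT) over every sequence compatible with a short target structure. It is not the RNAdualPF algorithm. The energy function is a hypothetical one that simply adds a made-up energy per base pair (the `PAIR_ENERGY` values are illustrative, not Turner parameters), and all function names are assumptions.

```python
import itertools
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 37 °C in kelvin

# Hypothetical per-pair energies in kcal/mol; NOT real nearest-neighbor values.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def pair_table(structure):
    """Return the list of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def toy_energy(seq, pairs):
    """Sum of toy pair energies, or None if some pair is not canonical."""
    total = 0.0
    for i, j in pairs:
        e = PAIR_ENERGY.get((seq[i], seq[j]))
        if e is None:
            return None  # sequence is incompatible with the structure
        total += e
    return total


def dual_partition_function(structure):
    """Brute-force Z* over all 4^n sequences; only feasible for tiny n."""
    pairs = pair_table(structure)
    z = 0.0
    for seq in itertools.product("ACGU", repeat=len(structure)):
        e = toy_energy(seq, pairs)
        if e is not None:
            z += math.exp(-e / RT)
    return z
```

Because the toy energy is additive over base pairs, Z∗ factorizes into a factor of 4 per unpaired position and one sum over the six canonical pairs per base pair, which gives a quick sanity check on the enumeration. A realistic energy model has stacking and loop terms, so it does not factorize this simply.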
Affiliation(s)
- Juan Antonio Garcia-Martin
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA; Present Address: Systems Biology Program Centro Nacional de Biotecnología Consejo Superior de Investigaciones Científicas (CSIC) C/ Darwin 3, Madrid, 28049, Spain
- Amir H Bayegan
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA
- Ivan Dotu
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader, 88, Barcelona, Spain
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA.
18
Zandi K, Butler G, Kharma N. An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures. Front Genet 2016; 7:129. [PMID: 27499762 PMCID: PMC4956659 DOI: 10.3389/fgene.2016.00129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/06/2016] [Accepted: 07/06/2016] [Indexed: 01/18/2023]
Abstract
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements and an important RNA element named pseudoknot plays a key role in stabilizing the functional form of the molecule. However, due to the computational complexities associated with characterizing pseudoknotted RNA structures, most of the existing RNA sequence designer algorithms generally ignore this important structural element and therefore limit their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences we designed by Enzymer with the results obtained from the state of the art MODENA and antaRNA. Our results show our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally by using Enzymer and by constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer.
Affiliation(s)
- Kasra Zandi
- Computer Science Department, Concordia University, Montreal, QC, Canada
- Gregory Butler
- Computer Science Department, Concordia University, Montreal, QC, Canada
- Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Nawwaf Kharma
- Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Electrical and Computer Engineering Department, Concordia University, Montreal, QC, Canada
19
Drory Retwitzer M, Reinharz V, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res 2016; 44:W308-14. [PMID: 27185893 PMCID: PMC5741205 DOI: 10.1093/nar/gkw440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/04/2016] [Accepted: 05/06/2016] [Indexed: 01/02/2023]
Abstract
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv.
Affiliation(s)
- Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
- Yann Ponty
- Laboratoire d'Informatique (LIX)-CNRS UMR 7161, École Polytechnique, 91128 Palaiseau, France AMIB team/project, INRIA Saclay, Bâtiment Alan Turing, 91128 Palaiseau, France
- Jérôme Waldispühl
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
20
Taneda A. Multi-objective optimization for RNA design with multiple target secondary structures. BMC Bioinformatics 2015; 16:280. [PMID: 26335276 PMCID: PMC4559319 DOI: 10.1186/s12859-015-0706-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 11/23/2014] [Accepted: 08/17/2015] [Indexed: 12/24/2022]
Abstract
Background: RNAs are attractive molecules as the biological parts for synthetic biology. In particular, the ability of conformational changes, which can be encoded in designer RNAs, enables us to create multistable molecular switches that function in biological circuits. Although various algorithms for designing such RNA switches have been proposed, the previous algorithms optimize the RNA sequences against the weighted sum of objective functions, where empirical weights among objective functions are used. In addition, an RNA design algorithm for multiple pseudoknot targets is currently not available. Results: We developed a novel computational tool for automatically designing RNA sequences which fold into multiple target secondary structures. Our algorithm designs RNA sequences based on multi-objective genetic algorithm, by which we can explore the RNA sequences having good objective function values without empirical weight parameters among the objective functions. Our algorithm has great flexibility by virtue of this weight-free nature. We benchmarked our multi-target RNA design algorithm with the datasets of two, three, and four target structures and found that our algorithm shows better or comparable design performances compared with the previous algorithms, RNAdesign and Frnakenstein. In addition to the benchmarks with pseudoknot-free datasets, we benchmarked MODENA with two-target pseudoknot datasets and found that MODENA can design the RNAs which have the target pseudoknotted secondary structures whose free energies are close to the lowest free energy. Moreover, we applied our algorithm to a ribozyme-based ON-switch which takes a ribozyme-inactive secondary structure when the theophylline aptamer structure is assumed. Conclusions: Currently, MODENA is the only RNA design software which can be applied to multiple pseudoknot targets. Successful design results for the multiple targets and an RNA device indicate usefulness of our multi-objective RNA design algorithm.
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, Aomori, Japan.
21
Kleinkauf R, Mann M, Backofen R. antaRNA: ant colony-based RNA sequence design. Bioinformatics 2015; 31:3114-21. [PMID: 26023105 PMCID: PMC4576691 DOI: 10.1093/bioinformatics/btv319] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 01/29/2015] [Accepted: 05/18/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION RNA sequence design has been studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components. RESULTS In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on biological datasets. AVAILABILITY AND IMPLEMENTATION http://www.bioinf.uni-freiburg.de/Software/antaRNA CONTACT backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robert Kleinkauf
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Martin Mann
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany, Center for Biological Signaling Studies (BIOSS), University of Freiburg, Germany, Center for Biological Systems Analysis (ZBSA), University of Freiburg, Germany and Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
22
Garcia-Martin JA, Dotu I, Clote P. RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules. Nucleic Acids Res 2015; 43:W513-21. [PMID: 26019176 PMCID: PMC4489274 DOI: 10.1093/nar/gkv460] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/08/2015] [Accepted: 04/27/2015] [Indexed: 12/25/2022]
Abstract
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. Availability: the web server, source code and linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0.
Affiliation(s)
- Ivan Dotu
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute); C/Dr. Aiguader 88, Barcelona E-08003, Spain
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
23
Esmaili-Taheri A, Ganjtabesh M. ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinformatics 2015; 16:20. [PMID: 25626878 PMCID: PMC4384295 DOI: 10.1186/s12859-014-0444-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/28/2014] [Accepted: 11/19/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The function of an RNA in cellular processes is directly related to its structure. The free energy of an RNA structure is another important key to its function, as only some structures with a specific level of free energy can take part in cellular reactions. Therefore, to perform a specific function, a particular RNA structure with a specific level of free energy is required. For a given RNA structure, the goal of the RNA design problem is to design an RNA sequence that folds into the given structure. To mimic the biological features of RNA sequences and structures, some sequence and energy constraints should be considered in designing RNA. Although the level of free energy is important, it is not considered in the available approaches for the RNA design problem. RESULTS In this paper, we present a new version of our evolutionary algorithm for the RNA design problem, entitled ERD, and extend it to handle some sequence and energy constraints. In the sequence constraints, one can restrict sequence positions to a fixed nucleotide or to a subset of nucleotides. As for the energy constraint, one can specify an interval for the free energy ranges of the designed sequences. We compare our algorithm with INFO-RNA, MODENA, NUPACK, and RNAiFold approaches for some artificial and natural RNA secondary structures and constraints. CONCLUSIONS The results indicate that our algorithm outperforms the other mentioned approaches in terms of accuracy, speedup, divergency, nucleotides distribution, and similarity to the natural RNA sequences. Particularly, the designed RNA sequences in our method are much more reliable and similar to the natural counterparts. The generated sequences are more diverse and they have closer nucleotides distribution to the natural one. The ERD tool and web server are freely available at http://mostafa.ut.ac.ir/corna/erd-cons/.
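The two kinds of constraints described in this abstract can be illustrated with a small check. The sketch below assumes that position constraints are written with IUPAC nucleotide codes (a common encoding, though the abstract does not say which notation ERD accepts) and that the energy constraint is a closed interval. The function names are hypothetical, and the free energy value is expected to come from an external folding tool.

```python
# IUPAC nucleotide codes mapped to the RNA bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def satisfies_sequence_constraint(seq, constraint):
    """True if every base of seq is allowed by the IUPAC code at its position."""
    if len(seq) != len(constraint):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, constraint))


def satisfies_energy_constraint(energy, low, high):
    """True if the free energy (kcal/mol) lies in the closed interval [low, high]."""
    return low <= energy <= high
```

For example, the constraint `NNRY` fixes nothing at the first two positions, requires a purine at the third and a pyrimidine at the fourth, so `GCAU` passes and `GCCU` fails.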
Affiliation(s)
- Ali Esmaili-Taheri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran.
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran; Laboratoire d'Informatique (LIX), Ecole Polytechnique, Palaiseau CEDEX, 91128, France.
24
Abstract
In this chapter, we review both computational and experimental aspects of de novo RNA sequence design. We give an overview of currently available design software and their limitations, and discuss the necessary setup to experimentally validate proper function in vitro and in vivo. We focus on transcription-regulating riboswitches, a task that has just recently led to the first successful designs of such RNA elements.
Affiliation(s)
- Sven Findeiß
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Manja Wachsmuth
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany
- Mario Mörl
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany.
- Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Bioinformatics Group, Department of Computer Science and the Interdisciplinary Center for Bioinformatic, University of Leipzig, Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Santa Fe Institute, Santa Fe, New Mexico, USA
25
Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P. Complete RNA inverse folding: computational design of functional hammerhead ribozymes. Nucleic Acids Res 2014; 42:11752-62. [PMID: 25209235 PMCID: PMC4191386 DOI: 10.1093/nar/gku740] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.
Affiliation(s)
- Ivan Dotu
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Betty L Slinger
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Vinodh Mechery
- Hofstra North Shore-LIJ School of Medicine, Hempstead, NY 11549, USA
- Michelle M Meyer
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA