1
|
Runge F, Franke J, Fertmann D, Backofen R, Hutter F. Partial RNA design. Bioinformatics 2024; 40:i437-i445. [PMID: 38940170 DOI: 10.1093/bioinformatics/btae222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION RNA design is a key technique to achieve new functionality in fields like synthetic biology or biotechnology. Computational tools could help to find such RNA sequences but they are often limited in their formulation of the search space. RESULTS In this work, we propose partial RNA design, a novel RNA design paradigm that addresses the limitations of current RNA design formulations. Partial RNA design describes the problem of designing RNAs from arbitrary RNA sequences and structure motifs with multiple design goals. By separating the design space from the objectives, our formulation enables the design of RNAs with variable lengths and desired properties, while still allowing precise control over sequence and structure constraints at individual positions. Based on this formulation, we introduce a new algorithm, libLEARNA, capable of efficiently solving different constraint RNA design tasks. A comprehensive analysis of various problems, including a realistic riboswitch design task, reveals the outstanding performance of libLEARNA and its robustness. AVAILABILITY AND IMPLEMENTATION libLEARNA is open-source and publicly available at: https://github.com/automl/learna_tools.
Collapse
Affiliation(s)
- Frederic Runge
- Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
| | - Jörg Franke
- Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
| | - Daniel Fertmann
- Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
| | - Rolf Backofen
- Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
| | - Frank Hutter
- Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
| |
Collapse
|
2
|
Zhou T, Dai N, Li S, Ward M, Mathews DH, Huang L. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics 2023; 39:i563-i571. [PMID: 37387188 DOI: 10.1093/bioinformatics/btad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION RNA design is the search for a sequence or set of sequences that will fold to desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the MFE criterion can be found by each run of design. These drawbacks limit their use cases. RESULTS We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure level and ensemble level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority on designing long sequences for structures adapted from the database of 16S Ribosomal RNAs. AVAILABILITY AND IMPLEMENTATION Our source code and data used in this article is available at https://github.com/shanry/SAMFEO.
Collapse
Affiliation(s)
- Tianshuo Zhou
- School of Electrical Engineering and Computer Science, Oregon State University, Corvalli OR 97330, United States
| | - Ning Dai
- School of Electrical Engineering and Computer Science, Oregon State University, Corvalli OR 97330, United States
| | - Sizhen Li
- School of Electrical Engineering and Computer Science, Oregon State University, Corvalli OR 97330, United States
| | - Max Ward
- Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
| | - David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
| | - Liang Huang
- School of Electrical Engineering and Computer Science, Oregon State University, Corvalli OR 97330, United States
| |
Collapse
|
3
|
Merleau NSC, Smerlak M. aRNAque: an evolutionary algorithm for inverse pseudoknotted RNA folding inspired by Lévy flights. BMC Bioinformatics 2022; 23:335. [PMID: 35964008 PMCID: PMC9375295 DOI: 10.1186/s12859-022-04866-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Accepted: 07/29/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND We study in this work the inverse folding problem for RNA, which is the discovery of sequences that fold into given target secondary structures. RESULTS We implement a Lévy mutation scheme in an updated version of aRNAque an evolutionary inverse folding algorithm and apply it to the design of RNAs with and without pseudoknots. We find that the Lévy mutation scheme increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Compared to antaRNA, aRNAque CPU time is higher but more successful in finding designed sequences that fold correctly into the target structures. CONCLUSION We propose that a Lévy flight offers a better standard mutation scheme for optimizing RNA design. Our new version of aRNAque is available on GitHub as a python script and the benchmark results show improved performance on both Pseudobase++ and the Eterna100 datasets, compared to existing inverse folding tools.
Collapse
Affiliation(s)
- Nono S C Merleau
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103, Leipzig, Germany.
| | - Matteo Smerlak
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103, Leipzig, Germany
| |
Collapse
|
4
|
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
Collapse
Affiliation(s)
- Alexander Churkin
- Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
| | | | - Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
- School of Computer Science, McGill University, Montréal QC, Canada
| | - Yann Ponty
- Laboratoire d’informatique, École Polytechnique, Palaiseau, France
| | | | - Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
| |
Collapse
|
5
|
Clote P, Bayegan AH. RNA folding kinetics using Monte Carlo and Gillespie algorithms. J Math Biol 2017; 76:1195-1227. [PMID: 28780735 DOI: 10.1007/s00285-017-1169-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2016] [Revised: 07/09/2017] [Indexed: 11/26/2022]
Abstract
RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ([Formula: see text]20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to [Formula: see text] times that of the Gillespie algorithm, where [Formula: see text] denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by [Formula: see text]; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence-see http://bioinformatics.bc.edu/clote/RNAexpNumNbors .
Collapse
Affiliation(s)
- Peter Clote
- Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA.
| | - Amir H Bayegan
- Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA
| |
Collapse
|
6
|
Drory Retwitzer M, Reinharz V, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res 2016; 44:W308-14. [PMID: 27185893 PMCID: PMC5741205 DOI: 10.1093/nar/gkw440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2016] [Accepted: 05/06/2016] [Indexed: 01/02/2023] Open
Abstract
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv.
Collapse
Affiliation(s)
| | - Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
| | - Yann Ponty
- Laboratoire d'Informatique (LIX)-CNRS UMR 7161, École Polytechnique, 91128 Palaiseau, France AMIB team/project, INRIA Saclay, Bâtiment Alan Turing, 91128 Palaiseau, France
| | - Jérôme Waldispühl
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
| | - Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
| |
Collapse
|
7
|
Garcia-Martin JA, Dotu I, Clote P. RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules. Nucleic Acids Res 2015; 43:W513-21. [PMID: 26019176 PMCID: PMC4489274 DOI: 10.1093/nar/gkv460] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2015] [Accepted: 04/27/2015] [Indexed: 12/25/2022] Open
Abstract
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. Availability: the web server, source code and linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0.
Collapse
Affiliation(s)
| | - Ivan Dotu
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute); C/Dr. Aiguader 88, Barcelona E-08003, Spain
| | - Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
| |
Collapse
|