1
|
Jain S, Laederach A, Ramos SBV, Schlick T. A pipeline for computational design of novel RNA-like topologies. Nucleic Acids Res 2019; 46:7040-7051. [PMID: 30137633 PMCID: PMC6101589 DOI: 10.1093/nar/gky524] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2017] [Accepted: 05/24/2018] [Indexed: 12/11/2022] Open
Abstract
Designing novel RNA topologies is a challenge, with important therapeutic and industrial applications. We describe a computational pipeline for design of novel RNA topologies based on our coarse-grained RNA-As-Graphs (RAG) framework. RAG represents RNA structures as tree graphs and describes RNA secondary (2D) structure topologies (currently up to 13 vertices, ≈260 nucleotides). We have previously identified novel graph topologies that are RNA-like among these. Here we describe a systematic design pipeline and illustrate design for six broad design problems using recently developed tools for graph-partitioning and fragment assembly (F-RAG). Following partitioning of the target graph, corresponding atomic fragments from our RAG-3D database are combined using F-RAG, and the candidate atomic models are scored using a knowledge-based potential developed for 3D structure prediction. The sequences of the top scoring models are screened further using available tools for 2D structure prediction. The results indicate that our modular approach based on RNA-like topologies rather than specific 2D structures allows for greater flexibility in the design process, and generates a large number of candidate sequences quickly. Experimental structure probing using SHAPE-MaP for two sequences agree with our predictions and suggest that our combined tools yield excellent candidates for further sequence and experimental screening.
Collapse
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
| | - Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Silvia B V Ramos
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Tamar Schlick
- Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA.,Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA.,NYU-ECNU Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China
| |
Collapse
|
2
|
Evolving methods for rational de novo design of functional RNA molecules. Methods 2019; 161:54-63. [PMID: 31059832 DOI: 10.1016/j.ymeth.2019.04.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 04/26/2019] [Accepted: 04/29/2019] [Indexed: 12/16/2022] Open
Abstract
Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements.
Collapse
|
3
|
Lotfi M, Zare-Mirakabad F, Montaseri S. RNA design using simulated SHAPE data. Genes Genet Syst 2018; 92:257-265. [PMID: 28757510 DOI: 10.1266/ggs.16-00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.
Collapse
Affiliation(s)
- Mohadeseh Lotfi
- Faculty of Mathematics and Computer Science, Amirkabir University of Technology
| | | | - Soheila Montaseri
- School of Mathematics, Statistics and Computer Science, College of Science, Enghelab Avenue, University of Tehran
| |
Collapse
|
4
|
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
Collapse
Affiliation(s)
- Alexander Churkin
- Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
| | | | - Vladimir Reinharz
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
- School of Computer Science, McGill University, Montréal QC, Canada
| | - Yann Ponty
- Laboratoire d’informatique, École Polytechnique, Palaiseau, France
| | | | - Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
| |
Collapse
|
5
|
Yang X, Yoshizoe K, Taneda A, Tsuda K. RNA inverse folding using Monte Carlo tree search. BMC Bioinformatics 2017; 18:468. [PMID: 29110632 PMCID: PMC5674771 DOI: 10.1186/s12859-017-1882-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2017] [Accepted: 10/26/2017] [Indexed: 11/10/2022] Open
Abstract
Background Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content, that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. Result We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm has an ability to control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. Conclusion MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1882-7) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Xiufeng Yang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan
| | - Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan
| | - Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, 036-8561, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-8561, Japan. .,Center for Materials Research by Information Integration, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, 305-0047, Japan. .,RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihombashi Chuo-ku, Tokyo, 103-0027, Japan.
| |
Collapse
|
6
|
Wolfe BR, Porubsky NJ, Zadeh JN, Dirks RM, Pierce NA. Constrained Multistate Sequence Design for Nucleic Acid Reaction Pathway Engineering. J Am Chem Soc 2017; 139:3134-3144. [DOI: 10.1021/jacs.6b12693] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]
Affiliation(s)
- Brian R. Wolfe
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Nicholas J. Porubsky
- Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Joseph N. Zadeh
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Robert M. Dirks
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Niles A. Pierce
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Engineering & Applied Science, California Institute of Technology, Pasadena, California 91125, United States
- Weatherall
Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
| |
Collapse
|
7
|
Accurate Classification of RNA Structures Using Topological Fingerprints. PLoS One 2016; 11:e0164726. [PMID: 27755571 PMCID: PMC5068708 DOI: 10.1371/journal.pone.0164726] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 09/29/2016] [Indexed: 12/26/2022] Open
Abstract
While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity-an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach both covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint.
Collapse
|
8
|
Zandi K, Butler G, Kharma N. An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures. Front Genet 2016; 7:129. [PMID: 27499762 PMCID: PMC4956659 DOI: 10.3389/fgene.2016.00129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2016] [Accepted: 07/06/2016] [Indexed: 01/18/2023] Open
Abstract
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements and an important RNA element named pseudoknot plays a key role in stabilizing the functional form of the molecule. However, due to the computational complexities associated with characterizing pseudoknotted RNA structures, most of the existing RNA sequence designer algorithms generally ignore this important structural element and therefore limit their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences we designed by Enzymer with the results obtained from the state of the art MODENA and antaRNA. Our results show our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally by using Enzymer and by constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer.
Collapse
Affiliation(s)
- Kasra Zandi
- Computer Science Department, Concordia UniversityMontreal, QC, Canada
| | - Gregory Butler
- Computer Science Department, Concordia UniversityMontreal, QC, Canada
- Centre for Structural and Functional Genomics, Concordia UniversityMontreal, QC, Canada
| | - Nawwaf Kharma
- Centre for Structural and Functional Genomics, Concordia UniversityMontreal, QC, Canada
- Electrical and Computer Engineering Department, Concordia UniversityMontreal, QC, Canada
| |
Collapse
|
9
|
Drory Retwitzer M, Reinharz V, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res 2016; 44:W308-14. [PMID: 27185893 PMCID: PMC5741205 DOI: 10.1093/nar/gkw440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2016] [Accepted: 05/06/2016] [Indexed: 01/02/2023] Open
Abstract
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to-date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel natural occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv.
Collapse
Affiliation(s)
| | - Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
| | - Yann Ponty
- Laboratoire d'Informatique (LIX)-CNRS UMR 7161, École Polytechnique, 91128 Palaiseau, France AMIB team/project, INRIA Saclay, Bâtiment Alan Turing, 91128 Palaiseau, France
| | - Jérôme Waldispühl
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
| | - Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
| |
Collapse
|
10
|
Wolfe BR, Pierce NA. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands. ACS Synth Biol 2015; 4:1086-100. [PMID: 25329866 DOI: 10.1021/sb5002196] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
Abstract
We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.
Collapse
Affiliation(s)
- Brian R. Wolfe
- Division of Biology and Biological
Engineering and ‡Division of Engineering and Applied
Science, California Institute of Technology, Pasadena, California 91125, United States
| | - Niles A. Pierce
- Division of Biology and Biological
Engineering and ‡Division of Engineering and Applied
Science, California Institute of Technology, Pasadena, California 91125, United States
| |
Collapse
|
11
|
Jabbari H, Aminpour M, Montemagno C. Computational Approaches to Nucleic Acid Origami. ACS COMBINATORIAL SCIENCE 2015; 17:535-47. [PMID: 26348196 DOI: 10.1021/acscombsci.5b00079] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Recent advances in experimental DNA origami have dramatically expanded the horizon of DNA nanotechnology. Complex 3D suprastructures have been designed and developed using DNA origami with applications in biomaterial science, nanomedicine, nanorobotics, and molecular computation. Ribonucleic acid (RNA) origami has recently been realized as a new approach. Similar to DNA, RNA molecules can be designed to form complex 3D structures through complementary base pairings. RNA origami structures are, however, more compact and more thermodynamically stable due to RNA's non-canonical base pairing and tertiary interactions. With all these advantages, the development of RNA origami lags behind DNA origami by a large gap. Furthermore, although computational methods have proven to be effective in designing DNA and RNA origami structures and in their evaluation, advances in computational nucleic acid origami is even more limited. In this paper, we review major milestones in experimental and computational DNA and RNA origami and present current challenges in these fields. We believe collaboration between experimental nanotechnologists and computer scientists are critical for advancing these new research paradigms.
Collapse
Affiliation(s)
- Hosna Jabbari
- Ingenuity Lab, 11421 Saskatchewan
Drive, Edmonton, Alberta T6G 2M9, Canada
- Department
of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| | - Maral Aminpour
- Ingenuity Lab, 11421 Saskatchewan
Drive, Edmonton, Alberta T6G 2M9, Canada
- Department
of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| | - Carlo Montemagno
- Ingenuity Lab, 11421 Saskatchewan
Drive, Edmonton, Alberta T6G 2M9, Canada
- Department
of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| |
Collapse
|
12
|
Esmaili-Taheri A, Ganjtabesh M. ERD: a fast and reliable tool for RNA design including constraints. BMC Bioinformatics 2015; 16:20. [PMID: 25626878 PMCID: PMC4384295 DOI: 10.1186/s12859-014-0444-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2014] [Accepted: 11/19/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The function of an RNA in cellular processes is directly related to its structure. The free energy of RNA structure in another important key to its function as only some structures with a specific level of free energy can take part in cellular reactions. Therefore, to perform a specific function, a particular RNA structure with specific level of free energy is required. For a given RNA structure, the goal of the RNA design problem is to design an RNA sequence that folds into the given structure. To mimic the biological features of RNA sequences and structures, some sequence and energy constraints should be considered in designing RNA. Although the level of free energy is important, it is not considered in the available approaches for RNA design problem. RESULTS In this paper, we present a new version of our evolutionary algorithm for RNA design problem, entitled ERD, and extend it to handle some sequence and energy constraints. In the sequence constraints, one can restrict sequence positions to a fixed nucleotide or to a subset of nucleotides. As for the energy constraint, one can specify an interval for the free energy ranges of the designed sequences. We compare our algorithm with INFO-RNA, MODENA, NUPACK, and RNAiFold approaches for some artificial and natural RNA secondary structures and constraints. CONCLUSIONS The results indicate that our algorithm outperforms the other mentioned approaches in terms of accuracy, speedup, divergency, nucleotides distribution, and similarity to the natural RNA sequences. Particularly, the designed RNA sequences in our method are much more reliable and similar to the natural counterparts. The generated sequences are more diverse and they have closer nucleotides distribution to the natural one. The ERD tool and web server are freely available at http://mostafa.ut.ac.ir/corna/erd-cons/ .
Collapse
Affiliation(s)
- Ali Esmaili-Taheri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran.
| | - Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran. .,Laboratoire d'Informatique (LIX), Ecole Polytechnique, Palaiseau CEDEX, 91128, France.
| |
Collapse
|
13
|
Abstract
In this chapter, we review both computational and experimental aspects of de novo RNA sequence design. We give an overview of currently available design software and their limitations, and discuss the necessary setup to experimentally validate proper function in vitro and in vivo. We focus on transcription-regulating riboswitches, a task that has just recently lead to first successful designs of such RNA elements.
Collapse
Affiliation(s)
- Sven Findeiß
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
| | - Manja Wachsmuth
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany
| | - Mario Mörl
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany.
| | - Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Bioinformatics Group, Department of Computer Science and the Interdisciplinary Center for Bioinformatic, University of Leipzig, Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Santa Fe Institute, Santa Fe, New Mexico, USA
| |
Collapse
|
14
|
Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics 2014; 30:1250-8. [PMID: 24407223 DOI: 10.1093/bioinformatics/btu001] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION RNAs play fundamental roles in cellular processes. The function of an RNA is highly dependent on its 3D conformation, which is referred to as the RNA tertiary structure. Because the prediction or experimental determination of these structures is difficult, so many works focus on the problems associated with the RNA secondary structure. Here, we consider the RNA inverse folding problem, in which an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that folds into the target structure. In this article, we introduce a new evolutionary algorithm for the RNA inverse folding problem. Our algorithm, entitled Evolutionary RNA Design, generates a sequence whose minimum free energy structure is the same as the target structure. RESULTS We compare our algorithm with INFO-RNA, MODENA, RNAiFold and NUPACK approaches for some biological test sets. The results presented in this article indicate that for longer structures, our algorithm performs better than the other mentioned algorithms in terms of the energy range, accuracy, speedup and nucleotide distribution. Particularly, the generated RNA sequences in our method are much more reliable and similar to the natural RNA sequences.
Collapse
Affiliation(s)
- Ali Esmaili-Taheri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P. O. Box: 14155-6455, Tehran, Iran, Laboratoire d'Informatique (LIX), Ecole Polytechnique, 91128 Palaiseau CEDEX, France and School of Biological Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box: 19395-5746 Tehran, Iran
| | | | | |
Collapse
|
15
|
Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics 2013; 29:i308-15. [PMID: 23812999 PMCID: PMC3694657 DOI: 10.1093/bioinformatics/btt217] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. Results: In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. Availability:IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/ Contact:jeromew@cs.mcgill.ca or yann.ponty@lix.polytechnique.fr Supplementary Information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC, Canada
| | | | | |
Collapse
|
16
|
Weinbrand L, Avihoo A, Barash D. RNAfbinv: an interactive Java application for fragment-based design of RNA sequences. Bioinformatics 2013; 29:2938-40. [PMID: 23975763 DOI: 10.1093/bioinformatics/btt494] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
SUMMARY In RNA design problems, it is plausible to assume that the user would be interested in preserving a particular RNA secondary structure motif, or fragment, for biological reasons. The preservation could be in structure or sequence, or both. Thus, the inverse RNA folding problem could benefit from considering fragment constraints. We have developed a new interactive Java application called RNA fragment-based inverse that allows users to insert an RNA secondary structure in dot-bracket notation. It then performs sequence design that conforms to the shape of the input secondary structure, the specified thermodynamic stability, the specified mutational robustness and the user-selected fragment after shape decomposition. In this shape-based design approach, specific RNA structural motifs with known biological functions are strictly enforced, while others can possess more flexibility in their structure in favor of preserving physical attributes and additional constraints. AVAILABILITY RNAfbinv is freely available for download on the web at http://www.cs.bgu.ac.il/~RNAexinv/RNAfbinv. The site contains a help file with an explanation regarding the exact use.
Collapse
Affiliation(s)
- Lina Weinbrand
- Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel and Microsoft Research Israel, Herzliya 46733, Israel
| | | | | |
Collapse
|
17
|
Ganjtabesh M, Zare-Mirakabad F, Nowzari-Dalini A. Inverse RNA folding solution based on multi-objective genetic algorithm and Gibbs sampling method. EXCLI JOURNAL 2013; 12:546-55. [PMID: 26933401 PMCID: PMC4763459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/20/2013] [Accepted: 05/02/2013] [Indexed: 11/09/2022]
Abstract
In living systems, RNAs play important biological functions. The functional form of an RNA frequently requires a specific tertiary structure. The scaffold for this structure is provided by secondary structural elements that are hydrogen bonds within the molecule. Here, we concentrate on the inverse RNA folding problem. In this problem, an RNA secondary structure is given as a target structure and the goal is to design an RNA sequence that its structure is the same (or very similar) to the given target structure. Different heuristic search methods have been proposed for this problem. One common feature among these methods is to use a folding algorithm to evaluate the accuracy of the designed RNA sequence during the generation process. The well known folding algorithms take O(n(3)) times where n is the length of the RNA sequence. In this paper, we introduce a new algorithm called GGI-Fold based on multi-objective genetic algorithm and Gibbs sampling method for the inverse RNA folding problem. Our algorithm generates a sequence where its structure is the same or very similar to the given target structure. The key feature of our method is that it never uses any folding algorithm to improve the quality of the generated sequences. We compare our algorithm with RNA-SSD for some biological test samples. In all test samples, our algorithm outperforms the RNA-SSD method for generating a sequence where its structure is more stable.
Collapse
Affiliation(s)
- M Ganjtabesh
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran,School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran,Laboratoire d’Informatique (LIX), Ecole Polytechnique, Palaiseau CEDEX, 91128, France,*To whom correspondence should be addressed: M Ganjtabesh, School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran, E-mail:
| | - F Zare-Mirakabad
- Department of Applied Mathematics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
| | - A Nowzari-Dalini
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
| |
Collapse
|
18
|
Gaspar P, Moura G, Santos MAS, Oliveira JL. mRNA secondary structure optimization using a correlated stem-loop prediction. Nucleic Acids Res 2013; 41:e73. [PMID: 23325845 PMCID: PMC3616703 DOI: 10.1093/nar/gks1473] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Secondary structure of messenger RNA plays an important role in the bio-synthesis of proteins. Its negative impact on translation can reduce the yield of protein by slowing or blocking the initiation and movement of ribosomes along the mRNA, becoming a major factor in the regulation of gene expression. Several algorithms can predict the formation of secondary structures by calculating the minimum free energy of RNA sequences, or perform the inverse process of obtaining an RNA sequence for a given structure. However, there is still no approach to redesign an mRNA to achieve minimal secondary structure without affecting the amino acid sequence. Here we present the first strategy to optimize mRNA secondary structures, to increase (or decrease) the minimum free energy of a nucleotide sequence, without changing its resulting polypeptide, in a time-efficient manner, through a simplistic approximation to hairpin formation. Our data show that this approach can efficiently increase the minimum free energy by >40%, strongly reducing the strength of secondary structures. Applications of this technique range from multi-objective optimization of genes by controlling minimum free energy together with CAI and other gene expression variables, to optimization of secondary structures at the genomic level.
Collapse
Affiliation(s)
- Paulo Gaspar
- DETI/IEETA, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
| | | | | | | |
Collapse
|
19
|
Lyngsø RB, Anderson JWJ, Sizikova E, Badugu A, Hyland T, Hein J. Frnakenstein: multiple target inverse RNA folding. BMC Bioinformatics 2012; 13:260. [PMID: 23043260 PMCID: PMC3534541 DOI: 10.1186/1471-2105-13-260] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2012] [Accepted: 09/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. RESULTS In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development was to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. CONCLUSIONS Our method illustrates that successful designs for the inverse RNA folding problem does not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
Collapse
Affiliation(s)
- Rune B Lyngsø
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
| | | | - Elena Sizikova
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
| | - Amarendra Badugu
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
| | - Tomas Hyland
- Mathematics Institute, University of Oxford, Oxford OX1 3LB, UK
| | - Jotun Hein
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
| |
Collapse
|
20
|
Levin A, Lis M, Ponty Y, O'Donnell CW, Devadas S, Berger B, Waldispühl J. A global sampling approach to designing and reengineering RNA secondary structures. Nucleic Acids Res 2012; 40:10041-52. [PMID: 22941632 PMCID: PMC3488226 DOI: 10.1093/nar/gks768] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
Collapse
Affiliation(s)
- Alex Levin
- Computer Science and Artificial Intelligence Laboratory, Department of Mathematics, MIT, Cambridge, MA 02139, USA
| | | | | | | | | | | | | |
Collapse
|