1
Runge F, Franke J, Fertmann D, Backofen R, Hutter F. Partial RNA design. Bioinformatics 2024; 40:i437-i445. [PMID: 38940170 PMCID: PMC11256918 DOI: 10.1093/bioinformatics/btae222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION: RNA design is a key technique to achieve new functionality in fields like synthetic biology or biotechnology. Computational tools could help to find such RNA sequences but they are often limited in their formulation of the search space. RESULTS: In this work, we propose partial RNA design, a novel RNA design paradigm that addresses the limitations of current RNA design formulations. Partial RNA design describes the problem of designing RNAs from arbitrary RNA sequences and structure motifs with multiple design goals. By separating the design space from the objectives, our formulation enables the design of RNAs with variable lengths and desired properties, while still allowing precise control over sequence and structure constraints at individual positions. Based on this formulation, we introduce a new algorithm, libLEARNA, capable of efficiently solving different constraint RNA design tasks. A comprehensive analysis of various problems, including a realistic riboswitch design task, reveals the outstanding performance of libLEARNA and its robustness. AVAILABILITY AND IMPLEMENTATION: libLEARNA is open-source and publicly available at https://github.com/automl/learna_tools.
Affiliation(s)
- Frederic Runge
  - Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Jörg Franke
  - Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Daniel Fertmann
  - Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Rolf Backofen
  - Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Frank Hutter
  - Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
2
Zambrano RAI, Hernandez-Perez C, Takahashi MK. RNA Structure Prediction, Analysis, and Design: An Introduction to Web-Based Tools. Methods Mol Biol 2022; 2518:253-269. [PMID: 35666450 DOI: 10.1007/978-1-0716-2421-0_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Understanding RNA structure has become critical in the study of RNA in their roles as mediators of biological processes. To aid in these studies, computational algorithms that utilize thermodynamics have been developed to predict RNA secondary structure. Due to the importance of intermolecular interactions, the algorithms have been expanded to determine and predict RNA-RNA hybridization. This chapter discusses popular webservers with the tools for RNA secondary structure prediction, RNA-RNA hybridization, and design. We address key features that distinguish common-functioning programs and their purposes for the interests of the user. Ultimately, we hope this review elucidates web-based tools researchers may take advantage of in their investigations of RNA structure and function.
Affiliation(s)
- Melissa K Takahashi
  - Department of Biology, California State University Northridge, Northridge, CA, USA
3
Wayment-Steele HK, Kim DS, Choe CA, Nicol JJ, Wellington-Oguri R, Watkins AM, Parra Sperberg RA, Huang PS, Eterna Participants, Das R. Theoretical basis for stabilizing messenger RNA through secondary structure design. Nucleic Acids Res 2021; 49:10604-10617. [PMID: 34520542 PMCID: PMC8499941 DOI: 10.1093/nar/gkab764] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 03/10/2021] [Revised: 08/17/2021] [Accepted: 08/27/2021] [Indexed: 01/08/2023]
Abstract
RNA hydrolysis presents problems in manufacturing, long-term storage, world-wide delivery and in vivo stability of messenger RNA (mRNA)-based vaccines and therapeutics. A largely unexplored strategy to reduce mRNA hydrolysis is to redesign RNAs to form double-stranded regions, which are protected from in-line cleavage and enzymatic degradation, while coding for the same proteins. The amount of stabilization that this strategy can deliver and the most effective algorithmic approach to achieve stabilization remain poorly understood. Here, we present simple calculations for estimating RNA stability against hydrolysis, and a model that links the average unpaired probability of an mRNA, or AUP, to its overall hydrolysis rate. To characterize the stabilization achievable through structure design, we compare AUP optimization by conventional mRNA design methods to results from more computationally sophisticated algorithms and crowdsourcing through the OpenVaccine challenge on the Eterna platform. We find that rational design on Eterna and the more sophisticated algorithms lead to constructs with low AUP, which we term 'superfolder' mRNAs. These designs exhibit a wide diversity of sequence and structure features that may be desirable for translation, biophysical size, and immunogenicity. Furthermore, their folding is robust to temperature, computer modeling method, choice of flanking untranslated regions, and changes in target protein sequence, as illustrated by rapid redesign of superfolder mRNAs for B.1.351, P.1 and B.1.1.7 variants of the prefusion-stabilized SARS-CoV-2 spike protein. Increases in in vitro mRNA half-life by at least two-fold appear immediately achievable.
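The AUP quantity named in this abstract can be illustrated with a short sketch. It assumes that AUP is the mean, over all nucleotides, of one minus the total probability that the nucleotide is paired, and that a symmetric base-pair probability matrix is already available from some folding package. The function name and the toy matrix are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_unpaired_probability(bpp: Sequence[Sequence[float]]) -> float:
    """Mean unpaired probability for a symmetric base-pair probability matrix.

    bpp[i][j] is assumed to hold the probability that positions i and j pair.
    """
    n = len(bpp)
    if n == 0:
        raise ValueError("empty base-pair probability matrix")
    total_unpaired = 0.0
    for i in range(n):
        # Probability that position i is paired with any other position.
        p_paired = sum(bpp[i][j] for j in range(n) if j != i)
        total_unpaired += 1.0 - p_paired
    return total_unpaired / n


# Toy 4-nucleotide example (hypothetical values): 0 pairs with 3, 1 pairs with 2.
toy_bpp = [
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
]
aup = average_unpaired_probability(toy_bpp)
```

Under this reading, a lower value means more of the molecule is expected to sit in double-stranded regions, which is the direction the abstract associates with slower hydrolysis.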
MESH Headings
- Algorithms
- Base Pairing
- Base Sequence
- COVID-19/prevention & control
- Humans
- Hydrolysis
- RNA Stability
- RNA, Double-Stranded/chemistry
- RNA, Double-Stranded/genetics
- RNA, Double-Stranded/immunology
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Messenger/immunology
- RNA, Viral/chemistry
- RNA, Viral/genetics
- RNA, Viral/immunology
- SARS-CoV-2/genetics
- SARS-CoV-2/immunology
- Spike Glycoprotein, Coronavirus/genetics
- Spike Glycoprotein, Coronavirus/immunology
- Thermodynamics
Affiliation(s)
- Hannah K Wayment-Steele
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
  - Eterna Massive Open Laboratory
- Do Soon Kim
  - Eterna Massive Open Laboratory
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
  - Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Christian A Choe
  - Eterna Massive Open Laboratory
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Andrew M Watkins
  - Eterna Massive Open Laboratory
  - Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Po-Ssu Huang
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Rhiju Das
  - Eterna Massive Open Laboratory
  - Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
  - Department of Physics, Stanford University, Stanford, CA 94305, USA
4
Minuesa G, Alsina C, Garcia-Martin JA, Oliveros J, Dotu I. MoiRNAiFold: a novel tool for complex in silico RNA design. Nucleic Acids Res 2021; 49:4934-4943. [PMID: 33956139 PMCID: PMC8136780 DOI: 10.1093/nar/gkab331] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/12/2021] [Revised: 04/09/2021] [Accepted: 04/21/2021] [Indexed: 12/23/2022]
Abstract
Novel tools for in silico design of RNA constructs such as riboregulators are required in order to reduce time and cost to production for the development of diagnostic and therapeutic advances. Here, we present MoiRNAiFold, a versatile and user-friendly tool for de novo synthetic RNA design. MoiRNAiFold is based on Constraint Programming and it includes novel variable types, heuristics and restart strategies for Large Neighborhood Search. Moreover, this software can handle dozens of design constraints and quality measures and improves features for RNA regulation control of gene expression, such as Translation Efficiency calculation. We demonstrate that MoiRNAiFold outperforms any previous software in benchmarking structural RNA puzzles from EteRNA. Importantly, with regard to biologically relevant RNA designs, we focus on RNA riboregulators, demonstrating that the designed RNA sequences are functional both in vitro and in vivo. Overall, we have generated a powerful tool for de novo complex RNA design that we make freely available as a web server (https://moiraibiodesign.com/design/).
Affiliation(s)
- Gerard Minuesa
  - Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
- Cristina Alsina
  - Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
- Juan Antonio Garcia-Martin
  - Bioinformatics for Genomics and Proteomics. National Centre for Biotechnology (CNB-CSIC). c/ Darwin 3, 28049 Madrid, Spain
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Universidad Carlos III de Madrid, 28911 Madrid, Spain
- Juan Carlos Oliveros
  - Bioinformatics for Genomics and Proteomics. National Centre for Biotechnology (CNB-CSIC). c/ Darwin 3, 28049 Madrid, Spain
- Ivan Dotu
  - Moirai Biodesign, c/ Baldiri Reixach s/n, Parc Científic de Barcelona (PCB), 08028 Barcelona, Spain
5
Inverse RNA Folding Workflow to Design and Test Ribozymes that Include Pseudoknots. Methods Mol Biol 2021; 2167:113-143. [PMID: 32712918 DOI: 10.1007/978-1-0716-0716-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Ribozymes are RNAs that catalyze reactions. They occur in nature, and can also be evolved in vitro to catalyze novel reactions. This chapter provides detailed protocols for using inverse folding software to design a ribozyme sequence that will fold to a known ribozyme secondary structure and for testing the catalytic activity of the sequence experimentally. This protocol is able to design sequences that include pseudoknots, which is important as all naturally occurring full-length ribozymes have pseudoknots. The starting point is the known pseudoknot-containing secondary structure of the ribozyme and knowledge of any nucleotides whose identity is required for function. The output of the protocol is a set of sequences that have been tested for function. Using this protocol, we were previously successful at designing highly active double-pseudoknotted HDV ribozymes.
6
Retwitzer MD, Reinharz V, Churkin A, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv 2.0: a webserver and software with motif control for fragment-based design of RNAs. Bioinformatics 2020; 36:2920-2922. [PMID: 31971575 DOI: 10.1093/bioinformatics/btaa039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2019] [Revised: 11/25/2019] [Accepted: 01/15/2020] [Indexed: 11/12/2022]
Abstract
SUMMARY: RNA design has conceptually evolved from the inverse RNA folding problem. In the classical inverse RNA problem, the user inputs an RNA secondary structure and receives an output RNA sequence that folds into it. Although modern RNA design methods are based on the same principle, a finer control over the resulting sequences is sought. As an important example, a substantial number of non-coding RNA families show high preservation in specific regions, while being more flexible in others and this information should be utilized in the design. By using the additional information, RNA design tools can help solve problems of practical interest in the growing fields of synthetic biology and nanotechnology. incaRNAfbinv 2.0 utilizes a fragment-based approach, enabling a control of specific RNA secondary structure motifs. The new version allows significantly more control over the general RNA shape, and also allows to express specific restrictions over each motif separately, in addition to other advanced features. AVAILABILITY AND IMPLEMENTATION: incaRNAfbinv 2.0 is available through a standalone package and a web-server at https://www.cs.bgu.ac.il/incaRNAfbinv. Source code, command-line and GUI wrappers can be found at https://github.com/matandro/RNAsfbinv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matan Drory Retwitzer
  - Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Vladimir Reinharz
  - Department of Computer Science, Université du Québec à Montréal, Montreal, H2X 3Y7, Canada
  - Institute for Basic Science, Daejeon 34126, South Korea
- Alexander Churkin
  - Software Engineering Department, Sami Shamoon College of Engineering, Beer-Sheva 84100, Israel
- Yann Ponty
  - Laboratoire d'Informatique de l'École Polytechnique (LIX CNRS UMR 7161), Ecole Polytechnique, Palaiseau 91120, France
- Jérôme Waldispühl
  - School of Computer Science, McGill University Montréal H3A 0E9, Canada
- Danny Barash
  - Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
7
Yamagami R, Kayedkhordeh M, Mathews DH, Bevilacqua PC. Design of highly active double-pseudoknotted ribozymes: a combined computational and experimental study. Nucleic Acids Res 2019; 47:29-42. [PMID: 30462314 PMCID: PMC6326823 DOI: 10.1093/nar/gky1118] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/26/2018] [Accepted: 10/24/2018] [Indexed: 01/02/2023]
Abstract
Design of RNA sequences that adopt functional folds establishes principles of RNA folding and applications in biotechnology. Inverse folding for RNAs, which allows computational design of sequences that adopt specific structures, can be utilized for unveiling RNA functions and developing genetic tools in synthetic biology. Although many algorithms for inverse RNA folding have been developed, the pseudoknot, which plays a key role in folding of ribozymes and riboswitches, is not addressed in most algorithms. For the few algorithms that attempt to predict pseudoknot-containing ribozymes, self-cleavage activity has not been tested. Herein, we design double-pseudoknot HDV ribozymes using an inverse RNA folding algorithm and test their kinetic mechanisms experimentally. More than 90% of the positively designed ribozymes possess self-cleaving activity, whereas more than 70% of negative control ribozymes, which are predicted to fold to the necessary structure but with low fidelity, do not possess it. Kinetic and mutation analyses reveal that these RNAs cleave site-specifically and with the same mechanism as the WT ribozyme. Most ribozymes react just 50- to 80-fold slower than the WT ribozyme, and this rate can be improved to near WT by modification of a junction. Thus, fast-cleaving functional ribozymes with multiple pseudoknots can be designed computationally.
Affiliation(s)
- Ryota Yamagami
  - Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Mohammad Kayedkhordeh
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, NY 14642, USA
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, NY 14642, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, New York, NY 14642, USA
- Philip C Bevilacqua
  - Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
8
Evolving methods for rational de novo design of functional RNA molecules. Methods 2019; 161:54-63. [PMID: 31059832 DOI: 10.1016/j.ymeth.2019.04.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/30/2019] [Revised: 04/26/2019] [Accepted: 04/29/2019] [Indexed: 12/16/2022]
Abstract
Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements.
9
Hammer S, Tschiatschek B, Flamm C, Hofacker IL, Findeiß S. RNAblueprint: flexible multiple target nucleic acid sequence design. Bioinformatics 2018; 33:2850-2858. [PMID: 28449031 PMCID: PMC5870862 DOI: 10.1093/bioinformatics/btx263] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/21/2016] [Accepted: 04/21/2017] [Indexed: 01/06/2023]
Abstract
Motivation: Realizing the value of synthetic biology in biotechnology and medicine requires the design of molecules with specialized functions. Due to its close structure to function relationship, and the availability of good structure prediction methods and energy models, RNA is perfectly suited to be synthetically engineered with predefined properties. However, currently available RNA design tools cannot be easily adapted to accommodate new design specifications. Furthermore, complicated sampling and optimization methods are often developed to suit a specific RNA design goal, adding to their inflexibility. Results: We developed a C++ library implementing a graph coloring approach to stochastically sample sequences compatible with structural and sequence constraints from the typically very large solution space. The approach allows to specify and explore the solution space in a well defined way. Our library also guarantees uniform sampling, which makes optimization runs performant by not only avoiding re-evaluation of already found solutions, but also by raising the probability of finding better solutions for long optimization runs. We show that our software can be combined with any other software package to allow diverse RNA design applications. Scripting interfaces allow the easy adaption of existing code to accommodate new scenarios, making the whole design process very flexible. We implemented example design approaches written in Python to demonstrate these advantages. Availability and implementation: RNAblueprint, Python implementations and benchmark datasets are available on GitHub: https://github.com/ViennaRNA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stefan Hammer
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
- Birgit Tschiatschek
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
- Christoph Flamm
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Research Network Chemistry Meets Microbiology, University of Vienna, 1090 Vienna, Austria
- Ivo L Hofacker
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
  - Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Copenhagen DK-1870, Denmark
- Sven Findeiß
  - Faculty of Chemistry, Department of Theoretical Chemistry
  - Faculty of Computer Science, Research Group Bioinformatics and Computational Biology
10
Dotu I, Adamson SI, Coleman B, Fournier C, Ricart-Altimiras E, Eyras E, Chuang JH. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data. PLoS Comput Biol 2018; 14:e1006078. [PMID: 29596423 PMCID: PMC5892938 DOI: 10.1371/journal.pcbi.1006078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/28/2017] [Revised: 04/10/2018] [Accepted: 03/05/2018] [Indexed: 12/02/2022]
Abstract
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. RNA-protein binding is critical to gene regulation, and aberrant RNA-protein interactions play a role in a wide variety of diseases. However, molecular understanding of these interactions remains limited because of the difficulty of ascertaining the motifs that bind each protein. To address this challenge, we have developed a novel algorithm, SARNAclust, to computationally identify combined structure/sequence motifs from immunoprecipitation data. SARNAclust can deconvolve multiple motifs simultaneously and determine the importance of specific features through a graph kernel and bulge graph formalism. We have verified SARNAclust to be effective on synthetic motif data and also tested it on ENCODE eCLIP datasets, identifying known motifs and novel predictions. We have experimentally validated SARNAclust for two proteins, SLBP and ILF3, using RNA Bind-n-Seq measurements. Applying SARNAclust to ENCODE data provides new evidence for previously unknown regulatory interactions, notably splicing co-regulation by ILF3 and the splicing factor hnRNPC.
Affiliation(s)
- Ivan Dotu
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
- Scott I. Adamson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
  - UCONN Health, Department of Genetics and Genome Sciences, Farmington, CT, United States of America
- Benjamin Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Cyril Fournier
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Emma Ricart-Altimiras
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
- Eduardo Eyras
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM)–Pompeu Fabra University (UPF), Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Jeffrey H. Chuang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
  - UCONN Health, Department of Genetics and Genome Sciences, Farmington, CT, United States of America
11
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/26/2022]
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
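The inverse folding setting described in this abstract starts from a target secondary structure. A minimal sketch of the first filter such a search needs is shown below: it checks whether a candidate sequence can form every base pair of a dot-bracket target using Watson-Crick or G-U wobble pairs. This is only a necessary condition, not a folding prediction; the function names are hypothetical, pseudoknots are not handled, and none of this is the code of RNAinverse, antaRNA, RNAiFold or incaRNAfbinv.

```python
# Pairs assumed to be allowed: Watson-Crick plus G-U wobble.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs for a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every pair in the target structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    seq = sequence.upper()
    return all((seq[i], seq[j]) in CANONICAL_PAIRS
               for i, j in parse_dot_bracket(structure))


target = "(((...)))"
candidate_ok = is_compatible("GGGAAAUCC", target)
candidate_bad = is_compatible("GGGAAAACC", target)
```

A real inverse folding program would go further and fold each compatible candidate with an energy model to test whether the target is actually its minimum free energy structure.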
Affiliation(s)
- Alexander Churkin
  - Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
- Vladimir Reinharz
  - Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
  - School of Computer Science, McGill University, Montréal QC, Canada
- Yann Ponty
  - Laboratoire d’informatique, École Polytechnique, Palaiseau, France
- Danny Barash
  - Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
12
Identification and functional characterization of bacterial small non-coding RNAs and their target: A review. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.01.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/07/2023]
13
Bayegan AH, Garcia-Martin JA, Clote P. New tools to analyze overlapping coding regions. BMC Bioinformatics 2016; 17:530. [PMID: 27964762 PMCID: PMC5155393 DOI: 10.1186/s12859-016-1389-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/21/2016] [Accepted: 11/26/2016] [Indexed: 11/10/2022]
Abstract
Background Retroviruses transcribe messenger RNA for the overlapping Gag and Gag-Pol polyproteins, by using a programmed -1 ribosomal frameshift which requires a slippery sequence and an immediate downstream stem-loop secondary structure, together called frameshift stimulating signal (FSS). It follows that the molecular evolution of this genomic region of HIV-1 is highly constrained, since the retroviral genome must contain a slippery sequence (sequence constraint), code appropriate peptides in reading frames 0 and 1 (coding requirements), and form a thermodynamically stable stem-loop secondary structure (structure requirement). Results We describe a unique computational tool, RNAsampleCDS, designed to compute the number of RNA sequences that code two (or more) peptides p,q in overlapping reading frames, that are identical (or have BLOSUM/PAM similarity that exceeds a user-specified value) to the input peptides p,q. RNAsampleCDS then samples a user-specified number of messenger RNAs that code such peptides; alternatively, RNAsampleCDS can exactly compute the position-specific scoring matrix and codon usage bias for all such RNA sequences. Our software allows the user to stipulate overlapping coding requirements for all 6 possible reading frames simultaneously, even allowing IUPAC constraints on RNA sequences and fixing GC-content. We generalize the notion of codon preference index (CPI) to overlapping reading frames, and use RNAsampleCDS to generate control sequences required in the computation of CPI. Moreover, by applying RNAsampleCDS, we are able to quantify the extent to which the overlapping coding requirement in HIV-1 [resp. HCV] contributes to the formation of the stem-loop [resp. double stem-loop] secondary structure known as the frameshift stimulating signal. Using our software, we confirm that certain experimentally determined deleterious HCV mutations occur in positions for which our software RNAsampleCDS and RNAiFold both indicate a single possible nucleotide. We generalize the notion of codon preference index (CPI) to overlapping coding regions, and use RNAsampleCDS to generate control sequences required in the computation of CPI for the Gag-Pol overlapping coding region of HIV-1. These applications show that RNAsampleCDS constitutes a unique tool in the software arsenal now available to evolutionary biologists. Conclusion Source code for the programs and additional data are available at http://bioinformatics.bc.edu/clotelab/RNAsampleCDS/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1389-7) contains supplementary material, which is available to authorized users.
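To make the overlapping coding requirement concrete, the following is a minimal brute-force sketch in Python. It is not the RNAsampleCDS algorithm: the function names `translate` and `overlapping_coders` are hypothetical, exhaustive enumeration is only feasible for very short peptides, and the sketch handles only exact peptide identity in reading frames 0 and 1 (no BLOSUM/PAM similarity, IUPAC constraints, or GC-content control).

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str, offset: int) -> str:
    """Translate all complete codons of `rna` starting at position `offset`."""
    codons = (rna[i:i + 3] for i in range(offset, len(rna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def overlapping_coders(p: str, q: str) -> list[str]:
    """List every RNA of length 3*len(p)+1 that codes p in frame 0 and q in frame 1.

    Illustrative assumption: both peptides have the same length, so the two
    reading frames overlap on all but the first and last nucleotide.
    """
    if len(p) != len(q):
        raise ValueError("this sketch assumes peptides of equal length")
    length = 3 * len(p) + 1
    hits = []
    for letters in product(BASES, repeat=length):  # 4**length candidates
        rna = "".join(letters)
        if translate(rna, 0) == p and translate(rna, 1) == q:
            hits.append(rna)
    return hits


if __name__ == "__main__":
    # Met (AUG) in frame 0 and Trp (UGG) in frame 1 share the dinucleotide UG.
    print(overlapping_coders("M", "W"))       # ['AUGG']
    print(len(overlapping_coders("F", "F")))  # 2
```

Even this toy case shows how strongly the two frames constrain each other: a single sequence satisfies the Met/Trp pair, and none satisfies Met/Met. Counting or sampling at realistic lengths would need something more efficient than enumeration, such as dynamic programming over positions.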
Affiliation(s)
- Amir H Bayegan
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill MA, 02467, USA
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill MA, 02467, USA
14
Garcia-Martin JA, Bayegan AH, Dotu I, Clote P. RNAdualPF: software to compute the dual partition function with sample applications in molecular evolution theory. BMC Bioinformatics 2016; 17:424. [PMID: 27756204 PMCID: PMC5069997 DOI: 10.1186/s12859-016-1280-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/24/2016] [Accepted: 09/26/2016] [Indexed: 12/01/2022] Open
Abstract
Background RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s0, i.e. whose minimum free energy secondary structure is identical to the target s0. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s0. Results We introduce the program RNAdualPF, which computes the dual partition function Z∗, defined as the sum of Boltzmann factors exp(−E(a,s0)/RT) of all RNA nucleotide sequences a compatible with target structure s0. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s0, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, single-stranded nucleotides). Moreover, since we also compute the dual partition function Z∗(k) over all sequences having GC-content k, the user can require that all sampled sequences have a precise, specified GC-content. Using Z∗, we compute the dual expected energy 〈E∗〉, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable. We show that C. elegans precursor microRNA (pre-miRNA) is significantly non-robust with respect to mutations, by comparing the robustness of each wild type pre-miRNA sequence with 2000 [resp. 500] sequences of the same GC-content generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity. Conclusion We describe RNAdualPF, which rapidly computes the dual partition function Z∗ and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GC-content. Using different inverse folding software, another group had earlier shown that pre-miRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used. C/C++-software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1280-6) contains supplementary material, which is available to authorized users.
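The quantities named in this abstract can be written out as follows. The first two lines restate the definitions given in the abstract. The sampling probability P(a) and the closed form for the dual expected energy are assumptions here: they are the usual Boltzmann-ensemble forms and are not quoted from the source.

```latex
% Dual partition function for a fixed target structure s_0 (as stated in the abstract):
Z^{*} \;=\; \sum_{a \,\text{compatible with}\, s_0} \exp\!\left(-\frac{E(a,s_0)}{RT}\right)

% Restriction to sequences with GC-content k; summing over k recovers Z^{*}:
Z^{*}(k) \;=\; \sum_{\substack{a \,\text{compatible with}\, s_0 \\ \mathrm{GC}(a)=k}}
  \exp\!\left(-\frac{E(a,s_0)}{RT}\right),
\qquad
Z^{*} \;=\; \sum_{k} Z^{*}(k)

% Assumed standard Boltzmann forms (not quoted from the source):
P(a) \;=\; \frac{\exp\!\left(-E(a,s_0)/RT\right)}{Z^{*}},
\qquad
\langle E^{*} \rangle \;=\; \sum_{a \,\text{compatible with}\, s_0} P(a)\, E(a,s_0)
```

Under these assumed forms, sampling with a required GC-content k amounts to drawing a with probability proportional to its Boltzmann factor among the sequences counted in Z∗(k).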
Affiliation(s)
- Juan Antonio Garcia-Martin
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA. Present address: Systems Biology Program, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CSIC), C/ Darwin 3, Madrid, 28049, Spain
- Amir H Bayegan
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA
- Ivan Dotu
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, IMIM (Hospital del Mar Medical Research Institute), Dr. Aiguader, 88, Barcelona, Spain
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, 02467, MA, USA
15
Garcia-Martin JA, Dotu I, Fernandez-Chamorro J, Lozano G, Ramajo J, Martinez-Salas E, Clote P. RNAiFold2T: Constraint Programming design of thermo-IRES switches. Bioinformatics 2016; 32:i360-i368. [PMID: 27307638 PMCID: PMC4908349 DOI: 10.1093/bioinformatics/btw265] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023] Open
Abstract
MOTIVATION RNA thermometers (RNATs) are cis-regulatory elements that change secondary structure upon temperature shift. Often involved in the regulation of heat shock, cold shock and virulence genes, RNATs constitute an interesting potential resource in synthetic biology, where engineered RNATs could prove to be useful tools in biosensors and conditional gene regulation. RESULTS Solving the 2-temperature inverse folding problem is critical for RNAT engineering. Here we introduce RNAiFold2T, the first Constraint Programming (CP) and Large Neighborhood Search (LNS) algorithms to solve this problem. Benchmarking tests of RNAiFold2T against existing inverse folding programs (adaptive walk and genetic algorithm) show that our software generates two orders of magnitude more solutions, thus allowing ample exploration of the space of solutions. Subsequently, solutions can be prioritized by computing various measures, including probability of target structure in the ensemble, melting temperature, etc. Using this strategy, we rationally designed two thermosensor internal ribosome entry site (thermo-IRES) elements, whose normalized cap-independent translation efficiency is approximately 50% greater at 42 °C than at 30 °C, when tested in reticulocyte lysates. Translation efficiency is lower than that of the wild-type IRES element, which on the other hand is fully resistant to temperature shift-up. This appears to be the first purely computational design of functional RNA thermoswitches, and certainly the first purely computational design of functional thermo-IRES elements. AVAILABILITY RNAiFold2T is publicly available as part of the new release RNAiFold3.0 at https://github.com/clotelab/RNAiFold and http://bioinformatics.bc.edu/clotelab/RNAiFold; the latter also provides a web server. The software is written in C++ and uses the OR-Tools CP search engine. CONTACT clote@bc.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ivan Dotu
- Department of Experimental and Health Sciences, Research Programme on Biomedical Informatics (GRIB), Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, Spain
- Javier Fernandez-Chamorro
- Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas-Universidad Autonoma de Madrid, 28049 Madrid, Spain
- Gloria Lozano
- Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas-Universidad Autonoma de Madrid, 28049 Madrid, Spain
- Jorge Ramajo
- Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas-Universidad Autonoma de Madrid, 28049 Madrid, Spain
- Encarnacion Martinez-Salas
- Centro de Biologia Molecular Severo Ochoa, Consejo Superior de Investigaciones Cientificas-Universidad Autonoma de Madrid, 28049 Madrid, Spain
- Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA 02467, USA
16
Drory Retwitzer M, Reinharz V, Ponty Y, Waldispühl J, Barash D. incaRNAfbinv: a web server for the fragment-based design of RNA sequences. Nucleic Acids Res 2016; 44:W308-14. [PMID: 27185893 PMCID: PMC5741205 DOI: 10.1093/nar/gkw440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/04/2016] [Accepted: 05/06/2016] [Indexed: 01/02/2023] Open
Abstract
In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there has been considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to date that performs a fragment-based design of RNA sequences. In doing so, it allows the design of sequences that do not necessarily fold exactly into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server, called incaRNAfbinv, implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel naturally occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv.
Affiliation(s)
- Vladimir Reinharz
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
- Yann Ponty
- Laboratoire d'Informatique (LIX)-CNRS UMR 7161, École Polytechnique, 91128 Palaiseau, France; AMIB team/project, INRIA Saclay, Bâtiment Alan Turing, 91128 Palaiseau, France
- Jérôme Waldispühl
- School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC H3A 0E9, Canada
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva 84105, Israel
17
Designing synthetic RNAs to determine the relevance of structural motifs in picornavirus IRES elements. Sci Rep 2016; 6:24243. [PMID: 27053355 PMCID: PMC4823658 DOI: 10.1038/srep24243] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/17/2015] [Accepted: 03/23/2016] [Indexed: 12/22/2022] Open
Abstract
The function of Internal Ribosome Entry Site (IRES) elements is intimately linked to their RNA structure. Viral IRES elements are organized in modular domains consisting of one or more stem-loops that harbor conserved RNA motifs critical for internal initiation of translation. One conserved motif is the pyrimidine tract located upstream of the functional initiation codon in type I and II picornavirus IRES. By computationally designing synthetic RNAs to fold into a structure that sequesters the pyrimidine tract in a hairpin, we establish a correlation between predicted inaccessibility of the pyrimidine tract and IRES activity, as determined in both in vitro and in vivo systems. Our data support the hypothesis that structural sequestration of the pyrimidine tract within a stable hairpin inactivates the IRES, since the more stable the hairpin, the greater the inhibition of protein synthesis. Destabilization of the stem-loop immediately upstream of the pyrimidine tract also decreases IRES activity. Our work introduces a hybrid computational/experimental method to determine the importance of structural motifs for biological function. Specifically, we show the feasibility of using the software RNAiFold to design synthetic RNAs with particular sequence and structural motifs that permit subsequent experimental determination of the importance of such motifs for biological function.
18
Lozano G, Fernandez N, Martinez-Salas E. Modeling Three-Dimensional Structural Motifs of Viral IRES. J Mol Biol 2016; 428:767-776. [PMID: 26778619 DOI: 10.1016/j.jmb.2016.01.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/10/2015] [Revised: 01/08/2016] [Accepted: 01/08/2016] [Indexed: 01/23/2023]
Abstract
RNA virus genomes are reservoirs of a wide diversity of RNA structural elements. In particular, specific regions of the viral genome have evolved to adopt specialized three-dimensional (3D) structures, which can act in concert with host factors and/or viral proteins to recruit the translation machinery to viral RNA using a mechanism that is independent of the 5' end. This strategy relies on cis-acting RNA sequences designated as internal ribosome entry site (IRES) elements. IRES elements that are found in the genomes of different groups of RNA viruses perform the same function despite differing in primary sequence, RNA secondary structure, and the host factors required to recruit the translation machinery internally. Evolutionarily conserved motifs tend to preserve sequences in each group of RNA viruses, impacting RNA structure and RNA-protein interactions important for IRES function. However, due to the lack of sequence homology among genetically distant IRES elements, accurate modeling of 3D IRES structure is currently a challenging task. In addition, as a universal RNA motif unique to IRES elements has not been found, a better understanding of viral IRES structural motifs could greatly assist in the detection of IRES-like motifs hidden in genome sequences. The focus of this review is to describe recent advances in modeling viral IRES tertiary structural motifs and also novel approaches to detect sequences potentially folding as IRES-like motifs.
Affiliation(s)
- Gloria Lozano
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas Universidad Autónoma de Madrid, Nicolas Cabrera 1, 28049 Madrid, Spain
- Noemi Fernandez
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas Universidad Autónoma de Madrid, Nicolas Cabrera 1, 28049 Madrid, Spain
- Encarnacion Martinez-Salas
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas Universidad Autónoma de Madrid, Nicolas Cabrera 1, 28049 Madrid, Spain