1
Taylor AD, Hathaway QA, Kunovac A, Pinti MV, Newman MS, Cook CC, Cramer ER, Starcovic SA, Winters MT, Westemeier-Rice ES, Fink GK, Durr AJ, Rizwan S, Shepherd DL, Robart AR, Martinez I, Hollander JM. Mitochondrial sequencing identifies long noncoding RNA features that promote binding to PNPase. Am J Physiol Cell Physiol 2024; 327:C221-C236. [PMID: 38826135 PMCID: PMC11427107 DOI: 10.1152/ajpcell.00648.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 05/24/2024] [Accepted: 05/24/2024] [Indexed: 06/04/2024]
Abstract
Extranuclear localization of long noncoding RNAs (lncRNAs) is poorly understood. Based on machine learning evaluations, we propose a lncRNA-mitochondrial interaction pathway where polynucleotide phosphorylase (PNPase), through domains that provide specificity for primary sequence and secondary structure, binds nuclear-encoded lncRNAs to facilitate mitochondrial import. Using FVB/NJ mouse and human cardiac tissues, RNA from isolated subcellular compartments (cytoplasmic and mitochondrial) and cross-linked immunoprecipitate (CLIP) with PNPase within the mitochondrion were sequenced on the Illumina HiSeq and MiSeq, respectively. lncRNA sequence and structure were evaluated through supervised [classification and regression trees (CART) and support vector machines (SVM)] machine learning algorithms. In HL-1 cells, quantitative PCR of PNPase CLIP knockout mutants (KH and S1) was performed. In vitro fluorescence assays assessed PNPase RNA binding capacity and verified with PNPase CLIP. One hundred twelve (mouse) and 1,548 (human) lncRNAs were identified in the mitochondrion with Malat1 being the most abundant. Most noncoding RNAs binding PNPase were lncRNAs, including Malat1. lncRNA fragments bound to PNPase compared against randomly generated sequences of similar length showed stratification with SVM and CART algorithms. The lncRNAs bound to PNPase were used to create a criterion for binding, with experimental validation revealing increased binding affinity of RNA designed to bind PNPase compared to control RNA. The binding of lncRNAs to PNPase was decreased through the knockout of RNA binding domains KH and S1. 
In conclusion, sequence and secondary structural features identified by machine learning enhance the likelihood of nuclear-encoded lncRNAs binding to PNPase and undergoing import into the mitochondrion.
NEW & NOTEWORTHY Long noncoding RNAs (lncRNAs) are relatively novel RNAs with increasingly prominent roles in regulating genetic expression, mainly in the nucleus but more recently in regions such as the mitochondrion. This study explores how lncRNAs interact with polynucleotide phosphorylase (PNPase), a protein that regulates RNA import into the mitochondrion. Machine learning identified several RNA structural features that improved lncRNA binding to PNPase, which may be useful in targeting RNA therapeutics to the mitochondrion.
Affiliation(s)
- Andrew D Taylor
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Quincy A Hathaway
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Heart and Vascular Institute, West Virginia University, Morgantown, West Virginia, United States
  - Department of Medical Education, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Amina Kunovac
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Mark V Pinti
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - West Virginia University School of Pharmacy, Morgantown, West Virginia, United States
- Mackenzie S Newman
  - Department of Physiology and Pharmacology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Chris C Cook
  - Cardiovascular and Thoracic Surgery, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Evan R Cramer
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Sarah A Starcovic
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Michael T Winters
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- Emily S Westemeier-Rice
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- Garrett K Fink
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Andrya J Durr
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Saira Rizwan
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Danielle L Shepherd
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Aaron R Robart
  - Department of Biochemistry, West Virginia University School of Medicine, Morgantown, West Virginia, United States
- Ivan Martinez
  - Department of Microbiology, Immunology, and Cell Biology, West Virginia University Cancer Institute, School of Medicine, Morgantown, West Virginia, United States
- John M Hollander
  - Division of Exercise Physiology, West Virginia University School of Medicine, Morgantown, West Virginia, United States
  - Mitochondria, Metabolism, and Bioenergetics Working Group, West Virginia University School of Medicine, Morgantown, West Virginia, United States
2
Huang FW, Barrett CL, Reidys CM. The energy-spectrum of bicompatible sequences. Algorithms Mol Biol 2021; 16:7. [PMID: 34074304 PMCID: PMC8167974 DOI: 10.1186/s13015-021-00187-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2020] [Accepted: 05/24/2021] [Indexed: 12/04/2022] Open
Abstract
Background Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a given RNA structure, play an important role in the context of neutral evolution. Sequences that are simultaneously compatible with two given structures (bicompatible sequences), are beacons in phenotypic transitions, induced by erroneously replicating populations of RNA sequences. RNA riboswitches, which are capable of expressing two distinct secondary structures without changing the underlying sequence, are one example of bicompatible sequences in living organisms. Results We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The sequence sampler employs a dynamic programming routine whose time complexity is polynomial when assuming the maximum number of exposed vertices, κ, is a constant. The parameter κ depends on the two structures and can be very large. We introduce a novel topological framework encapsulating the relations between loops that sheds light on the understanding of κ. Based on this framework, we give an algorithm to sample sequences with minimum κ on a particular topologically classified case as well as giving hints to the solution in the other cases. As a result, we utilize our sequence sampler to study some established riboswitches. Conclusion Our analysis of riboswitch sequences shows that a pair of structures needs to satisfy key properties in order to facilitate phenotypic transitions and that pairs of random structures are unlikely to do so. Our analysis observes a distinct signature of riboswitch sequences, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure. Our free software is available at: https://github.com/FenixHuang667/Bifold.
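The notion of a bicompatible sequence in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Bifold sampler: it only tests whether a sequence satisfies the base-pairing constraints of two dot-bracket structures. The function names, the dot-bracket input format, and the choice to allow G-U wobble pairs alongside Watson-Crick pairs are assumptions made for illustration.

```python
# Pairs treated as admissible (assumption: Watson-Crick plus G-U wobble).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence, structure):
    """True if every base pair of the structure is an admissible pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in base_pairs(structure))


def is_bicompatible(sequence, structure_a, structure_b):
    """True if the sequence is compatible with both structures."""
    return (is_compatible(sequence, structure_a)
            and is_compatible(sequence, structure_b))
```

For example, `GGAAAACUU` is compatible with both `((....)).` and `.((....))`, whereas `GGAAAACUA` is compatible only with the first, because position 1 (G) would have to pair with position 8 (A) in the second.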
3
He Q, Huang FW, Barrett C, Reidys CM. Genetic robustness of let-7 miRNA sequence-structure pairs. RNA (NEW YORK, N.Y.) 2019; 25:1592-1603. [PMID: 31548338 PMCID: PMC6859847 DOI: 10.1261/rna.065763.118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/20/2018] [Accepted: 08/20/2019] [Indexed: 05/13/2023]
Abstract
Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought and consider sequence-structure pairs as the unit of evolution and introduce the spectrum of extended mutational robustness (EMR spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.
Affiliation(s)
- Qijun He
  - Biocomplexity Institute and Initiative
- Christian M Reidys
  - Biocomplexity Institute and Initiative
  - Department of Mathematics, University of Virginia, Charlottesville, Virginia 22904, USA
4
Oliver CG, Reinharz V, Waldispühl J. On the emergence of structural complexity in RNA replicators. RNA (NEW YORK, N.Y.) 2019; 25:1579-1591. [PMID: 31467146 PMCID: PMC6859851 DOI: 10.1261/rna.070391.119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/15/2019] [Accepted: 08/19/2019] [Indexed: 06/10/2023]
Abstract
The RNA world hypothesis relies on the ability of ribonucleic acids to spontaneously acquire complex structures capable of supporting essential biological functions. Multiple sophisticated evolutionary models have been proposed for their emergence, but they often assume specific conditions. In this work, we explore a simple and parsimonious scenario describing the emergence of complex molecular structures at the early stages of life. We show that at specific GC content regimes, an undirected replication model is sufficient to explain the appearance of multibranched RNA secondary structures, a structural signature of many essential ribozymes. We ran a large-scale computational study to map energetically stable structures on complete mutational networks of 50-nt-long RNA sequences. Our results reveal that the sequence landscape with stable structures is enriched with multibranched structures at a length scale coinciding with the appearance of complex structures in RNA databases. A random replication mechanism preserving a 50% GC content may suffice to explain a natural enrichment of stable complex structures in populations of functional RNAs. In contrast, an evolutionary mechanism eliciting the most stable folds at each generation appears to help reach multibranched structures at the highest GC content.
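Two quantities this abstract relies on, GC content and the one-point mutational neighborhood of a sequence, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' pipeline: the function names are hypothetical, the RNA alphabet is assumed to be A, C, G, U, and no structure prediction is performed.

```python
ALPHABET = "ACGU"  # assumed RNA alphabet


def gc_content(sequence):
    """Fraction of nucleotides that are G or C."""
    return sum(1 for base in sequence if base in "GC") / len(sequence)


def one_point_mutants(sequence):
    """Yield every sequence that differs from the input at exactly one position."""
    for i, base in enumerate(sequence):
        for alternative in ALPHABET:
            if alternative != base:
                yield sequence[:i] + alternative + sequence[i + 1:]
```

Each position can change to three other nucleotides, so a sequence of length n has 3n one-point mutants; a 50-nt sequence therefore has 150 neighbors in such a mutational network.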
Affiliation(s)
- Carlos G Oliver
  - School of Computer Science, McGill University, Montreal, QC H3A 2B3, Canada
- Vladimir Reinharz
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 34126, South Korea
- Jérôme Waldispühl
  - School of Computer Science, McGill University, Montreal, QC H3A 2B3, Canada
5
Hammer S, Wang W, Will S, Ponty Y. Fixed-parameter tractable sampling for RNA design with multiple target structures. BMC Bioinformatics 2019; 20:209. [PMID: 31023239 PMCID: PMC6482512 DOI: 10.1186/s12859-019-2784-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/25/2018] [Accepted: 03/28/2019] [Indexed: 01/09/2023] Open
Abstract
Background The design of multi-stable RNA molecules has important applications in biology, medicine, and biotechnology. Synthetic design approaches profit strongly from effective in-silico methods, which substantially reduce the need for costly wet-lab experiments. Results We devise a novel approach to a central ingredient of most in-silico design methods: the generation of sequences that fold well into multiple target structures. Based on constraint networks, our approach supports generic Boltzmann-weighted sampling, which enables the positive design of RNA sequences with specific free energies (for each of multiple, possibly pseudoknotted, target structures) and GC-content. Moreover, we study general properties of our approach empirically and generate biologically relevant multi-target Boltzmann-weighted designs for an established design benchmark. Our results demonstrate the efficacy and feasibility of the method in practice as well as the benefits of Boltzmann sampling over the previously best multi-target sampling strategy, even for the case of negative design of multi-stable RNAs. Besides empirical studies, we finally justify the algorithmic details by a fundamental theoretical result about multi-stable RNA design, namely the #P-hardness of the counting of designs. Conclusion Our approach introduces a novel, flexible, and effective approach to multi-target RNA design, which promises broad applicability and extensibility. Our free software is available at: https://github.com/yannponty/RNARedPrint
Supplementary data are available online. Electronic supplementary material The online version of this article (10.1186/s12859-019-2784-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Stefan Hammer
  - Dept. Computer Science, and Interdisciplinary Center for Bioinformatics, Univ. Leipzig, Härtelstr. 16-18, Leipzig, D-04107, Germany
  - Dept. Theoretical Chemistry, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
  - Bioinformatics and Computational Biology Research Group, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
- Wei Wang
  - CNRS UMR 7161 LIX, Ecole Polytechnique, Bat. Alan Turing, Palaiseau, 91120, France
- Sebastian Will
  - Dept. Theoretical Chemistry, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
  - Bioinformatics and Computational Biology Research Group, Univ. Vienna, Währingerstr. 17, Wien, A-1090, Austria
- Yann Ponty
  - CNRS UMR 7161 LIX, Ecole Polytechnique, Bat. Alan Turing, Palaiseau, 91120, France
6
Barrett C, He Q, Huang FW, Reidys CM. A Boltzmann Sampler for 1-Pairs with Double Filtration. J Comput Biol 2019; 26:173-192. [PMID: 30653353 DOI: 10.1089/cmb.2018.0095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022] Open
Abstract
Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. This pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered originally for designing more efficient inverse folding algorithms and subsequently enhanced by facilitating the sampling of sequences. We present here a partition function of sequence/structure pairs, with endowed Hamming distance and base pair distance filtration. This partition function is an augmentation of the previously mentioned (dual) partition function. We develop an efficient dynamic programming routine to recursively compute the partition function with this double filtration. Our framework is capable of dealing with RNA secondary structures as well as 1-structures, where a 1-structure is an RNA pseudoknot structure consisting of "building blocks" of genus 0 or 1. In particular, 0-structures, consisting of only "building blocks" of genus 0, are exactly RNA secondary structures. The time complexity for calculating the partition function of 1-pairs, that is, sequence/structure pairs where the structures are 1-structures, is O(h^3 b^3 n^6), where h, b, n denote the Hamming distance, base pair distance, and sequence length, respectively. The time complexity for the partition function of 0-pairs is O(h^2 b^2 n^3).
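The two distances used for the double filtration can be sketched in Python as follows. This is a hedged illustration, not the paper's dynamic programming routine: the abstract does not define base pair distance, so the sketch assumes the common convention of counting base pairs present in exactly one of two structures, and it handles only pseudoknot-free dot-bracket strings. All function names are hypothetical.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def base_pairs(structure):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(struct_a, struct_b):
    """Count base pairs that occur in exactly one of the two structures
    (assumed definition: size of the symmetric difference)."""
    return len(base_pairs(struct_a) ^ base_pairs(struct_b))
```

Under this convention, `((....)).` and `.((....))` share no base pair, so their base pair distance is 4, while `GGAC` and `GGUC` have Hamming distance 1.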
Affiliation(s)
- Christopher Barrett
  - Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
  - Department of Computer Science, University of Virginia, Charlottesville, Virginia
- Qijun He
  - Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Fenix W Huang
  - Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Christian M Reidys
  - Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
  - Department of Mathematics, University of Virginia, Charlottesville, Virginia
7
Barrett C, He Q, Huang FW, Reidys CM. An Efficient Dual Sampling Algorithm with Hamming Distance Filtration. J Comput Biol 2018; 25:1179-1192. [DOI: 10.1089/cmb.2018.0075] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022] Open
Affiliation(s)
- Christopher Barrett
  - Biocomplexity Institute of Virginia Tech, Blacksburg, Virginia
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia
- Qijun He
  - Biocomplexity Institute of Virginia Tech, Blacksburg, Virginia
- Fenix W. Huang
  - Biocomplexity Institute of Virginia Tech, Blacksburg, Virginia
- Christian M. Reidys
  - Biocomplexity Institute of Virginia Tech, Blacksburg, Virginia
  - Department of Mathematics, Virginia Tech, Blacksburg, Virginia
  - Thermo Fisher Scientific Fellow in Advanced Systems for Information Biology, Thermo Fisher Scientific, Waltham, Massachusetts
8
Bovaird S, Patel D, Padilla JCA, Lécuyer E. Biological functions, regulatory mechanisms, and disease relevance of RNA localization pathways. FEBS Lett 2018; 592:2948-2972. [PMID: 30132838 DOI: 10.1002/1873-3468.13228] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/02/2018] [Revised: 08/06/2018] [Accepted: 08/17/2018] [Indexed: 12/12/2022]
Abstract
The asymmetric subcellular distribution of RNA molecules from their sites of transcription to specific compartments of the cell is an important aspect of post-transcriptional gene regulation. This involves the interplay of intrinsic cis-regulatory elements within the RNA molecules with trans-acting RNA-binding proteins and associated factors. Together, these interactions dictate the intracellular localization route of RNAs, whose downstream impacts have wide-ranging implications in cellular physiology. In this review, we examine the mechanisms underlying RNA localization and discuss their biological significance. We also review the growing body of evidence pointing to aberrant RNA localization pathways in the development and progression of diseases.
Affiliation(s)
- Samantha Bovaird
  - Institut de recherches cliniques de Montréal (IRCM), QC, Canada
  - Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Dhara Patel
  - Institut de recherches cliniques de Montréal (IRCM), QC, Canada
  - Molecular Biology Program, Faculty of Medicine, Université de Montréal, QC, Canada
- Juan-Carlos Alberto Padilla
  - Institut de recherches cliniques de Montréal (IRCM), QC, Canada
  - Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Eric Lécuyer
  - Institut de recherches cliniques de Montréal (IRCM), QC, Canada
  - Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Molecular Biology Program, Faculty of Medicine, Université de Montréal, QC, Canada
  - Department of Biochemistry and Molecular Medicine, Université de Montréal, QC, Canada
9
Barrett C, Huang FW, Reidys CM. Sequence-structure relations of biopolymers. Bioinformatics 2018; 33:382-389. [PMID: 28171628 DOI: 10.1093/bioinformatics/btw621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/06/2015] [Revised: 05/16/2016] [Accepted: 09/26/2016] [Indexed: 12/12/2022] Open
Abstract
Motivation DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded ‘patterns’ in DNA and RNA sequences. Results We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence–structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments. Availability and Implementation The source code is freely available at http://staff.vbi.vt.edu/fenixh/Sampler.zip Contact duckcr@vbi.vt.edu Supplementary Information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher Barrett
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
- Fenix W Huang
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
- Christian M Reidys
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
10
Churkin A, Retwitzer MD, Reinharz V, Ponty Y, Waldispühl J, Barash D. Design of RNAs: comparing programs for inverse RNA folding. Brief Bioinform 2018; 19:350-358. [PMID: 28049135 PMCID: PMC6018860 DOI: 10.1093/bib/bbw120] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/26/2022] Open
Abstract
Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has been traditionally used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse that was devised when formulating the inverse RNA folding problem. The most recently published ones that consider RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares between the various programs and provides a simple description of the various possibilities that would benefit practitioners in selecting the most suitable program. It is geared for specific tasks requiring RNA design based on input secondary structure, with an outlook toward the future of RNA design programs.
Affiliation(s)
- Alexander Churkin
  - Shamoon College of Engineering and Physics Department at Ben-Gurion University, Beer-Sheva, Israel
- Vladimir Reinharz
  - Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
  - School of Computer Science, McGill University, Montréal QC, Canada
- Yann Ponty
  - Laboratoire d’informatique, École Polytechnique, Palaiseau, France
- Danny Barash
  - Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
11
Wolfe BR, Porubsky NJ, Zadeh JN, Dirks RM, Pierce NA. Constrained Multistate Sequence Design for Nucleic Acid Reaction Pathway Engineering. J Am Chem Soc 2017; 139:3134-3144. [DOI: 10.1021/jacs.6b12693] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Indexed: 01/14/2023]
Affiliation(s)
- Brian R. Wolfe
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Nicholas J. Porubsky
  - Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Joseph N. Zadeh
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Robert M. Dirks
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
  - Division of Engineering & Applied Science, California Institute of Technology, Pasadena, California 91125, United States
  - Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
12
Zandi K, Butler G, Kharma N. An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures. Front Genet 2016; 7:129. [PMID: 27499762 PMCID: PMC4956659 DOI: 10.3389/fgene.2016.00129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/06/2016] [Accepted: 07/06/2016] [Indexed: 01/18/2023] Open
Abstract
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements and an important RNA element named pseudoknot plays a key role in stabilizing the functional form of the molecule. However, due to the computational complexities associated with characterizing pseudoknotted RNA structures, most of the existing RNA sequence designer algorithms generally ignore this important structural element and therefore limit their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences we designed by Enzymer with the results obtained from the state of the art MODENA and antaRNA. Our results show our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally by using Enzymer and by constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer.
Affiliation(s)
- Kasra Zandi
  - Computer Science Department, Concordia University, Montreal, QC, Canada
- Gregory Butler
  - Computer Science Department, Concordia University, Montreal, QC, Canada
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Nawwaf Kharma
  - Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
  - Electrical and Computer Engineering Department, Concordia University, Montreal, QC, Canada
13
Wolfe BR, Pierce NA. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands. ACS Synth Biol 2015; 4:1086-100. [PMID: 25329866 DOI: 10.1021/sb5002196] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 01/22/2023]
Abstract
We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.
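The ensemble defect of a single complex is commonly defined as the expected number of nucleotides whose pairing state differs from the target at equilibrium. The sketch below computes that quantity from a table of base-pair probabilities. It is an illustration under stated assumptions, not the algorithm of this paper: the probabilities are taken as given (in practice they would come from a partition-function calculation), the names `ensemble_defect` and `pair_probs` are hypothetical, and the concentration weighting that turns this into the test tube ensemble defect is not modelled.

```python
def ensemble_defect(pair_probs, target_partners):
    """Expected number of incorrectly paired nucleotides.

    pair_probs      -- dict mapping (i, j) with i < j to the equilibrium
                       probability that i and j are paired (assumed input)
    target_partners -- list where entry i is the target partner of
                       nucleotide i, or -1 if i should be unpaired
    """
    n = len(target_partners)

    # Total probability that each nucleotide is paired with anything.
    paired_mass = [0.0] * n
    for (i, j), p in pair_probs.items():
        paired_mass[i] += p
        paired_mass[j] += p

    # Sum, over nucleotides, the probability of being in the target state.
    correct = 0.0
    for i, partner in enumerate(target_partners):
        if partner == -1:
            correct += 1.0 - paired_mass[i]
        else:
            key = (min(i, partner), max(i, partner))
            correct += pair_probs.get(key, 0.0)

    return n - correct
```

Dividing the result by the number of nucleotides gives a normalized defect between 0 and 1, which is the form in which a threshold such as the 1% mentioned above would be expressed for a single complex.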
Affiliation(s)
- Brian R. Wolfe
  - Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
  - Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
|
14
|
Jabbari H, Aminpour M, Montemagno C. Computational Approaches to Nucleic Acid Origami. ACS Comb Sci 2015; 17:535-47. [PMID: 26348196 DOI: 10.1021/acscombsci.5b00079] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/28/2022]
Abstract
Recent advances in experimental DNA origami have dramatically expanded the horizon of DNA nanotechnology. Complex 3D suprastructures have been designed and developed using DNA origami with applications in biomaterial science, nanomedicine, nanorobotics, and molecular computation. Ribonucleic acid (RNA) origami has recently been realized as a new approach. Similar to DNA, RNA molecules can be designed to form complex 3D structures through complementary base pairings. RNA origami structures are, however, more compact and more thermodynamically stable due to RNA's non-canonical base pairing and tertiary interactions. Despite these advantages, the development of RNA origami lags well behind that of DNA origami. Furthermore, although computational methods have proven to be effective in designing DNA and RNA origami structures and in their evaluation, advances in computational nucleic acid origami are even more limited. In this paper, we review major milestones in experimental and computational DNA and RNA origami and present current challenges in these fields. We believe collaboration between experimental nanotechnologists and computer scientists is critical for advancing these new research paradigms.
Affiliation(s)
- Hosna Jabbari
  - Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
  - Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Maral Aminpour
  - Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
  - Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Carlo Montemagno
  - Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
  - Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
|
15
|
Churkin A, Weinbrand L, Barash D. Free energy minimization to predict RNA secondary structures and computational RNA design. Methods Mol Biol 2015; 1269:3-16. [PMID: 25577369 DOI: 10.1007/978-1-4939-2291-8_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/04/2023]
Abstract
Determining the RNA secondary structure from sequence data by computational predictions is a long-standing problem. Its solution has been approached in two distinctive ways. If a multiple sequence alignment of a collection of homologous sequences is available, the comparative method uses phylogeny to determine conserved base pairs that are more likely to form as a result of billions of years of evolution than by chance. In the case of single sequences, recursive algorithms that compute free energy structures by using empirically derived energy parameters have been developed. This latter approach of RNA folding prediction by energy minimization is widely used to predict RNA secondary structure from sequence. For a significant number of RNA molecules, the secondary structure of the RNA molecule is indicative of its function and its computational prediction by minimizing its free energy is important for its functional analysis. A general method for free energy minimization to predict RNA secondary structures is dynamic programming, although other optimization methods have been developed as well along with empirically derived energy parameters. In this chapter, we introduce and illustrate by examples the approach of free energy minimization to predict RNA secondary structures.
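The dynamic programming idea mentioned above can be illustrated with the classic Nussinov recursion, which maximizes the number of base pairs rather than minimizing free energy. This is a deliberately simplified stand-in: real free energy minimization uses empirically derived loop and stacking energies, which this sketch omits. The function names and the default minimum hairpin loop of 3 unpaired bases are assumptions made for the example.

```python
def can_pair(a, b):
    """Watson-Crick and G-U wobble pairs only."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq` (Nussinov recursion).

    best[i][j] holds the optimum for the subsequence seq[i..j].
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]

    # Fill the table by increasing subsequence span.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the
            # interval into an outer-left part and an enclosed part.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    return best[0][n - 1]
```

The cubic running time of this recursion is the same order as that of energy-based folding without pseudoknots; replacing the `+ 1` per pair with energy contributions for stacks and loops is what separates this toy from the energy minimization described in the chapter.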
Affiliation(s)
- Alexander Churkin
  - Department of Computer Science, Ben-Gurion University, 653, Beer-Sheva, 84105, Israel
|
16
|
Abstract
In this chapter, we review both computational and experimental aspects of de novo RNA sequence design. We give an overview of currently available design software and its limitations, and discuss the setup needed to experimentally validate proper function in vitro and in vivo. We focus on transcription-regulating riboswitches, a task that has only recently led to the first successful designs of such RNA elements.
Affiliation(s)
- Sven Findeiß
  - Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Manja Wachsmuth
  - Institute for Biochemistry, University of Leipzig, Leipzig, Germany
- Mario Mörl
  - Institute for Biochemistry, University of Leipzig, Leipzig, Germany
- Peter F Stadler
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Bioinformatics Group, Department of Computer Science and the Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Santa Fe Institute, Santa Fe, New Mexico, USA
|
17
|
Kang Z, Zhang C, Zhang J, Jin P, Zhang J, Du G, Chen J. Small RNA regulators in bacteria: powerful tools for metabolic engineering and synthetic biology. Appl Microbiol Biotechnol 2014; 98:3413-24. [DOI: 10.1007/s00253-014-5569-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 12/08/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014] [Indexed: 12/17/2022]
|
18
|
Reinharz V, Ponty Y, Waldispühl J. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution. Bioinformatics 2013; 29:i308-15. [PMID: 23812999 PMCID: PMC3694657 DOI: 10.1093/bioinformatics/btt217] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/15/2022]
Abstract
Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to control explicitly the nucleotide distribution, such as the GC-content, in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. Results: In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable to or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure. Availability: IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/ Contact: jeromew@cs.mcgill.ca or yann.ponty@lix.polytechnique.fr Supplementary Information: Supplementary data are available at Bioinformatics online.
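To make the idea of steering GC-content during sequence sampling concrete, here is a naive sketch that draws a sequence compatible with a dot-bracket target, choosing G-C over A-U with a tunable weight. It is not IncaRNAtion's algorithm, which samples from a weighted distribution that also accounts for folding energy; this sketch only guarantees that target pairs are complementary. The function name `sample_sequence`, the `gc_weight` parameter, and the assumption that the input is balanced, pseudoknot-free dot-bracket are all choices made for the illustration.

```python
import random

PAIRS_GC = [("G", "C"), ("C", "G")]
PAIRS_AU = [("A", "U"), ("U", "A")]


def sample_sequence(structure, gc_weight, rng=None):
    """Sample a sequence whose target pairs are complementary.

    structure -- balanced dot-bracket string (assumed, not validated)
    gc_weight -- probability in [0, 1] of picking G/C at each choice
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    seq = [None] * len(structure)
    stack = []  # positions of currently open '(' characters

    for i, ch in enumerate(structure):
        strong = rng.random() < gc_weight
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            # Assign both ends of the pair when it closes.
            j = stack.pop()
            seq[j], seq[i] = rng.choice(PAIRS_GC if strong else PAIRS_AU)
        else:
            # Any other character is treated as an unpaired position.
            seq[i] = rng.choice("GC" if strong else "AU")

    return "".join(seq)
```

Setting `gc_weight` to 1.0 yields only G and C, and 0.0 yields only A and U; intermediate values shift the expected GC-content without fixing it exactly, which is one reason a method that targets a specific GC-content needs more than this per-position coin flip.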
Affiliation(s)
- Vladimir Reinharz
  - School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, QC, Canada
|
19
|
Reinharz V, Ponty Y, Waldispühl J. Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs. J Comput Biol 2013; 20:905-19. [PMID: 24134390 DOI: 10.1089/cmb.2013.0085] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The analysis of the sequence-structure relationship in RNA molecules is essential not only for evolutionary studies but also for concrete applications such as error correction in next generation sequencing (NGS) technologies. The prohibitive sizes of the mutational and conformational landscapes, combined with the volume of data to process, require efficient algorithms to compute sequence-structure properties. In this article, we address the correction of NGS errors by calculating which mutations most increase the likelihood that a sequence belongs to a given structure and RNA family. We introduce RNApyro, an efficient, linear time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base-pair energies with novel isostericity scores and apply our techniques to correct pointwise errors in 5S and 16S rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.
|
20
|
Weinbrand L, Avihoo A, Barash D. RNAfbinv: an interactive Java application for fragment-based design of RNA sequences. Bioinformatics 2013; 29:2938-40. [PMID: 23975763 DOI: 10.1093/bioinformatics/btt494] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
Summary: In RNA design problems, it is plausible to assume that the user would be interested in preserving a particular RNA secondary structure motif, or fragment, for biological reasons. The preservation could be in structure or sequence, or both. Thus, the inverse RNA folding problem could benefit from considering fragment constraints. We have developed a new interactive Java application called RNA fragment-based inverse that allows users to insert an RNA secondary structure in dot-bracket notation. It then performs sequence design that conforms to the shape of the input secondary structure, the specified thermodynamic stability, the specified mutational robustness and the user-selected fragment after shape decomposition. In this shape-based design approach, specific RNA structural motifs with known biological functions are strictly enforced, while others can possess more flexibility in their structure in favor of preserving physical attributes and additional constraints. Availability: RNAfbinv is freely available for download on the web at http://www.cs.bgu.ac.il/~RNAexinv/RNAfbinv. The site contains a help file with an explanation regarding the exact use.
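Dot-bracket notation, the input format mentioned above, writes each base as `(` for the opening side of a pair, `)` for the closing side, and `.` for unpaired. The following minimal sketch converts such a string into a partner table. It is unrelated to the RNAfbinv code base (the snippet is Python, not Java); the function name `parse_dot_bracket` and the use of -1 for unpaired positions are assumptions made for the example, and extended alphabets for pseudoknots are not handled.

```python
def parse_dot_bracket(structure):
    """Return a list where entry i is the partner of position i, or -1.

    Only '(', ')' and '.' are accepted; anything else raises ValueError.
    """
    partners = [-1] * len(structure)
    stack = []  # positions of '(' still waiting for a partner

    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i] = j
            partners[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")

    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners
```

For example, `parse_dot_bracket("((..))")` returns `[5, 4, -1, -1, 1, 0]`. A partner table of this kind is a convenient starting point for extracting a structural fragment by index range.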
Affiliation(s)
- Lina Weinbrand
  - Department of Computer Science, Ben Gurion University of the Negev, Beer Sheva 84105, Israel and Microsoft Research Israel, Herzliya 46733, Israel
|