1
Linzer JT, Aminov E, Abdullah AS, Kirkup CE, Diaz Ventura RI, Bijoor VR, Jung J, Huang S, Tse CG, Álvarez Toucet E, Onghai HP, Ghosh AP, Grodzki AC, Haines ER, Iyer AS, Khalil MK, Leong AP, Neuhaus MA, Park J, Shahid A, Xie M, Ziembicki JM, Simmerling C, Nagan MC. Accurately Modeling RNA Stem-Loops in an Implicit Solvent Environment. J Chem Inf Model 2024. [PMID: 39002142 DOI: 10.1021/acs.jcim.4c00756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/15/2024]
Abstract
Ribonucleic acid (RNA) molecules can adopt a variety of secondary and tertiary structures in solution, with stem-loops being one of the more common motifs. Here, we present a systematic analysis of 15 RNA stem-loop sequences simulated with molecular dynamics simulations in an implicit solvent environment. Analysis of RNA cluster ensembles showed that the stem-loop structures can generally adopt the A-form RNA in the stem region. Loop structures are more sensitive, and experimental structures could only be reproduced with modification of CH···O interactions in the force field, combined with an implicit solvent nonpolar correction to better model base stacking interactions. Accurately modeling RNA with current atomistic physics-based models remains challenging, but the RNA systems studied herein may provide a useful benchmark set for testing other RNA modeling methods in the future.
Affiliation(s)
- Jason T Linzer
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Ethan Aminov
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Aalim S Abdullah
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Colleen E Kirkup
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Rebeca I Diaz Ventura
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Vinay R Bijoor
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Jiyun Jung
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Sophie Huang
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Chi Gee Tse
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Emily Álvarez Toucet
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Hugo P Onghai
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Arghya P Ghosh
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Alex C Grodzki
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Emilee R Haines
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Aditya S Iyer
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Mark K Khalil
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Alexander P Leong
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Michael A Neuhaus
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Joseph Park
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Asir Shahid
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Matthew Xie
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Jan M Ziembicki
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
- Carlos Simmerling
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
  - Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794, United States
- Maria C Nagan
  - Department of Chemistry, Stony Brook University, Stony Brook, New York 11794, United States
2
Kaur J, Sharma A, Mundlia P, Sood V, Pandey A, Singh G, Barnwal RP. RNA-Small-Molecule Interaction: Challenging the "Undruggable" Tag. J Med Chem 2024. [PMID: 38498010 DOI: 10.1021/acs.jmedchem.3c01354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
RNA targeting, specifically with small molecules, is a relatively new and rapidly emerging avenue with the promise to expand the target space in the drug discovery field. From being "disregarded" as an "undruggable" messenger molecule to FDA approval of an RNA-targeting small-molecule drug Risdiplam, a radical change in perspective toward RNA has been observed in the past decade. RNAs serve important regulatory functions beyond canonical protein synthesis, and their dysregulation has been reported in many diseases. A deeper understanding of RNA biology reveals that RNA molecules can adopt a variety of structures, carrying defined binding pockets that can accommodate small-molecule drugs. Due to its functional diversity and structural complexity, RNA can be perceived as a prospective target for therapeutic intervention. This perspective highlights the proof of concept of RNA-small-molecule interactions, exemplified by targeting of various transcripts with functional modulators. The advent of RNA-oriented knowledge would help expedite drug discovery.
Affiliation(s)
- Jaskirat Kaur
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Akanksha Sharma
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
- Poonam Mundlia
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Vikas Sood
  - Department of Biochemistry, Jamia Hamdard, New Delhi 110062, India
- Ankur Pandey
  - Department of Chemistry, Panjab University, Chandigarh 160014, India
- Gurpal Singh
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
3
Matthies MC, Krueger R, Torda AE, Ward M. Differentiable partition function calculation for RNA. Nucleic Acids Res 2024; 52:e14. [PMID: 38038257 PMCID: PMC10853804 DOI: 10.1093/nar/gkad1168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 10/24/2023] [Accepted: 11/28/2023] [Indexed: 12/02/2023] Open
Abstract
Ribonucleic acid (RNA) is an essential molecule in a wide range of biological functions. In 1990, McCaskill introduced a dynamic programming algorithm for computing the partition function of an RNA sequence. McCaskill's algorithm is widely used today for understanding the thermodynamic properties of RNA. In this work, we introduce a generalization of McCaskill's algorithm that is well-defined over continuous inputs. Crucially, this enables us to implement an end-to-end differentiable partition function calculation. The derivative can be computed with respect to the input, or to any other fixed values, such as the parameters of the energy model. This builds a bridge between RNA thermodynamics and the tools of differentiable programming including deep learning as it enables the partition function to be incorporated directly into any end-to-end differentiable pipeline. To demonstrate the effectiveness of our new approach, we tackle the inverse folding problem directly using gradient optimization. We find that using the gradient to optimize the sequence directly is sufficient to arrive at sequences with a high probability of folding into the desired structure. This indicates that the gradients we compute are meaningful.
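To make the idea of a dynamic-programming partition function concrete, here is a minimal Python sketch of a McCaskill-style recursion. It is deliberately not the full algorithm or the differentiable generalization described in the abstract: it assumes a toy energy model in which every canonical or GU pair contributes the same `pair_energy`, it works on a discrete sequence, and the names `partition_function`, `pair_energy`, `min_loop`, and `kT` are illustrative choices rather than anything taken from the paper.

```python
import math

# Pairs allowed in this toy model: Watson-Crick plus GU wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, min_loop=3, kT=0.6163):
    """Sum of Boltzmann weights over all nested secondary structures of seq.

    Assumptions of this sketch (not of the cited work): each base pair
    contributes pair_energy (kcal/mol), nothing else contributes, and a
    hairpin must enclose at least min_loop unpaired bases. The default kT
    is roughly R*T at 37 degrees C in kcal/mol.
    """
    n = len(seq)
    boltz = math.exp(-pair_energy / kT)
    # Z[i][j] is the partition function of the half-open slice seq[i:j];
    # empty and too-short slices keep the initial value 1.0 (only the
    # unpaired structure exists).
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            # Case 1: the last base of the slice is unpaired.
            total = Z[i][j - 1]
            # Case 2: the last base pairs with some base k to its left.
            for k in range(i, last - min_loop):
                if (seq[k], seq[last]) in PAIRS:
                    total += Z[i][k] * boltz * Z[k + 1][last]
            Z[i][j] = total
    return Z[0][n]
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts structures; for example `partition_function("GAAAC", pair_energy=0.0)` counts the open chain and the single G-C hairpin. Replacing the hard `in PAIRS` test with continuous pairing weights is, loosely, the kind of relaxation that would be needed before gradients with respect to the sequence make sense.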
Affiliation(s)
- Marco C Matthies
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Ryan Krueger
  - Department of Applied Mathematics, Harvard University, 29 Oxford St, Cambridge, MA 02138, USA
- Andrew E Torda
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, 241, 35 Stirling Hwy, Crawley, WA 6009, Australia
4
Greenwood T, Heitsch CE. How Parameters Influence SHAPE-Directed Predictions. Methods Mol Biol 2024; 2726:105-124. [PMID: 38780729 DOI: 10.1007/978-1-0716-3519-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
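As an illustration of what "converting the data into pseudoenergies" can look like, the sketch below uses one widely used log-linear form, ΔG(i) = m·ln(reactivity_i + 1) + b. The abstract does not state which conversion the chapter analyzes, so this form is an assumption here. The function name `shape_pseudoenergy`, the treatment of negative values as missing data, and the default slope and intercept (m = 2.6, b = -0.8 kcal/mol, values commonly quoted for SHAPE-directed folding) are likewise assumptions for the example.

```python
import math


def shape_pseudoenergy(reactivity, m=2.6, b=-0.8):
    """Pseudo free energy (kcal/mol) for one nucleotide's SHAPE reactivity.

    Uses the assumed log-linear form m * ln(reactivity + 1) + b.
    Missing data (None or a negative sentinel) contributes no bonus or
    penalty; that convention is a choice made for this sketch.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


def pseudoenergy_profile(reactivities, m=2.6, b=-0.8):
    """Apply the conversion to a whole reactivity profile."""
    return [shape_pseudoenergy(r, m, b) for r in reactivities]
```

Under this form an unreactive nucleotide (reactivity 0) receives a bonus of `b` for being paired, while a highly reactive one receives a penalty. Changing `m` and `b`, for instance to `m=1.8, b=-0.6`, shifts every term, which is one way to see why predictions can be sensitive to these parameters.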
5
Nakajima M, Smith AD. Counting Distinguishable RNA Secondary Structures. J Comput Biol 2023; 30:1089-1097. [PMID: 37815558 DOI: 10.1089/cmb.2022.0501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/11/2023] Open
Abstract
RNA secondary structures are essential abstractions for understanding spatial folding behaviors of those macromolecules. Many secondary structure algorithms involve a common dynamic programming setup to exploit the property that secondary structures can be decomposed into substructures. Dirks et al. noted that this setup cannot directly address an issue of distinguishability among secondary structures, which arises for classes of sequences that admit nontrivial symmetry. Circular sequences are among these. We examine the problem of counting distinguishable secondary structures. Drawing from elementary results in group theory, we identify useful subsets of secondary structures. We then extend an algorithm due to Hofacker et al. for computing the sizes of these subsets. This yields a cubic-time algorithm to count distinguishable structures compatible with a given circular sequence. Furthermore, this general approach may be used to solve similar problems for which the RNA structures of interest involve symmetries.
Affiliation(s)
- Masaru Nakajima
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California, USA
- Andrew D Smith
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
6
Wu KE, Zou JY, Chang H. Machine learning modeling of RNA structures: methods, challenges and future perspectives. Brief Bioinform 2023; 24:bbad210. [PMID: 37280185 DOI: 10.1093/bib/bbad210] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/13/2023] [Revised: 05/12/2023] [Accepted: 05/17/2023] [Indexed: 06/08/2023] Open
Abstract
The three-dimensional structure of RNA molecules plays a critical role in a wide range of cellular processes encompassing functions from riboswitches to epigenetic regulation. These RNA structures are incredibly dynamic and can indeed be described aptly as an ensemble of structures that shifts in distribution depending on different cellular conditions. Thus, the computational prediction of RNA structure poses a unique challenge, even as computational protein folding has seen great advances. In this review, we focus on a variety of machine learning-based methods that have been developed to predict RNA molecules' secondary structure, as well as more complex tertiary structures. We survey commonly used modeling strategies, and how many are inspired by or incorporate thermodynamic principles. We discuss the shortcomings that various design decisions entail and propose future directions that could build off these methods to yield more robust, accurate RNA structure predictions.
Affiliation(s)
- Kevin E Wu
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- James Y Zou
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Chang
  - Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
7
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023] Open
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
  - School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
8
Abstract
RNAstructure is a user-friendly program for the prediction and analysis of RNA secondary structure. It is available as a web server, a program with a graphical user interface, or a set of command line tools. The programs are available for Microsoft Windows, macOS, or Linux. This article provides protocols for prediction of RNA secondary structure (using the web server, the graphical user interface, or the command line) and high-affinity oligonucleotide binding sites to a structured RNA target (using the graphical user interface). © 2023 Wiley Periodicals LLC.
- Basic Protocol 1: Predicting RNA secondary structure using the RNAstructure web server
- Alternate Protocol 1: Predicting secondary structure and base pair probabilities using the RNAstructure graphical user interface
- Alternate Protocol 2: Predicting secondary structure and base pair probabilities using the RNAstructure command line interface
- Basic Protocol 2: Predicting binding affinities of oligonucleotides complementary to an RNA target using OligoWalk
Affiliation(s)
- Sara E. Ali
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
- Abhinav Mittal
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
9
Monroy-Eklund A, Taylor C, Weidmann CA, Burch C, Laederach A. Structural analysis of MALAT1 long noncoding RNA in cells and in evolution. RNA (New York, N.Y.) 2023; 29:691-704. [PMID: 36792358 PMCID: PMC10159000 DOI: 10.1261/rna.079388.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 02/02/2023] [Indexed: 05/06/2023]
Abstract
Although not canonically polyadenylated, the long noncoding RNA MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is stabilized by a highly conserved 76-nt triple helix structure on its 3' end. The entire MALAT1 transcript is over 8000 nt long in humans. The strongest structural conservation signal in MALAT1 (as measured by covariation of base pairs) is in the triple helix structure. Primary sequence analysis of covariation alone does not reveal the degree of structural conservation of the entire full-length transcript, however. Furthermore, RNA structure is often context dependent; RNA binding proteins that are differentially expressed in different cell types may alter structure. We investigate here the in-cell and cell-free structures of the full-length human and green monkey (Chlorocebus sabaeus) MALAT1 transcripts in multiple tissue-derived cell lines using SHAPE chemical probing. Our data reveal levels of uniform structural conservation in different cell lines, in cells and cell-free, and even between species, despite significant differences in primary sequence. The uniformity of the structural conservation across the entire transcript suggests that, despite seeing covariation signals only in the triple helix junction of the lncRNA, the rest of the transcript's structure is remarkably conserved, at least in primates and across multiple cell types and conditions.
Affiliation(s)
- Anais Monroy-Eklund
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Colin Taylor
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Chase A Weidmann
  - Department of Biological Chemistry, University of Michigan Medical School, Center for RNA Biomedicine, Rogel Cancer Center, Ann Arbor, Michigan 48109, USA
- Christina Burch
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Alain Laederach
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
10
Zhang H, Zhang L, Liu K, Li S, Mathews DH, Huang L. Linear-Time Algorithms for RNA Structure Prediction. Methods Mol Biol 2023; 2586:15-34. [PMID: 36705896 DOI: 10.1007/978-1-0716-2768-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
RNA secondary structure prediction is widely used to understand RNA function. Existing dynamic programming-based algorithms, both the classical minimum free energy (MFE) methods and partition function methods, suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Inspired by incremental parsing for context-free grammars in computational linguistics, we designed linear-time heuristic algorithms, LinearFold and LinearPartition, to approximate the MFE structure, partition function and base pairing probabilities. These programs are orders of magnitude faster than Vienna RNAfold and CONTRAfold on long sequences. More interestingly, LinearFold and LinearPartition lead to more accurate predictions on the longest sequence families for which the structures are well established (16S and 23S ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart). This chapter provides protocols for using LinearFold and LinearPartition for secondary structure prediction.
Affiliation(s)
- He Zhang
  - Baidu Research USA, Sunnyvale, CA, USA
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- Liang Zhang
  - Baidu Research USA, Sunnyvale, CA, USA
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- Kaibo Liu
  - Baidu Research USA, Sunnyvale, CA, USA
- Sizhen Li
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
- David H Mathews
  - Dept. of Biochemistry & Biophysics, Center for RNA Biology, Rochester, NY, USA
  - Dept. of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
- Liang Huang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR, USA
11
Zhang L, Chen L, Zhang H, Si H, Liu X, Suo X, Hu D. A comparative study of microRNAs in different stages of Eimeria tenella. Front Vet Sci 2022; 9:954725. [PMID: 35937295 PMCID: PMC9353057 DOI: 10.3389/fvets.2022.954725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 06/30/2022] [Indexed: 11/17/2022] Open
Abstract
Apicomplexan parasites have divergent biogenesis machinery for small RNA generation. Analysis has shown that parasites in Plasmodium and Cryptosporidium as well as many species in Leishmania or Trypanosoma do not have complete machinery for small RNA biogenesis. Recently, the miRNA-generating system of Toxoplasma has been identified as plant/fungal-like and its miRNAome has been elucidated. However, the microRNA (miRNA) expression profiles and their potential regulatory functions in different stages of Eimeria tenella remain largely unknown. In this study, we characterized the RNA silencing machinery of E. tenella and investigated the miRNA population distribution at different life stages by high-throughput sequencing. We characterized the expression of miRNAs in the unsporulated oocyst, sporulated oocyst and schizogony stages, obtaining a total of 392 miRNAs. We identified 58 differentially expressed miRNAs between USO (unsporulated oocysts) and SO (sporulated oocysts) whose potential target genes were significantly enriched in the regulation of gene expression and chromatin binding, suggesting epigenetic modulation of sporulation by these miRNAs. In comparing miRNA expression at endogenous and exogenous developmental stages, twenty-four miRNAs were identified as differentially expressed. These were mainly associated with the regulation of genes with protein kinase activity, suggesting control of protein phosphorylation. This is the first study of the evolution of the miRNA biogenesis system and miRNA control of gene expression in Eimeria species. Our data may lead to functional insights into the regulation of gene expression during the parasite life cycle in apicomplexan parasites.
Affiliation(s)
- Lei Zhang
  - College of Animal Science and Technology, Guangxi University, Nanning, China
- Linlin Chen
  - Key Laboratory of Animal Epidemiology and Zoonosis of Ministry of Agriculture, National Animal Protozoa Laboratory, College of Veterinary Medicine, China Agricultural University, Beijing, China
- Hongtao Zhang
  - College of Animal Science and Technology, Guangxi University, Nanning, China
- Hongbin Si
  - College of Animal Science and Technology, Guangxi University, Nanning, China
- Xianyong Liu
  - Key Laboratory of Animal Epidemiology and Zoonosis of Ministry of Agriculture, National Animal Protozoa Laboratory, College of Veterinary Medicine, China Agricultural University, Beijing, China
- Xun Suo
  - Key Laboratory of Animal Epidemiology and Zoonosis of Ministry of Agriculture, National Animal Protozoa Laboratory, College of Veterinary Medicine, China Agricultural University, Beijing, China
- Dandan Hu
  - College of Animal Science and Technology, Guangxi University, Nanning, China
  - *Correspondence: Dandan Hu
12
Rieger M, Zacharias M. Nearest-Neighbor dsDNA Stability Analysis Using Alchemical Free-Energy Simulations. J Phys Chem B 2022; 126:3640-3647. [PMID: 35549273 DOI: 10.1021/acs.jpcb.2c01138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
The thermodynamic stability of double-stranded (ds)DNA depends on its sequence. It is influenced by the base pairing and stacking with neighboring bases along DNA molecules. Semiempirical schemes are available that allow us to predict the thermodynamic stability of DNA sequences based on empirically derived nearest-neighbor contributions of base pairs formed in the context of all possible nearest-neighbor base pairs. Current molecular dynamics (MD) simulations allow one to simulate the dynamics of DNA molecules in good agreement with experimentally obtained structures and available data on conformational flexibility. However, the suitability of current force field methods to reproduce dsDNA stability and its sequence dependence has been much less well tested. We have employed alchemical free-energy simulations of whole base pair transversions in dsDNA and in unbound single-stranded partner molecules. Such transversions change the sequence context but not the nucleotide content or base pairing in dsDNA and allow a direct comparison with the empirical nearest-neighbor dsDNA stability model. For the alchemical free-energy changes in the unbound single-stranded (ss)DNA partner molecules, we tested different setups assuming either complete unstacking or unrestrained simulations with partial stacking in the unbound ssDNA. The free-energy simulations predicted nearest-neighbor effects of similar magnitude to those observed experimentally but showed overall limited correlation with experimental data. An inaccurate description of stacking interactions and other possible reasons such as the neglect of electronic polarization effects are discussed. The results indicate the need to improve the realistic description of stacking interactions in current molecular mechanics force fields.
Affiliation(s)
- Manuel Rieger
  - Physics Department and Center of Protein Assemblies, Technical University of Munich, 85748 Garching, Germany
- Martin Zacharias
  - Physics Department and Center of Protein Assemblies, Technical University of Munich, 85748 Garching, Germany
13
Zuber J, Schroeder SJ, Sun H, Turner DH, Mathews DH. Nearest neighbor rules for RNA helix folding thermodynamics: improved end effects. Nucleic Acids Res 2022; 50:5251-5262. [PMID: 35524574 PMCID: PMC9122537 DOI: 10.1093/nar/gkac261] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/23/2021] [Revised: 03/29/2022] [Accepted: 04/08/2022] [Indexed: 12/26/2022] Open
Abstract
Nearest neighbor parameters for estimating the folding stability of RNA secondary structures are in widespread use. For helices, current parameters penalize terminal AU base pairs relative to terminal GC base pairs. We curated an expanded database of helix stabilities determined by optical melting experiments. Analysis of the updated database shows that terminal penalties depend on the sequence identity of the adjacent penultimate base pair. New nearest neighbor parameters that include this additional sequence dependence accurately predict the measured values of 271 helices in an updated database with a correlation coefficient of 0.982. This refined understanding of helix ends facilitates fitting terms for base pair stacks with GU pairs. Prior parameter sets treated 5′GGUC3′ paired to 3′CUGG5′ separately from other 5′GU3′/3′UG5′ stacks. The improved understanding of helix end stability, however, makes the separate treatment unnecessary. Introduction of the additional terms was tested with three optical melting experiments. The average absolute difference between measured and predicted free energy changes at 37°C for these three duplexes containing terminal adjacent AU and GU pairs improved from 1.38 to 0.27 kcal/mol. This confirms the need for the additional sequence dependence in the model.
Affiliation(s)
- Jeffrey Zuber
  - Alnylam Pharmaceuticals, Inc., Cambridge, MA 02142, USA
- Susan J Schroeder
  - Department of Chemistry and Biochemistry, and Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, USA
- Hongying Sun
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Douglas H Turner
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
  - Department of Chemistry, University of Rochester, Rochester, NY 14627, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
  - Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY 14642, USA
14
RNA folding using quantum computers. PLoS Comput Biol 2022; 18:e1010032. [PMID: 35404931 PMCID: PMC9022793 DOI: 10.1371/journal.pcbi.1010032] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/01/2021] [Revised: 04/21/2022] [Accepted: 03/18/2022] [Indexed: 11/19/2022] Open
Abstract
The 3-dimensional fold of an RNA molecule is largely determined by patterns of intramolecular hydrogen bonds between bases. Predicting the base pairing network from the sequence, also referred to as RNA secondary structure prediction or RNA folding, is a nondeterministic polynomial-time (NP)-complete computational problem. The structure of the molecule is strongly predictive of its functions and biochemical properties, and therefore the ability to accurately predict the structure is a crucial tool for biochemists. Many methods have been proposed to efficiently sample possible secondary structure patterns. Classic approaches employ dynamic programming, and recent studies have explored approaches inspired by evolutionary and machine learning algorithms. This work demonstrates leveraging quantum computing hardware to predict the secondary structure of RNA. A Hamiltonian written in the form of a Binary Quadratic Model (BQM) is derived to drive the system toward maximizing the number of consecutive base pairs while jointly maximizing the average length of the stems. A Quantum Annealer (QA) is compared to a Replica Exchange Monte Carlo (REMC) algorithm programmed with the same objective function, with the QA being shown to be highly competitive at rapidly identifying low energy solutions. The method proposed in this study was compared to three algorithms from the literature and, despite its simplicity, was found to be competitive on a test set containing known structures with pseudoknots.
The recent FDA approval of mRNA-based vaccines has increased public interest in synthetically designed RNA molecules. RNA molecules fold into complex secondary structures which determine their molecular properties and in part their efficacy. Determining the folded structure of an RNA molecule is a computationally challenging task with exponential scaling that is intractable to solve exactly, and therefore approximate methods are used. Quantum computing technology offers a new approach to finding approximate solutions to problems with exponential scaling. We formulate a simplistic, yet effective, model of RNA folding that can easily be mapped to quantum computers and we show that currently available quantum computing hardware is competitive with classical methods.
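Illustrative sketch (not the paper's Hamiltonian): a binary quadratic model of the kind described in the abstract above can be pictured as one binary variable per candidate stem, a linear reward that grows with stem length, and a quadratic penalty whenever two chosen stems reuse a nucleotide. The stem encoding, the length-squared reward, the penalty weight, and the brute-force solver standing in for an annealer are all assumptions made here for illustration.

```python
from itertools import product


def stem_bases(stem):
    """A stem (i, j, n) pairs i+k with j-k for k in range(n)."""
    i, j, n = stem
    return set(range(i, i + n)) | set(range(j - n + 1, j + 1))


def build_bqm(stems, penalty):
    """Return (linear, quadratic) coefficient dicts over binary stem variables."""
    # Reward long stems more than proportionally (assumed form: -length^2).
    linear = {k: -float(s[2] ** 2) for k, s in enumerate(stems)}
    # Penalize any two stems that share a nucleotide.
    quadratic = {
        (a, b): float(penalty)
        for a in range(len(stems))
        for b in range(a + 1, len(stems))
        if stem_bases(stems[a]) & stem_bases(stems[b])
    }
    return linear, quadratic


def energy(x, linear, quadratic):
    """Evaluate the BQM for a 0/1 assignment x."""
    e = sum(c * x[k] for k, c in linear.items())
    e += sum(w * x[a] * x[b] for (a, b), w in quadratic.items())
    return e


def brute_force_minimum(linear, quadratic):
    """Exhaustive search; only feasible for a handful of variables."""
    n = len(linear)
    return min(product((0, 1), repeat=n),
               key=lambda x: energy(x, linear, quadratic))


# Three hypothetical candidate stems; the first two compete for bases.
stems = [(0, 11, 4), (2, 9, 3), (12, 19, 3)]
linear, quadratic = build_bqm(stems, penalty=100)
print(brute_force_minimum(linear, quadratic))
```

On real hardware the same coefficient dictionaries would be handed to an annealer or a Monte Carlo sampler instead of being enumerated; the exhaustive search is used here only so the sketch runs without extra dependencies.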
|
15
|
Shatoff E, Bundschuh R. dsRBPBind: modeling the effect of RNA secondary structure on double-stranded RNA-protein binding. Bioinformatics 2022; 38:687-693. [PMID: 34668517 DOI: 10.1093/bioinformatics/btab724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Revised: 09/15/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: RNA-binding proteins are fundamental to many cellular processes. Double-stranded RNA-binding proteins (dsRBPs) in particular are crucial for RNA interference, mRNA elongation, A-to-I editing, host defense, splicing and a multitude of other important mechanisms. Since dsRBPs require double-stranded RNA to bind, their binding affinity depends on the competition among all possible secondary structures of the target RNA molecule. Here, we introduce a quantitative model that allows calculation of the effective affinity of dsRBPs to any RNA given a principal affinity and the sequence of the RNA, while fully taking into account the entire secondary structure ensemble of the RNA. RESULTS: We implement our model within the ViennaRNA folding package while maintaining its O(N^3) time complexity. We validate our quantitative model by comparing with experimentally determined binding affinities and stoichiometries for transactivation response element RNA-binding protein (TRBP). We also find that the change in dsRBP binding affinity purely due to the presence of alternative RNA structures can be many orders of magnitude and that the predicted affinity of TRBP for pre-miRNA-like constructs correlates with experimentally measured processing rates. AVAILABILITY AND IMPLEMENTATION: Our modified version of the ViennaRNA package is available for download at http://bioserv.mps.ohio-state.edu/dsRBPBind, is free to use for research and educational purposes, and utilizes simple get/set methods for footprint size, concentration, cooperativity, principal dissociation constant and overlap.
Affiliation(s)
- Elan Shatoff
- Department of Physics, The Ohio State University, Columbus, OH 43210, USA.,Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA
| | - Ralf Bundschuh
- Department of Physics, The Ohio State University, Columbus, OH 43210, USA.,Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA.,Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA.,Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, OH 43210, USA
|
16
|
Detection of pks Island mRNAs Using Toehold Sensors in Escherichia coli. Life (Basel) 2021; 11:life11111280. [PMID: 34833155 PMCID: PMC8625898 DOI: 10.3390/life11111280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/27/2021] [Revised: 11/15/2021] [Accepted: 11/18/2021] [Indexed: 12/14/2022] Open
Abstract
Synthetic biologists have applied biomolecular engineering approaches toward the goal of novel biological devices and have shown progress in diverse areas of medicine and biotechnology. Especially promising is the application of synthetic biological devices towards a novel class of molecular diagnostics. As an example, a de-novo-designed riboregulator called the toehold switch, with its programmability and compatibility with field-deployable devices, showed promising in vitro applications for the detection of viral RNA such as that of Zika virus and coronaviruses. However, the in vivo application of high-performance RNA sensors remains challenging due to the secondary structure of long mRNA species. Here, we introduced ‘Helper RNAs’ that can enhance the functionality of toehold switch sensors by mitigating the effect of secondary structures around a target site. By employing the helper RNAs, a previously reported mCherry mRNA sensor showed improved fold-changes in vivo. To further generalize the Helper RNA approaches, we employed an automatic design pipeline for toehold sensors that target the essential genes within the pks island, an important target of biomedical research in connection with colorectal cancer. The toehold switch sensors showed fold-changes upon the expression of full-length mRNAs that apparently depended sensitively on the identity of the gene as well as the predicted local structure within the target region of the mRNA. Still, the helper RNAs could improve the performance of toehold switch sensors in many instances, with up to 10-fold improvement over no-helper cases. These results suggest that the helper RNA approaches can further assist the design of functional RNA devices in vivo with the aid of the streamlined automatic design software developed here. Further, our solutions for screening and stabilizing single-stranded regions of mRNA may find use in other in vivo mRNA-sensing applications such as Cas13 crRNA design, transcriptome engineering, and trans-cleaving ribozymes.
|
17
|
Lackey L, Coria A, Ghosh AJ, Grayeski P, Hatfield A, Shankar V, Platig J, Xu Z, Ramos SBV, Silverman EK, Ortega VE, Cho MH, Hersh CP, Hobbs BD, Castaldi P, Laederach A. Alternative poly-adenylation modulates α1-antitrypsin expression in chronic obstructive pulmonary disease. PLoS Genet 2021; 17:e1009912. [PMID: 34784346 PMCID: PMC8631626 DOI: 10.1371/journal.pgen.1009912] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/28/2021] [Revised: 11/30/2021] [Accepted: 10/25/2021] [Indexed: 01/07/2023] Open
Abstract
α1-anti-trypsin (A1AT), encoded by SERPINA1, is a neutrophil elastase inhibitor that controls the inflammatory response in the lung. Severe A1AT deficiency increases risk for Chronic Obstructive Pulmonary Disease (COPD); however, the role of A1AT in COPD in non-deficient individuals is not well known. We identify a 2.1-fold increase (p = 2.5 × 10^-6) in the use of a distal poly-adenylation site in primary lung tissue RNA-seq in 82 COPD cases when compared to 64 controls and replicate this in an independent study of 376 COPD cases and 267 controls. This alternative polyadenylation event involves two sites, a proximal and a distal site, 61 and 1683 nucleotides downstream of the A1AT stop codon. To characterize this event, we measured the distal ratio in human primary tissue short read RNA-seq data and corroborated our results with long read RNA-seq data. Integrating these results with 3' end RNA-seq and nanoluciferase reporter assay experiments, we show that use of the distal site yields mRNA transcripts with over 50-fold decreased translation efficiency and A1AT expression. We identified seven RNA binding proteins using enhanced CrossLinking and ImmunoPrecipitation (eCLIP) with one or more binding sites in the SERPINA1 3' UTR. We combined these data with measurements of the distal ratio in shRNA knockdown experiments, nuclear and cytoplasmic fractionation, and chemical RNA structure probing. We identify Quaking Homolog (QKI) as a modulator of SERPINA1 mRNA translation and confirm the role of QKI in SERPINA1 translation with luciferase reporter assays. Analysis of single-cell RNA-seq showed differences in the distribution of the SERPINA1 distal ratio among hepatocytes, macrophages, αβ T cells and plasma cells in the liver. Alveolar type 1 and type 2 cells, dendritic cells and macrophages also vary in their distal ratio in the lung. Our work reveals a complex post-transcriptional mechanism that regulates alternative polyadenylation and A1AT expression in COPD.
Affiliation(s)
- Lela Lackey
- Department of Genetics and Biochemistry, Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
| | - Aaztli Coria
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Auyon J. Ghosh
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Phil Grayeski
- Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Abigail Hatfield
- Department of Genetics and Biochemistry, Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
| | - Vijay Shankar
- Department of Genetics and Biochemistry, Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
| | - John Platig
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Zhonghui Xu
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Silvia B. V. Ramos
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Victor E. Ortega
- Department of Internal Medicine, Division of Respiratory Medicine, Center for Individualized Medicine, Mayo Clinic, Scottsdale, Arizona, United States of America
| | - Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Craig P. Hersh
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Brian D. Hobbs
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Peter Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Internal Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America
|
18
|
Martin NS, Ahnert SE. Insertions and deletions in the RNA sequence-structure map. J R Soc Interface 2021; 18:20210380. [PMID: 34610259 PMCID: PMC8492174 DOI: 10.1098/rsif.2021.0380] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/08/2021] [Accepted: 09/13/2021] [Indexed: 12/21/2022] Open
Abstract
Genotype-phenotype maps link genetic changes to their fitness effect and are thus an essential component of evolutionary models. The map between RNA sequences and their secondary structures is a key example and has applications in functional RNA evolution. For this map, the structural effect of substitutions is well understood, but models usually assume a constant sequence length and do not consider insertions or deletions. Here, we expand the sequence-structure map to include single nucleotide insertions and deletions by using the RNAshapes concept. To quantify the structural effect of insertions and deletions, we generalize existing definitions for robustness and non-neutral mutation probabilities. We find striking similarities between substitutions, deletions and insertions: robustness to substitutions is correlated with robustness to insertions and, for most structures, to deletions. In addition, frequent structural changes after substitutions also tend to be common for insertions and deletions. This is consistent with the connection between energetically suboptimal folds and possible structural transitions. The similarities observed hold both for genotypic and phenotypic robustness and mutation probabilities, i.e. for individual sequences and for averages over sequences with the same structure. Our results could have implications for the rate of neutral and non-neutral evolution.
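Illustrative sketch (not the authors' code): the robustness measure described above can be pictured as the fraction of single-mutation neighbors of a sequence that keep the same phenotype, computed separately for substitutions, insertions, and deletions. In the sketch below the `fold` argument is a stand-in for whatever sequence-to-structure map is used (the paper relies on the RNAshapes concept so that structures of different lengths stay comparable); the function names and the toy phenotype are assumptions.

```python
ALPHABET = "ACGU"


def substitutions(seq):
    """All sequences that differ from seq at exactly one position."""
    return [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in ALPHABET if b != seq[i]]


def deletions(seq):
    """All sequences obtained by removing one nucleotide (duplicates kept)."""
    return [seq[:i] + seq[i + 1:] for i in range(len(seq))]


def insertions(seq):
    """All sequences obtained by inserting one nucleotide (duplicates kept)."""
    return [seq[:i] + b + seq[i:]
            for i in range(len(seq) + 1) for b in ALPHABET]


def robustness(seq, neighbors, fold):
    """Fraction of mutational neighbors whose phenotype matches that of seq."""
    mutants = neighbors(seq)
    reference = fold(seq)
    return sum(fold(m) == reference for m in mutants) / len(mutants)


def toy_fold(seq):
    """Toy phenotype for demonstration only: 'does the sequence contain a G?'"""
    return "G" in seq


for name, fn in [("substitution", substitutions),
                 ("deletion", deletions),
                 ("insertion", insertions)]:
    print(name, robustness("ACGU", fn, toy_fold))
```

Replacing `toy_fold` with a call to a real structure predictor that returns an abstract shape string would give per-sequence (genotypic) robustness; averaging over all sequences with the same shape would give the phenotypic version.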
Affiliation(s)
- Nora S. Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK
- Sainsbury Laboratory, University of Cambridge, Bateman Street, Cambridge CB2 1LR, UK
| | - Sebastian E. Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- The Alan Turing Institute, British Library, Euston Road, London NW1 2DB, UK
|
19
|
High-throughput dissection of the thermodynamic and conformational properties of a ubiquitous class of RNA tertiary contact motifs. Proc Natl Acad Sci U S A 2021; 118:2109085118. [PMID: 34373334 DOI: 10.1073/pnas.2109085118] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022] Open
Abstract
Despite RNA's diverse secondary and tertiary structures and its complex conformational changes, nature utilizes a limited set of structural "motifs" (helices, junctions, and tertiary contact modules) to build diverse functional RNAs. Thus, in-depth descriptions of a relatively small universe of RNA motifs may lead to predictive models of RNA tertiary conformational landscapes. Motifs may have different properties depending on sequence and secondary structure, giving rise to subclasses that expand the universe of RNA building blocks. Yet we know very little about motif subclasses, given the challenges in mapping conformational properties in high throughput. Previously, we used "RNA on a massively parallel array" (RNA-MaP), a quantitative, high-throughput technique, to study thousands of helices and two-way junctions. Here, we adapt RNA-MaP to study the thermodynamic and conformational properties of tetraloop/tetraloop receptor (TL/TLR) tertiary contact motifs, analyzing 1,493 TLR sequences from different classes. Clustering analyses revealed variability in TL specificity, stability, and conformational behavior. Nevertheless, natural GAAA/11ntR TL/TLRs, while varying in tertiary stability by ∼2.5 kcal/mol, exhibited conserved TL specificity and conformational properties. Thus, RNAs may tune stability without altering the overall structure of these TL/TLRs. Furthermore, their stability correlated with natural frequency, suggesting thermodynamics as the dominant selection pressure. In contrast, other TL/TLRs displayed heterogeneous conformational behavior and appear to not be under strong thermodynamic selection. Our results build toward a generalizable model of RNA-folding thermodynamics based on the properties of isolated motifs, and our characterized TL/TLR library can be used to engineer RNAs with predictable thermodynamic and conformational behavior.
|
20
|
Learning the Fastest RNA Folding Path Based on Reinforcement Learning and Monte Carlo Tree Search. Molecules 2021; 26:molecules26154420. [PMID: 34361572 PMCID: PMC8347524 DOI: 10.3390/molecules26154420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2021] [Revised: 07/17/2021] [Accepted: 07/20/2021] [Indexed: 11/17/2022] Open
Abstract
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step according to a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the dataset bpRNA and is used to predict RNA secondary structure. Since in 2dRNA-Fold the scoring to determine the next step is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.
|
21
|
Baisden JT, Childs-Disney JL, Ryan LS, Disney MD. Affecting RNA biology genome-wide by binding small molecules and chemically induced proximity. Curr Opin Chem Biol 2021; 62:119-129. [PMID: 34118759 PMCID: PMC9264282 DOI: 10.1016/j.cbpa.2021.03.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/26/2021] [Revised: 03/24/2021] [Accepted: 03/25/2021] [Indexed: 01/08/2023]
Abstract
The ENCODE and genome-wide association projects have shown that much of the genome is transcribed into RNA and much less is translated into protein. These and other functional studies suggest that the druggable transcriptome is much larger than the druggable proteome. This review highlights approaches to define druggable RNA targets and structure-activity relationships across genomic RNA. Binding compounds can be identified and optimized into structure-specific ligands by using sequence-based design with various modes of action, for example, inhibiting translation or directing pre-mRNA splicing outcomes. In addition, strategies to direct protein activity against an RNA of interest via chemically induced proximity are a burgeoning area that has been validated both in cells and in preclinical animal models, and we describe how they may allow rapid access to new avenues to affect RNA biology. These approaches and the unique modes of action suggest that more RNAs are potentially amenable to targeting than proteins.
Affiliation(s)
- Jared T Baisden
- Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458 USA
| | - Jessica L Childs-Disney
- Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458 USA
| | - Lucas S Ryan
- Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458 USA
| | - Matthew D Disney
- Department of Chemistry, The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458 USA.
|
22
|
Improving RNA Branching Predictions: Advances and Limitations. Genes (Basel) 2021; 12:genes12040469. [PMID: 33805944 PMCID: PMC8064352 DOI: 10.3390/genes12040469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2021] [Revised: 03/15/2021] [Accepted: 03/18/2021] [Indexed: 11/16/2022] Open
Abstract
Minimum free energy prediction of RNA secondary structures is based on the Nearest Neighbor Thermodynamics Model. While such predictions are typically good, the accuracy can vary widely even for short sequences, and the branching thermodynamics are an important factor in this variance. Recently, the simplest model for multiloop energetics—a linear function of the number of branches and unpaired nucleotides—was found to be the best. Subsequently, a parametric analysis demonstrated that per family accuracy can be improved by changing the weightings in this linear function. However, the extent of improvement was not known due to the ad hoc method used to find the new parameters. Here we develop a branch-and-bound algorithm that finds the set of optimal parameters with the highest average accuracy for a given set of sequences. Our analysis shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets. Moreover, cross-family improvement is possible but more difficult because competing parameter regions favor different families. The results also indicate that restricting the unpaired nucleotide penalty to small values is warranted. This reduction makes analyzing longer sequences using the present techniques more feasible.
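Illustrative sketch (not from the paper): the linear multiloop model mentioned above assigns a loop an energy that is an offset plus a per-branch weight plus a per-unpaired-nucleotide weight. The function name, the argument order, and the numeric values below are assumptions chosen so the arithmetic is easy to check; they are not the parameters studied in the paper.

```python
def multiloop_energy(branches, unpaired, a, b, c):
    """Linear multiloop model: offset a, weight b per branching helix,
    weight c per unpaired nucleotide in the loop (all in kcal/mol)."""
    return a + b * branches + c * unpaired


# Hypothetical parameter triples, for illustration only.
default_params = dict(a=3.0, b=0.5, c=0.1)
small_unpaired_penalty = dict(a=3.0, b=0.5, c=0.0)

# A three-way junction with five unpaired nucleotides under each setting.
for params in (default_params, small_unpaired_penalty):
    print(params, multiloop_energy(3, 5, **params))
```

Re-weighting `a`, `b`, and `c` changes which branched structures win the minimum free energy competition, which is why the abstract frames parameter choice as a search over regions of this three-dimensional space.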
|
23
|
Zacharias M. Base-Pairing and Base-Stacking Contributions to Double-Stranded DNA Formation. J Phys Chem B 2020; 124:10345-10352. [PMID: 33156627 DOI: 10.1021/acs.jpcb.0c07670] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/02/2023]
Abstract
Double-stranded (ds)DNA formation and dissociation are of fundamental biological importance. The negative DNA charge influences the dsDNA stability. However, the base pairing and the stacking between neighboring bases are responsible for the sequence-dependent stability of dsDNA. The stability of a dsDNA molecule can be estimated from empirical nearest-neighbor models based on contributions assigned to base-pair steps along the DNA and additional parameters because of DNA termini. In efforts to separate contributions, it has been concluded that base stacking dominates dsDNA stability, whereas base pairing contributes negligibly. Using a different model for dsDNA formation, we reanalyze dsDNA stability contributions and conclude that base stacking contributes already at the level of separate ssDNAs but that pairing contributions drive the dsDNA formation. The theoretical model also predicts that stability contributions of base-pair steps that contain only guanine/cytosine, mixed steps, and steps with only adenine/thymine follow the order 6:5:4, respectively, as expected based on the formed hydrogen bonds. The model is fully consistent with the available stacking data and the nearest-neighbor dsDNA parameters. It allows assigning a narrowly distributed value for the effective free energy contribution per formed hydrogen bond during dsDNA formation of -0.72 kcal·mol⁻¹ based entirely on the experimental data.
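Illustrative sketch (not from the paper): the 6:5:4 ordering quoted above follows from counting Watson-Crick hydrogen bonds per base-pair step (three per G·C pair, two per A·T pair). The short Python below does that count and, as a rough assumption that goes beyond what the abstract states, multiplies the total number of hydrogen bonds in a duplex by the quoted effective value per bond to get a pairing contribution.

```python
# Watson-Crick hydrogen bonds formed by each base with its partner.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def step_hbonds(step):
    """Hydrogen bonds in a base-pair step, given the two adjacent bases
    on one strand (the partner strand is implied by complementarity)."""
    return HBONDS[step[0]] + HBONDS[step[1]]


def pairing_contribution(strand, dg_per_hbond=-0.72):
    """Assumed estimate: (number of hydrogen bonds) x (effective free
    energy per bond, kcal/mol). A simplification for illustration."""
    return dg_per_hbond * sum(HBONDS[base] for base in strand)


# GC-only, mixed, and AT-only steps.
print([step_hbonds(s) for s in ("GC", "GA", "AT")])
# A four-base-pair duplex with ten hydrogen bonds.
print(pairing_contribution("GCAT"))
```

The estimate ignores stacking, termini, and salt effects, so it should be read as a unit check on the quoted per-bond value rather than as a stability prediction.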
Affiliation(s)
- Martin Zacharias
- Physics Department T38, Technical University of Munich, 85748 Garching, Germany
|
24
|
Osmer PS, Singh G, Boris-Lawrie K. A New Approach to 3D Modeling of Inhomogeneous Populations of Viral Regulatory RNA. Viruses 2020; 12:v12101108. [PMID: 33003639 PMCID: PMC7650772 DOI: 10.3390/v12101108] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/03/2020] [Revised: 09/24/2020] [Accepted: 09/27/2020] [Indexed: 12/17/2022] Open
Abstract
Tertiary structure (3D) is the physical context of RNA regulatory activity. Retroviruses are RNA viruses that replicate through the proviral DNA intermediate transcribed by hosts. Proviral transcripts form inhomogeneous populations due to variable structural ensembles of overlapping regulatory RNA motifs in the 5′-untranslated region (UTR), which drive RNAs to be spliced or translated, and/or dimerized and packaged into virions. Genetic studies and structural techniques have provided fundamental input constraints to begin predicting HIV 3D conformations in silico. Using SimRNA and sets of experimentally-determined input constraints of HIV NL4-3 trans-activation responsive sequence (TAR) and pairings of unique-5′ (U5) with dimerization (DIS) or AUG motifs, we calculated a series of 3D models that differ in proximity of 5′-Cap and the junction of TAR and PolyA helices; configuration of primer binding site (PBS)-segment; and two host cofactor binding sites. Input constraints on U5-AUG pairings were most compatible with intramolecular folding of 5′-UTR motifs in energetic minima. Introducing theoretical constraints predicted that a metastable PolyA region drives orientation of 5′-Cap with TAR, U5 and PBS-segment helices. SimRNA and the workflow developed herein provide viable options to predict 3D conformations of inhomogeneous populations of large RNAs that have been intractable to conventional ensemble methods.
Affiliation(s)
- Patrick S. Osmer
- Department of Astronomy, The Ohio State University, Columbus, OH 43210, USA
| | - Gatikrushna Singh
- Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, MN 55108, USA
| | - Kathleen Boris-Lawrie
- Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, MN 55108, USA
- Correspondence: Tel.: +1-612-625-2100
|
25
|
Fang Z, Wang Y, Wang Z, Xu M, Ren S, Yang D, Hong M, Xie W. ERINA Is an Estrogen-Responsive LncRNA That Drives Breast Cancer through the E2F1/RB1 Pathway. Cancer Res 2020; 80:4399-4413. [PMID: 32826278 DOI: 10.1158/0008-5472.can-20-1031] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/02/2020] [Revised: 07/10/2020] [Accepted: 08/18/2020] [Indexed: 01/23/2023]
Abstract
Resistance to therapeutic drugs is a major challenge in the treatment of cancers, including breast cancer. Long noncoding RNAs (lncRNA) are known to have diverse physiologic and pathophysiologic functions, including in cancer. In searching for lncRNA responsible for cancer drug resistance, we identified an intergenic lncRNA ERINA (estrogen inducible lncRNA) as a novel lncRNA highly expressed in multiple cancer types, especially in estrogen receptor-positive (ER+) breast cancers. Expression of ERINA was inversely correlated with survival of patients with ER+ breast cancer and sensitivity to CDK inhibitor in breast cancer cell lines. Functional characterization established ERINA as an oncogenic lncRNA, as knockdown of ERINA in breast cancer cells inhibited cell-cycle progression and tumor cell proliferation in vitro and xenograft tumor growth in vivo. In contrast, overexpression of ERINA promoted cell growth and cell-cycle progression. ERINA promoted cell-cycle progression by interacting with the E2F transcription factor 1 (E2F1), which prevents the binding of E2F1 to the tumor suppressor retinoblastoma protein 1 (RB1). ERINA also functioned as an estrogen and ER-responsive gene, and an intronic ER-binding site was identified as an enhancer that mediates the transactivation of ERINA. In summary, ERINA is an estrogen-responsive oncogenic lncRNA that may serve as a novel biomarker and potential therapeutic target in breast cancer. SIGNIFICANCE: These findings identify ERINA as an estrogen-responsive, oncogenic lncRNA, whose elevated expression may contribute to drug resistance and poor survival of patients with ER+ breast cancer.
Affiliation(s)
- Zihui Fang
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania.,College of Life Sciences, South China Agricultural University, Guangzhou, China
| | - Yue Wang
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Zehua Wang
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Meishu Xu
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Songrong Ren
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Da Yang
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Mei Hong
- College of Life Sciences, South China Agricultural University, Guangzhou, China.
| | - Wen Xie
- Center for Pharmacogenetics and Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania.,Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
|
26
|
Zhang H, Zhang L, Mathews DH, Huang L. LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 2020; 36:i258-i267. [PMID: 32657379 PMCID: PMC7355276 DOI: 10.1093/bioinformatics/btaa460] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. This slowness is even more severe than cubic-time free energy minimization due to a substantially larger constant factor in runtime. RESULTS Inspired by the success of our recent LinearFold algorithm that predicts the approximate minimum free energy structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g. 2.5 days versus 1.3 min on a sequence with length 32 753 nt). More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest length sequences (16S and 23S rRNAs), as well as a substantial improvement on long-distance base pairs (500+ nt apart). AVAILABILITY AND IMPLEMENTATION Code: http://github.com/LinearFold/LinearPartition; Server: http://linearfold.org/partition. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
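For orientation, the classical cubic-time partition function recursion that the abstract contrasts with can be sketched on a toy energy model. This is not LinearPartition and not the nearest neighbor model: the single `pair_energy` per base pair, the `min_loop` constraint, and the function name are assumptions made for illustration only.

```python
import math

# Canonical and wobble pairs; a toy assumption, not a full energy model.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, min_loop=3, rt=0.6163):
    """Cubic-time partition function where every allowed pair contributes
    pair_energy (kcal/mol) and nothing else is scored (hypothetical model).
    rt is RT in kcal/mol at roughly 37 degrees C."""
    n = len(seq)
    if n == 0:
        return 1.0
    boltz = math.exp(-pair_energy / rt)
    z = [[1.0] * n for _ in range(n)]

    def sub(i, j):
        # Empty intervals hold exactly one (empty) structure.
        return z[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = sub(i, j - 1)  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    total += sub(i, k - 1) * boltz * sub(k + 1, j - 1)
            z[i][j] = total
    return z[0][n - 1]
```

The three nested loops over `i`, `j`, and `k` are what make this recursion cubic in sequence length. With `pair_energy=0.0` every structure has weight 1, so the function simply counts the nested structures allowed by the toy rules.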
Affiliation(s)
- He Zhang
- Baidu Research, Sunnyvale, CA 94089, USA
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
| | - Liang Zhang
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
| | - David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 48306, USA
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 48306, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 48306, USA
| | - Liang Huang
- Baidu Research, Sunnyvale, CA 94089, USA
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
|
27
|
Ward M, Sun H, Datta A, Wise M, Mathews DH. Determining parameters for non-linear models of multi-loop free energy change. Bioinformatics 2020; 35:4298-4306. [PMID: 30923811 DOI: 10.1093/bioinformatics/btz222] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2018] [Revised: 02/10/2019] [Accepted: 03/27/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION Predicting the secondary structure of RNA is a fundamental task in bioinformatics. Algorithms that predict secondary structure given only the primary sequence, and a model to evaluate the quality of a structure, are an integral part of this. These algorithms have been updated as our model of RNA thermodynamics changed and expanded. An exception to this has been the treatment of multi-loops. Although more advanced models of multi-loop free energy change have been suggested, a simple, linear model has been used since the 1980s. However, recently, new dynamic programming algorithms for secondary structure prediction that could incorporate these models were presented. Unfortunately, these models appear to have lower accuracy for secondary structure prediction. RESULTS We apply linear regression and a new parameter optimization algorithm to find better parameters for the existing linear model and advanced non-linear multi-loop models. These include the Jacobson-Stockmayer and Aalberts & Nandagopal models. We find that the current linear model parameters may be near optimal for the linear model, and that no advanced model performs better than the existing linear model parameters even after parameter optimization. AVAILABILITY AND IMPLEMENTATION Source code and data are available at https://github.com/maxhwardg/advanced_multiloops. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Max Ward
- Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
| | - Hongying Sun
- Department of Biochemistry & Biophysics, University of Rochester, Rochester, NY, USA.,Center for RNA Biology, University of Rochester, Rochester, NY, USA
| | - Amitava Datta
- Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
| | - Michael Wise
- Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia.,The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Crawley, WA, Australia
| | - David H Mathews
- Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA
|
28
|
The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization. J Struct Biol 2020; 210:107475. [PMID: 32032754 DOI: 10.1016/j.jsb.2020.107475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/20/2019] [Revised: 09/25/2019] [Accepted: 01/30/2020] [Indexed: 12/29/2022]
Abstract
Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical "arms" radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.
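The three-parameter multiloop initiation equation mentioned above is usually written as a linear function of the number of unpaired nucleotides and the number of branching helices. A minimal sketch follows, assuming that linear form; the function name and the numeric values in the usage note are placeholders for illustration, not parameters taken from the paper.

```python
def multiloop_initiation(unpaired, branches, a, b, c):
    """Linear multiloop initiation cost: a + b * unpaired + c * branches.

    a is the offset, b the cost per unpaired nucleotide in the loop, and
    c the cost per branching helix (assumed linear form)."""
    if branches < 3:
        raise ValueError("a multiloop has three or more branching helices")
    if unpaired < 0:
        raise ValueError("unpaired nucleotide count cannot be negative")
    return a + b * unpaired + c * branches
```

For example, `multiloop_initiation(unpaired=4, branches=3, a=1.0, b=0.5, c=0.25)` evaluates to 3.75 with these placeholder values. Changing `a`, `b`, or `c` shifts the relative cost of branching, which is why the predicted structure can change with the parameter choice.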
|
29
|
Jelínek J, Hoksza D, Hajič J, Pešek J, Drozen J, Hladík T, Klimpera M, Vohradský J, Pánek J. rPredictorDB: a predictive database of individual secondary structures of RNAs and their formatted plots. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5479229. [PMID: 31032840 PMCID: PMC6482342 DOI: 10.1093/database/baz047] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2018] [Revised: 03/01/2019] [Accepted: 03/21/2019] [Indexed: 12/11/2022]
Abstract
Secondary data structure of RNA molecules provides insights into the identity and function of RNAs. With RNAs readily sequenced, the question of their structural characterization is increasingly important. However, RNA structure is difficult to acquire. Its experimental identification is extremely technically demanding, while computational prediction is not accurate enough, especially for large structures of long sequences. We address this difficult situation with rPredictorDB, a predictive database of RNA secondary structures that aims to form a middle ground between experimentally identified structures in PDB and predicted consensus secondary structures in Rfam. The database contains individual secondary structures predicted using a tool for template-based prediction of RNA secondary structure for the homologs of the RNA families with at least one homolog with experimentally solved structure. Experimentally identified structures are used as the structural templates and thus the prediction has higher reliability than de novo predictions in Rfam. The sequences are downloaded from public resources. So far rPredictorDB covers 7365 RNAs with their secondary structures. Plots of the secondary structures use the Traveler package for readable display of RNAs with long sequences and complex structures, such as ribosomal RNAs. The RNAs in the output of rPredictorDB are extensively annotated and can be viewed, browsed, searched and downloaded according to taxonomic, sequence and structure data. Additionally, structure of user-provided sequences can be predicted using the templates stored in rPredictorDB.
Affiliation(s)
- Jan Jelínek
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha.,Laboratory of Bioinformatics, Institute of Microbiology, The Czech Academy of Sciences, Videnska, Praha
| | - David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha.,Luxembourg Centre for Systems Biomedicine, University of Luxembourg, avenue du Swing, Belvaux
| | - Jan Hajič
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha
| | - Jan Pešek
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha
| | - Jan Drozen
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha
| | - Tomáš Hladík
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha
| | - Michal Klimpera
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu, Praha
| | - Jiří Vohradský
- Laboratory of Bioinformatics, Institute of Microbiology, The Czech Academy of Sciences, Videnska, Praha
| | - Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology, The Czech Academy of Sciences, Videnska, Praha
|
30
|
Zhou G, Loper J, Geman S. Base-pair ambiguity and the kinetics of RNA folding. BMC Bioinformatics 2019; 20:666. [PMID: 31830902 PMCID: PMC6909616 DOI: 10.1186/s12859-019-3303-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/06/2019] [Accepted: 12/02/2019] [Indexed: 01/28/2023]
Abstract
Background A folding RNA molecule encounters multiple opportunities to form non-native yet energetically favorable pairings of nucleotide sequences. Given this forbidding free-energy landscape, mechanisms have evolved that contribute to a directed and efficient folding process, including catalytic proteins and error-detecting chaperones. Among structural RNA molecules we make a distinction between “bound” molecules, which are active as part of ribonucleoprotein (RNP) complexes, and “unbound,” with physiological functions performed without necessarily being bound in RNP complexes. We hypothesized that unbound molecules, lacking the partnering structure of a protein, would be more vulnerable than bound molecules to kinetic traps that compete with native stem structures. We defined an “ambiguity index,” a normalized function of the primary and secondary structure of an individual molecule that measures the number of kinetic traps available to nucleotide sequences that are paired in the native structure, presuming that unbound molecules would have lower indexes. The ambiguity index depends on the purported secondary structure, and was computed under both the comparative (“gold standard”) and an equilibrium-based prediction which approximates the minimum free energy (MFE) structure. Arguing that kinetically accessible metastable structures might be more biologically relevant than thermodynamic equilibrium structures, we also hypothesized that MFE-derived ambiguities would be less effective in separating bound and unbound molecules. Results We have introduced an intuitive and easily computed function of primary and secondary structures, the ambiguity index, that measures the availability of complementary sequences that could disrupt the formation of native stems on a given molecule. Using comparative secondary structures, the ambiguity index is systematically smaller among unbound than bound molecules, as expected. Furthermore, the effect is lost when the presumably more accurate comparative structure is replaced instead by the MFE structure.
Conclusions A statistical analysis of the relationship between the primary and secondary structures of non-coding RNA molecules suggests that stem-disrupting kinetic traps are substantially less prevalent in molecules not participating in RNP complexes. In that this distinction is apparent under the comparative but not the MFE secondary structure, the results highlight a possible deficiency in structure predictions when based upon assumptions of thermodynamic equilibrium.
Affiliation(s)
| | - Jackson Loper
- Data Science Institute, Columbia University, New York, NY, USA
| | - Stuart Geman
- Division of Applied Mathematics, Brown University, Providence, RI, USA
|
31
|
Wang J, Gribskov M. IRESpy: an XGBoost model for prediction of internal ribosome entry sites. BMC Bioinformatics 2019; 20:409. [PMID: 31362694 PMCID: PMC6664791 DOI: 10.1186/s12859-019-2999-7] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 02/28/2019] [Accepted: 07/17/2019] [Indexed: 01/08/2023]
Abstract
Background Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the 5′ cap-dependent translation initiation mechanism. IRES usually function when 5′ cap-dependent translation initiation has been blocked or repressed. They have been widely found to play important roles in viral infections and cellular processes. However, a limited number of confirmed IRES have been reported due to the requirement for highly labor intensive, slow, and low efficiency laboratory experiments. Bioinformatics tools have been developed, but there is no reliable online tool. Results This paper systematically examines the features that can distinguish IRES from non-IRES sequences. Sequence features such as kmer words, structural features such as QMFE, and sequence/structure hybrid features are evaluated as possible discriminators. They are incorporated into an IRES classifier based on XGBoost. The XGBoost model performs better than previous classifiers, with higher accuracy and much shorter computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by including global kmer and structural features. The contributions of model features are well explained by LIME and SHapley Additive exPlanations. The trained XGBoost model has been implemented as a bioinformatics tool for IRES prediction, IRESpy (https://irespy.shinyapps.io/IRESpy/), which has been applied to scan the human 5′ UTR and find novel IRES segments. Conclusions IRESpy is a fast, reliable, high-throughput IRES online prediction tool. It provides a publicly available tool for all IRES researchers, and can be used in other genomics applications such as gene annotation and analysis of differential gene expression. 
Electronic supplementary material The online version of this article (10.1186/s12859-019-2999-7) contains supplementary material, which is available to authorized users.
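The "kmer words" named in the abstract as sequence features can be illustrated with a small feature extractor. This is only a sketch of global k-mer frequencies under assumed conventions (RNA alphabet, T treated as U, overlapping windows); it is not the IRESpy feature pipeline, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Return the frequency of every possible k-mer over the alphabet,
    counted across all overlapping windows of the sequence."""
    seq = seq.upper().replace("T", "U")  # assumption: DNA input is read as RNA
    windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short input
    # Windows containing symbols outside the alphabet (for example N) are
    # counted in the total but have no feature of their own.
    return {
        "".join(word): counts["".join(word)] / total
        for word in product(alphabet, repeat=k)
    }
```

A fixed-length vector such as the 64 values returned for `k=3` is the kind of input a gradient-boosted tree classifier can consume directly, whatever the length of the original sequence.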
Affiliation(s)
- Junhui Wang
- Biological Sciences Department, Purdue University, West Lafayette, IN, USA
| | - Michael Gribskov
- Biological Sciences Department, Purdue University, West Lafayette, IN, USA.
|
32
|
Auslander N, Wolf YI, Shabalina SA, Koonin EV. A unique insert in the genomes of high-risk human papillomaviruses with a predicted dual role in conferring oncogenic risk. F1000Res 2019; 8:1000. [PMID: 31448109 PMCID: PMC6685453 DOI: 10.12688/f1000research.19590.2] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Accepted: 09/17/2019] [Indexed: 12/12/2022]
Abstract
The differences between high risk and low risk human papillomaviruses (HR-HPV and LR-HPV, respectively) that contribute to the tumorigenic potential of HR-HPV are not well understood but can be expected to involve the HPV oncoproteins, E6 and E7. We combine genome comparison and machine learning techniques to identify a previously unnoticed insert near the 3’-end of the E6 oncoprotein gene that is unique to HR-HPV. Analysis of the insert sequence suggests that it exerts a dual effect, by creating a PDZ domain-binding motif at the C-terminus of E6, as well as eliminating the overlap between the E6 and E7 coding regions in HR-HPV. We show that, as a result, the insert might enable coupled termination-reinitiation of the E6 and E7 genes, supported by motifs complementary to the human 18S rRNA. We hypothesize that the added functionality of E6 and positive regulation of E7 expression jointly account for the tumorigenic potential of HR-HPV.
Affiliation(s)
- Noam Auslander
- National Center for Biotechnology Information, National Institutes of Health, USA, Bethesda, Maryland, 20814, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Institutes of Health, USA, Bethesda, Maryland, 20814, USA
| | - Svetlana A Shabalina
- National Center for Biotechnology Information, National Institutes of Health, USA, Bethesda, Maryland, 20814, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Institutes of Health, USA, Bethesda, Maryland, 20814, USA
|
33
|
Zuber J, Mathews DH. Estimating uncertainty in predicted folding free energy changes of RNA secondary structures. RNA (NEW YORK, N.Y.) 2019; 25:747-754. [PMID: 30952689 PMCID: PMC6521603 DOI: 10.1261/rna.069203.118] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/08/2018] [Accepted: 04/02/2019] [Indexed: 06/09/2023]
Abstract
Nearest neighbor parameters for estimating the folding stability of RNA are commonly used in secondary structure prediction, for generating folding ensembles of structures, and for analyzing RNA function. Previously, we demonstrated that we could quantify the uncertainties in each nearest neighbor parameter by perturbing the underlying optical melting data within experimental error and rederiving the parameters, which accounts for the substantial correlations that exist between the parameters. In this contribution, we describe a method to estimate uncertainty in the estimated folding stabilities of RNA structures, accounting for correlations in the nearest neighbor parameters. This method is incorporated in the RNAstructure software package.
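Why correlations matter can be seen with ordinary first-order error propagation for a stability that is a weighted sum of parameters. The sketch below assumes that linear form and a known parameter covariance matrix; it is a generic illustration, not the procedure described in the paper, and all names and numbers are hypothetical.

```python
import math


def folding_energy_with_uncertainty(counts, params, covariance):
    """Return (energy, standard deviation) for energy = sum(n_i * p_i).

    counts[i] is how often parameter i occurs in the structure, params[i]
    is its value, and covariance[i][j] is the covariance between
    parameters i and j (assumed given)."""
    size = len(counts)
    energy = sum(n * p for n, p in zip(counts, params))
    # Variance of a linear combination: n^T * Cov * n.
    variance = sum(
        counts[i] * covariance[i][j] * counts[j]
        for i in range(size)
        for j in range(size)
    )
    return energy, math.sqrt(max(variance, 0.0))
```

With `counts=[2, 1]` and variances 0.04 and 0.09, uncorrelated parameters give a standard deviation of 0.5, while a covariance of -0.06 between them reduces it to 0.1. Ignoring the off-diagonal terms would therefore misstate the uncertainty of the total.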
Affiliation(s)
- Jeffrey Zuber
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
| | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
|
34
|
Geary C, Meunier PÉ, Schabanel N, Seki S. Oritatami: A Computational Model for Molecular Co-Transcriptional Folding. Int J Mol Sci 2019; 20:ijms20092259. [PMID: 31067813 PMCID: PMC6539498 DOI: 10.3390/ijms20092259] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/15/2019] [Revised: 04/25/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022]
Abstract
We introduce and study the computational power of Oritatami, a theoretical model that explores greedy molecular folding, whereby a molecular strand begins to fold before its production is complete. This model is inspired by our recent experimental work demonstrating the construction of shapes at the nanoscale from RNA, where strands of RNA fold into programmable shapes during their transcription from an engineered sequence of synthetic DNA. In the model of Oritatami, we explore the process of folding a single-strand bit by bit in such a way that the final fold emerges as a space-time diagram of computation. One major requirement in order to compute within this model is the ability to program a single sequence to fold into different shapes dependent on the state of the surrounding inputs. Another challenge is to embed all of the computing components within a contiguous strand, and in such a way that different fold patterns of the same strand perform different functions of computation. Here, we introduce general design techniques to solve these challenges in the Oritatami model. Our main result in this direction is the demonstration of a periodic Oritatami system that folds upon itself algorithmically into a prescribed set of shapes, depending on its current local environment, and whose final folding displays the sequence of binary integers from 0 to N = 2^k − 1 with a seed of size O(k). We prove that designing Oritatami is NP-hard in the number of possible local environments for the folding. Nevertheless, we provide an efficient algorithm, linear in the length of the sequence, that solves the Oritatami design problem when the number of local environments is a small fixed constant. This shows that this problem is in fact fixed parameter tractable (FPT) and can thus be solved in practice efficiently.
We hope that the numerous structural strategies employed in Oritatami enabling computation will inspire new architectures for computing in RNA that take advantage of the rapid kinetic-folding of RNA.
Affiliation(s)
- Cody Geary
- Computer Science Computation and Neural Systems Bioengineering Caltech, MS 136-93, Moore Building, Pasadena, CA 91125, USA.
| | | | - Nicolas Schabanel
- CNRS, École normale supérieure de Lyon (LIP), CEDEX 07, 69364 Lyon, France.
| | - Shinnosuke Seki
- Computer and Network Engineering Dept, University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 1828585, Japan.
|
35
|
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022]
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
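The abstract does not name its metrics or define the flexibility it allows, so the following sketch rests on two assumptions: accuracy is summarized as sensitivity and positive predictive value (PPV), and a pair counts as matched if one of its two positions is shifted by at most `slip` nucleotides. Function names are hypothetical.

```python
def _matches(pair, others, slip):
    """True if pair, or pair with one end shifted by up to slip, is in others."""
    i, j = pair
    for d in range(-slip, slip + 1):
        if (i + d, j) in others or (i, j + d) in others:
            return True
    return False


def sensitivity_ppv(predicted, accepted, slip=1):
    """Sensitivity: fraction of accepted pairs recovered by the prediction.
    PPV: fraction of predicted pairs supported by the accepted structure."""
    predicted, accepted = set(predicted), set(accepted)
    found = sum(_matches(p, predicted, slip) for p in accepted)
    correct = sum(_matches(p, accepted, slip) for p in predicted)
    sensitivity = found / len(accepted) if accepted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

Setting `slip=0` requires exact matches, which makes it easy to see how much a reported accuracy depends on the flexibility rule. Reporting both numbers matters because a method can raise sensitivity simply by predicting more pairs, at the cost of PPV.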
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
|
36
|
Jelínek J, Pánek J. cpPredictor: a web server for template-based prediction of RNA secondary structure. Bioinformatics 2019; 35:1231-1233. [PMID: 30169571 DOI: 10.1093/bioinformatics/bty753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2018] [Revised: 07/12/2018] [Accepted: 08/28/2018] [Indexed: 11/12/2022]
Abstract
SUMMARY We present the cpPredictor webserver that implements a novel template-based method for prediction of secondary structure of RNA. The method outperforms available prediction methods as it uses RNA structures of related molecules, either predicted or experimentally identified, as structural templates. The server aims at three major tasks: i) prediction of RNA secondary structures that are difficult to predict by available methods, ii) characterization of uncharacterized RNAs as compatible or incompatible with a chosen template structure and iii) identification of the most relevant structure among different candidate structures of a single RNA ambiguously predicted by available methods. The web server is accompanied by comprehensive documentation. AVAILABILITY AND IMPLEMENTATION The web server is freely available at http://cppredictor.elixir-czech.cz/. The source code of the cpPredictor algorithm is freely available from the webserver under the Apache License, Version 2.0.
Affiliation(s)
- Jan Jelínek
- Laboratory of Bioinformatics, Institute of Microbiology, The Czech Academy of Sciences, Prague 1, Czech Republic
| | - Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology, The Czech Academy of Sciences, Prague 1, Czech Republic
|
37
|
Wang L, Liu Y, Zhong X, Liu H, Lu C, Li C, Zhang H. DMfold: A Novel Method to Predict RNA Secondary Structure With Pseudoknots Based on Deep Learning and Improved Base Pair Maximization Principle. Front Genet 2019; 10:143. [PMID: 30886627 PMCID: PMC6409321 DOI: 10.3389/fgene.2019.00143] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 12/02/2018] [Accepted: 02/12/2019] [Indexed: 01/21/2023]
Abstract
Predicting the secondary structure of RNA is vital for studying its function, but determining that structure is challenging, especially for RNAs with pseudoknots. Several computational methods can predict secondary structure (with or without pseudoknots), each with its own merits and demerits. These methods fall into two categories: the multi-sequence method and the single-sequence method. The main advantage of the multi-sequence method is that it uses auxiliary sequences to assist prediction, but it succeeds only when multiple highly homologous sequences are available. The single-sequence method is easy to operate (it needs only the target sequence), but its folding parameters capture features common to diverse RNAs and cannot describe the unique characteristics of an individual RNA, which can lead to low prediction accuracy for some RNAs. In this paper, "DMfold," a method based on deep learning and an improved base pair maximization principle, is proposed to predict secondary structure with pseudoknots; it combines the advantages of the two approaches and avoids some of their disadvantages. Notably, DMfold predicts the secondary structure of an RNA by learning from similar RNAs with known structures, using similar RNA sequences instead of the highly homologous sequences required by the multi-sequence method, thereby reducing the requirement for auxiliary sequences. DMfold needs only the target sequence as input. Its folding parameters are extracted automatically by deep learning, which avoids the limited folding parameters of the single-sequence method. Experiments show that our method is simple to operate and improves prediction accuracy compared with several established prediction methods.
A repository containing our code can be found at https://github.com/linyuwangPHD/RNA-Secondary-Structure-Database.
Affiliation(s)
- Linyu Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Xiaodan Zhong
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
- Department of Pediatric Oncology, The First Hospital of Jilin University, Changchun, China
| | - Haiming Liu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Chao Lu
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Cong Li
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
| | - Hao Zhang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China
|
38
|
Schroeder SJ. Challenges and approaches to predicting RNA with multiple functional structures. RNA (NEW YORK, N.Y.) 2018; 24:1615-1624. [PMID: 30143552 PMCID: PMC6239171 DOI: 10.1261/rna.067827.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 06/08/2023]
Abstract
The revolution in sequencing technology demands new tools to interpret the genetic code. As in vivo transcriptome-wide chemical probing techniques advance, new challenges emerge in the RNA folding problem. The emphasis on one sequence folding into a single minimum free energy structure is fading as a new focus develops on generating RNA structural ensembles and identifying functional structural features in ensembles. This review describes an efficient combinatorially complete method and three free energy minimization approaches to predicting RNA structures with more than one functional fold, as well as two methods for analysis of a thermodynamics-based Boltzmann ensemble of structures. The review then highlights two examples of viral RNA 3'-UTR regions that fold into more than one conformation and have been characterized by single-molecule fluorescence resonance energy transfer or NMR spectroscopy. These examples highlight the different approaches and challenges in predicting structure and function from sequence for RNA with multiple biological roles and folds. More well-defined examples and new metrics for measuring differences in RNA structures will guide future improvements in prediction of RNA structure and function from sequence.
Affiliation(s)
- Susan J Schroeder
- Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA
39
|
Lin L, McKerrow WH, Richards B, Phonsom C, Lawrence CE. Characterization and visualization of RNA secondary structure Boltzmann ensemble via information theory. BMC Bioinformatics 2018; 19:82. [PMID: 29506466 PMCID: PMC5836418 DOI: 10.1186/s12859-018-2078-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/15/2017] [Accepted: 02/20/2018] [Indexed: 12/26/2022] Open
Abstract
Background: The nearest neighbor model and associated dynamic programming algorithms allow for the efficient estimation of the RNA secondary structure Boltzmann ensemble. However because a given RNA secondary structure only contains a fraction of the possible helices that could form from a given sequence, the Boltzmann ensemble is multimodal. Several methods exist for clustering structures and finding those modes. However less focus is given to exploring the underlying reasons for this multimodality: the presence of conflicting basepairs. Information theory, or more specifically mutual information, provides a method to identify those basepairs that are key to the secondary structure. Results: To this end we find most informative basepairs and visualize the effect of these basepairs on the secondary structure. Knowing whether a most informative basepair is present tells us not only the status of the particular pair but also provides a large amount of information about which other pairs are present or not present. We find that a few basepairs account for a large amount of the structural uncertainty. The identification of these pairs indicates small changes to sequence or stability that will have a large effect on structure. Conclusion: We provide a novel algorithm that uses mutual information to identify the key basepairs that lead to a multimodal Boltzmann distribution. We then visualize the effect of these pairs on the overall Boltzmann ensemble.
Affiliation(s)
- Luan Lin
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, 20993, MD, USA
- Wilson H McKerrow
- Division of Applied Mathematics, Brown University, Providence, 02912, RI, USA
- Chukiat Phonsom
- Department of Mathematics, University of Southern California, Los Angeles, 90089, CA, USA
- Charles E Lawrence
- Division of Applied Mathematics, Brown University, Providence, 02912, RI, USA
40
|
Pánek J, Modrák M, Schwarz M. An Algorithm for Template-Based Prediction of Secondary Structures of Individual RNA Sequences. Front Genet 2017; 8:147. [PMID: 29067038 PMCID: PMC5641303 DOI: 10.3389/fgene.2017.00147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/24/2017] [Accepted: 09/25/2017] [Indexed: 11/24/2022] Open
Abstract
While understanding the structure of RNA molecules is vital for deciphering their functions, determining RNA structures experimentally is exceptionally hard. At the same time, extant approaches to computational RNA structure prediction have limited applicability and reliability. In this paper we provide a method to solve a simpler yet still biologically relevant problem: prediction of RNA secondary structure using the structure of different molecules as a template. Our method identifies conserved and unconserved subsequences within an RNA molecule. For conserved subsequences, the template structure is directly transferred into the generated structure and combined with de novo predicted structure for the unconserved subsequences with low evolutionary conservation. The method also determines when the generated structure is unreliable. The method is validated using experimentally identified structures. The accuracy of the method exceeds that of classical prediction algorithms and constrained prediction methods. This is demonstrated by comparison using a large number of heterogeneous RNAs. The presented method is fast and robust, and useful for various applications requiring knowledge of secondary structures of individual RNA sequences.
Affiliation(s)
- Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
- Martin Modrák
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
- Marek Schwarz
- Laboratory of Bioinformatics, Institute of Microbiology of the Academy of Sciences of Czech Republic, Prague, Czechia
41
|
Qi F, Frishman D. Melting temperature highlights functionally important RNA structure and sequence elements in yeast mRNA coding regions. Nucleic Acids Res 2017; 45:6109-6118. [PMID: 28335026 PMCID: PMC5449622 DOI: 10.1093/nar/gkx161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/20/2016] [Accepted: 02/24/2017] [Indexed: 11/13/2022] Open
Abstract
Secondary structure elements in the coding regions of mRNAs play an important role in gene expression and regulation, but distinguishing functional from non-functional structures remains challenging. Here we investigate the dependence of sequence–structure relationships in the coding regions on temperature based on the recent PARTE data by Wan et al. Our main finding is that the regions with high and low thermostability (high Tm and low Tm regions) are under evolutionary pressure to preserve RNA secondary structure and primary sequence, respectively. Sequences of low Tm regions display a higher degree of evolutionary conservation compared to high Tm regions. Low Tm regions are under strong synonymous constraint, while high Tm regions are not. These findings imply that high Tm regions contain thermo-stable functionally important RNA structures, which impose relaxed evolutionary constraint on sequence as long as the base-pairing patterns remain intact. By contrast, low thermostability regions contain single-stranded functionally important conserved RNA sequence elements accessible for binding by other molecules. We also find that theoretically predicted structures of paralogous mRNA pairs become more similar with growing temperature, while experimentally measured structures tend to diverge, which implies that the melting pathways of RNA structures cannot be fully captured by current computational approaches.
Affiliation(s)
- Fei Qi
- Department of Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354 Freising, Germany
- Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354 Freising, Germany; St Petersburg State Polytechnic University, St Petersburg 195251, Russia
42
|
Yu Z, Cowan JA. Catalytic Metallodrugs: Substrate-Selective Metal Catalysts as Therapeutics. Chemistry 2017; 23:14113-14127. [PMID: 28688119 DOI: 10.1002/chem.201701714] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 04/17/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Zhen Yu
- Department of Chemistry and Biochemistry; The Ohio State University; 100 West 18th Avenue Columbus OH 43210 USA
- James A. Cowan
- Department of Chemistry and Biochemistry; The Ohio State University; 100 West 18th Avenue Columbus OH 43210 USA
43
|
Rogers E, Murrugarra D, Heitsch C. Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations. Biophys J 2017. [PMID: 28629618 DOI: 10.1016/j.bpj.2017.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/23/2023] Open
Abstract
Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, the conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use its tight relationship to define quantitative thresholds for well versus ill conditioning. These resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption of sampling's improved conditioning over the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones.
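Illustrative note (not part of the cited abstract): the contrast between an ill-conditioned MFE prediction and a Boltzmann ensemble can be sketched with a toy calculation. The three structure labels, their free energies, the 0.1 kcal/mol perturbation, and the 37 °C temperature below are all hypothetical values chosen for illustration; the paper's own conditioning and robustness definitions are more involved and are not reproduced here.

```python
import math

# Gas constant (kcal/(mol*K)) times temperature (310.15 K, i.e. 37 °C).
# The temperature is an assumption made for this toy example.
RT = 0.0019872 * 310.15


def boltzmann_probs(energies):
    """Map each structure to its Boltzmann probability.

    `energies` is a dict of structure label -> free energy in kcal/mol.
    """
    weights = {s: math.exp(-g / RT) for s, g in energies.items()}
    z = sum(weights.values())  # partition function over the listed structures
    return {s: w / z for s, w in weights.items()}


def mfe_structure(energies):
    """Return the label of the minimum free energy structure."""
    return min(energies, key=energies.get)


# Hypothetical free energies (kcal/mol) for three competing structures.
base = {"A": -10.0, "B": -9.9, "C": -7.0}
# Same structures after a small, hypothetical parameter perturbation.
perturbed = {"A": -9.9, "B": -10.0, "C": -7.0}

if __name__ == "__main__":
    for name, energies in (("base", base), ("perturbed", perturbed)):
        probs = boltzmann_probs(energies)
        summary = ", ".join(f"{s}={p:.3f}" for s, p in probs.items())
        print(f"{name}: MFE={mfe_structure(energies)}; probabilities: {summary}")
```

In this toy case the MFE structure switches from A to B under the perturbation, while each structure's Boltzmann probability changes by less than 0.1, which is the qualitative behaviour the abstract attributes to sampling.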
Affiliation(s)
- Emily Rogers
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia
- David Murrugarra
- Department of Mathematics, University of Kentucky, Lexington, Kentucky
- Christine Heitsch
- School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia
44
|
Woods CT, Lackey L, Williams B, Dokholyan NV, Gotz D, Laederach A. Comparative Visualization of the RNA Suboptimal Conformational Ensemble In Vivo. Biophys J 2017. [PMID: 28625696 PMCID: PMC5529173 DOI: 10.1016/j.bpj.2017.05.031] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/20/2022] Open
Abstract
When a ribonucleic acid (RNA) molecule folds, it often does not adopt a single, well-defined conformation. The folding energy landscape of an RNA is highly dependent on its nucleotide sequence and molecular environment. Cellular molecules sometimes alter the energy landscape, thereby changing the ensemble of likely low-energy conformations. The effects of these energy landscape changes on the conformational ensemble are particularly challenging to visualize for large RNAs. We have created a robust approach for visualizing the conformational ensemble of RNAs that is well suited for in vitro versus in vivo comparisons. Our method creates a stable map of conformational space for a given RNA sequence. We first identify single point mutations in the RNA that maximally sample suboptimal conformational space based on the ensemble’s partition function. Then, we cluster these diverse ensembles to identify the most diverse partition functions for Boltzmann stochastic sampling. By using, to our knowledge, a novel nestedness distance metric, we iteratively add mutant suboptimal ensembles to converge on a stable 2D map of conformational space. We then compute the selective 2′ hydroxyl acylation by primer extension (SHAPE)-directed ensemble for the RNA folding under different conditions, and we project these ensembles on the map to visualize. To validate our approach, we established a conformational map of the Vibrio vulnificus add adenine riboswitch that reveals five classes of structures. In the presence of adenine, projection of the SHAPE-directed sampling correctly identified the on-conformation; without the ligand, only off-conformations were visualized. We also collected the whole-transcript in vitro and in vivo SHAPE-MaP for human β-actin messenger RNA that revealed similar global folds in both conditions. 
Nonetheless, a comparison of in vitro and in vivo data revealed that specific regions exhibited significantly different SHAPE-MaP profiles indicative of structural rearrangements, including rearrangement consistent with binding of the zipcode protein in a region distal to the stop codon.
Affiliation(s)
- Chanin T Woods
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Lela Lackey
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Benfeard Williams
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- David Gotz
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Alain Laederach
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
45
|
Umu SU, Gardner PP. A comprehensive benchmark of RNA-RNA interaction prediction tools for all domains of life. Bioinformatics 2017; 33:988-996. [PMID: 27993777 PMCID: PMC5408919 DOI: 10.1093/bioinformatics/btw728] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/20/2016] [Accepted: 11/13/2016] [Indexed: 12/15/2022] Open
Abstract
Motivation: The aim of this study is to assess the performance of RNA-RNA interaction prediction tools for all domains of life. Results: Minimum free energy (MFE) and alignment methods constitute most of the current RNA interaction prediction algorithms. The MFE tools that incorporate accessibility (i.e. RNAup, IntaRNA and RNAplex) into the final predicted binding energy have better true positive rates (TPRs), with high positive predictive values (PPVs), in all datasets than other methods. They can also differentiate almost half of the native interactions from background. The algorithms that include effects of internal binding energies in their model and alignment methods seem to have high TPR but relatively low associated PPV compared to accessibility-based methods. Availability and Implementation: We shared our wrapper scripts and datasets on GitHub (github.com/UCanCompBio/RNA_Interactions_Benchmark). All parameters are documented for personal use. Contact: sinan.umu@pg.canterbury.ac.nz. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sinan Ugur Umu
- School of Biological Sciences; Biomolecular Interaction Centre
- Paul P Gardner
- School of Biological Sciences; Biomolecular Interaction Centre; Bio-Protection Research Centre, University of Canterbury, Christchurch, New Zealand
46
|
Liu Y, Zhao Q, Zhang H, Xu R, Li Y, Wei L. A New Method to Predict RNA Secondary Structure Based on RNA Folding Simulation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:990-995. [PMID: 26552091 DOI: 10.1109/tcbb.2015.2496347] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 06/05/2023]
Abstract
RNA plays an important role in various biological processes; hence, it is essential when determining the functions of RNA to research its secondary structures. So far, the accuracy of RNA secondary structure prediction remains an area in need of improvement. This paper presents a novel method for predicting RNA secondary structure based on an RNA folding simulation model. This model assumes that the process of RNA folding from the random coil state to full structure is staged and in every stage of folding, the final state of an RNA is determined by the optimal combination of helical regions, which are urgently essential to dynamics of RNA formation. This paper proposes the First Large Free Energy Difference (FLED) in order to find the helical regions most urgently needed for optimal final state formation among all the possible helical regions. Tests on the datasets with known structures from public databases demonstrate that our method can outperform other current RNA secondary structure prediction methods in terms of prediction accuracy.
47
|
Abstract
Deciphering the folding pathways and predicting the structures of complex three-dimensional biomolecules is central to elucidating biological function. RNA is single-stranded, which gives it the freedom to fold into complex secondary and tertiary structures. These structures endow RNA with the ability to perform complex chemistries and functions ranging from enzymatic activity to gene regulation. Given that RNA is involved in many essential cellular processes, it is critical to understand how it folds and functions in vivo. Within the last few years, methods have been developed to probe RNA structures in vivo and genome-wide. These studies reveal that RNA often adopts very different structures in vivo and in vitro, and provide profound insights into RNA biology. Nonetheless, both in vitro and in vivo approaches have limitations: studies in the complex and uncontrolled cellular environment make it difficult to obtain insight into RNA folding pathways and thermodynamics, and studies in vitro often lack direct cellular relevance, leaving a gap in our knowledge of RNA folding in vivo. This gap is being bridged by biophysical and mechanistic studies of RNA structure and function under conditions that mimic the cellular environment. To date, most artificial cytoplasms have used various polymers as molecular crowding agents and a series of small molecules as cosolutes. Studies under such in vivo-like conditions are yielding fresh insights, such as cooperative folding of functional RNAs and increased activity of ribozymes. These observations are accounted for in part by molecular crowding effects and interactions with other molecules. In this review, we report milestones in RNA folding in vitro and in vivo and discuss ongoing experimental and computational efforts to bridge the gap between these two conditions in order to understand how RNA folds in the cell.
48
|
Yau EH, Butler MC, Sullivan JM. A cellular high-throughput screening approach for therapeutic trans-cleaving ribozymes and RNAi against arbitrary mRNA disease targets. Exp Eye Res 2016; 151:236-55. [PMID: 27233447 DOI: 10.1016/j.exer.2016.05.020] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/01/2016] [Revised: 04/25/2016] [Accepted: 05/22/2016] [Indexed: 12/11/2022]
Abstract
Major bottlenecks in development of therapeutic post transcriptional gene silencing (PTGS) agents (e.g. ribozymes, RNA interference, antisense) include the challenge of mapping rare accessible regions of the mRNA target that are open for annealing and cleavage, testing and optimization of agents in human cells to identify lead agents, testing for cellular toxicity, and preclinical evaluation in appropriate animal models of disease. Methods for rapid and reliable cellular testing of PTGS agents are needed to identify potent lead candidates for optimization. Our goal was to develop a means of rapid assessment of many RNA agents to identify a lead candidate for a given mRNA associated with a disease state. We developed a rapid human cell-based screening platform to test efficacy of hammerhead ribozyme (hhRz) or RNA interference (RNAi) constructs, using a model retinal degeneration target, human rod opsin (RHO) mRNA. The focus is on RNA Drug Discovery for diverse retinal degeneration targets. To validate the approach, candidate hhRzs were tested against NUH↓ cleavage sites (N = G,C,A,U; H = C,A,U) within the target mRNA of secreted alkaline phosphatase (SEAP), a model gene expression reporter, based upon in silico predictions of mRNA accessibility. HhRzs were embedded in a larger stable adenoviral VAI RNA scaffold for high cellular expression, cytoplasmic trafficking, and stability. Most hhRz expression plasmids exerted statistically significant knockdown of extracellular SEAP enzyme activity when readily assayed by a fluorescence enzyme assay intended for high throughput screening (HTS). Kinetics of PTGS knockdown of cellular targets is measureable in live cells with the SEAP reporter. The validated SEAP HTS platform was transposed to identify lead PTGS agents against a model hereditary retinal degeneration target, RHO mRNA. Two approaches were used to physically fuse the model retinal gene target mRNA to the SEAP reporter mRNA. 
The most expedient way to evaluate a large set of potential VAI-hhRz expression plasmids against diverse NUH↓ cleavage sites uses cultured human HEK293S cells stably expressing a dicistronic Target-IRES-SEAP target fusion mRNA. Broad utility of this rational RNA drug discovery approach is feasible for any ophthalmological disease-relevant mRNA targets and any disease mRNA targets in general. The approach will permit rank ordering of PTGS agents based on potency to identify a lead therapeutic compound for further optimization.
Affiliation(s)
- Edwin H Yau
- Department of Pharmacology/Toxicology, University at Buffalo- SUNY, Buffalo, NY 14209, USA; Department of Ophthalmology (Ira G. Ross Eye Institute), University at Buffalo- SUNY, Buffalo, NY 14209, USA
- Mark C Butler
- Department of Ophthalmology (Ira G. Ross Eye Institute), University at Buffalo- SUNY, Buffalo, NY 14209, USA
- Jack M Sullivan
- Research Service, VA Western New York Healthcare System, Buffalo, NY 14215, USA; Department of Ophthalmology (Ira G. Ross Eye Institute), University at Buffalo- SUNY, Buffalo, NY 14209, USA; Department of Pharmacology/Toxicology, University at Buffalo- SUNY, Buffalo, NY 14209, USA; Department of Physiology/Biophysics, University at Buffalo- SUNY, Buffalo, NY 14209, USA; Neuroscience Program, University at Buffalo- SUNY, Buffalo, NY 14209, USA; SUNY Eye Institute, University at Albany- SUNY, USA; RNA Institute, University at Albany- SUNY, USA
49
|
Wu Y, Qu R, Huang Y, Shi B, Liu M, Li Y, Lu ZJ. RNAex: an RNA secondary structure prediction server enhanced by high-throughput structure-probing data. Nucleic Acids Res 2016; 44:W294-301. [PMID: 27137891 PMCID: PMC4987914 DOI: 10.1093/nar/gkw362] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/31/2016] [Accepted: 04/24/2016] [Indexed: 01/16/2023] Open
Abstract
Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise are required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods, restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold, that can be selected by the user. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at http://RNAex.ncrnalab.org/.
Affiliation(s)
- Yang Wu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Rihao Qu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yiming Huang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Mengrong Liu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yang Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
50
|
Rogers E, Heitsch C. New insights from cluster analysis methods for RNA secondary structure prediction. Wiley Interdisciplinary Reviews: RNA 2016; 7:278-94. [PMID: 26971529 DOI: 10.1002/wrna.1334] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/08/2015] [Revised: 12/03/2015] [Accepted: 12/17/2015] [Indexed: 01/12/2023]
Abstract
A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this 'fuzzier' view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing.
Affiliation(s)
- Emily Rogers
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0765, USA
- Christine Heitsch
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA