1
Yamamura K, Asai K, Iwakiri J. Consistent features observed in structural probing data of eukaryotic RNAs. NAR Genom Bioinform 2025; 7:lqaf001. [PMID: 39885881 PMCID: PMC11780854 DOI: 10.1093/nargab/lqaf001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 12/25/2024] [Accepted: 01/09/2025] [Indexed: 02/01/2025]
Abstract
Understanding RNA structure is crucial for elucidating its regulatory mechanisms. With the recent commercialization of messenger RNA vaccines, the profound impact of RNA structure on stability and translation efficiency has become increasingly evident, underscoring the importance of understanding RNA structure. Chemical probing of RNA has emerged as a powerful technique for investigating RNA structure in living cells. This approach utilizes chemical probes that selectively react with accessible regions of RNA; by measuring reactivity, one can infer how open a region is and its potential for protein binding or base pairing. Extensive experimental data generated using RNA chemical probing have significantly contributed to our understanding of RNA structure in cells. However, it is crucial to acknowledge potential biases in chemical probing data to ensure an accurate interpretation. In this study, we comprehensively analyzed transcriptome-scale RNA chemical probing data in eukaryotes and report common features. Notably, in all experiments, the number of bases modified in probing was small; the bases showing the top 10% reactivity reflected the known secondary structure well; and bases with high reactivity were more likely to be exposed to solvent, whereas low reactivity did not reflect solvent exposure. These findings are important for the analysis of RNA chemical probing data.
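The top-10% observation can be checked on one's own data with a few lines of code. The sketch below is a minimal illustration, not the authors' pipeline: it assumes reactivities are given as a Python list aligned to a dot-bracket reference structure, with `None` marking missing values, and the function names and example data are hypothetical.

```python
def top_fraction_positions(reactivities, fraction=0.10):
    """Indices of bases whose reactivity lies in the top `fraction` (None values skipped)."""
    scored = [(r, i) for i, r in enumerate(reactivities) if r is not None]
    k = max(1, round(len(scored) * fraction))
    # Sort by reactivity, highest first, and keep the k most reactive positions.
    scored.sort(reverse=True)
    return sorted(i for _, i in scored[:k])


def unpaired_fraction(positions, dot_bracket):
    """Fraction of the given positions that are unpaired ('.') in the reference structure."""
    if not positions:
        return 0.0
    return sum(dot_bracket[i] == "." for i in positions) / len(positions)


# Hypothetical 20-nt example: one hairpin followed by a single-stranded tail.
structure = "(((((....)))))......"
reactivities = [0.05, 0.1, 0.0, 0.02, 0.1, 1.2, 0.9, 1.5, 0.8, 0.1,
                0.0, 0.05, 0.02, 0.1, 0.4, 0.6, None, 0.3, 0.7, 0.5]
top = top_fraction_positions(reactivities)
print(top, unpaired_fraction(top, structure))  # [5, 7] 1.0
```

On real transcriptome data the same comparison would be run per transcript against a trusted reference structure, and the fraction of top-reactivity bases that are unpaired summarized across transcripts.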
Affiliation(s)
- Kazuteru Yamamura
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan
- Junichi Iwakiri
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan
2
von Löhneysen S, Spicher T, Varenyk Y, Yao HT, Lorenz R, Hofacker I, Stadler PF. Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction. J Comput Biol 2024; 31:549-563. [PMID: 38935442 DOI: 10.1089/cmb.2024.0519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
Affiliation(s)
- Sarah von Löhneysen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Thomas Spicher
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- UniVie Doctoral School Computer Science (DoCS), University of Vienna, Vienna, Austria
- Yuliia Varenyk
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Hua-Ting Yao
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ronny Lorenz
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ivo Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, Santa Fe, New Mexico, USA
3
Wang J, Zhang Y, Zhang T, Tan WT, Lambert F, Darmawan J, Huber R, Wan Y. RNA structure profiling at single-cell resolution reveals new determinants of cell identity. Nat Methods 2024; 21:411-422. [PMID: 38177506 PMCID: PMC10927541 DOI: 10.1038/s41592-023-02128-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/08/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024]
Abstract
RNA structure is critical for multiple steps in gene regulation. However, how the structures of transcripts differ both within and between individual cells is unknown. Here we develop a SHAPE-inspired method called single-cell structure probing of RNA transcripts that enables simultaneous determination of transcript secondary structure and abundance at single-cell resolution. We apply single-cell structure probing of RNA transcripts to human embryonic stem cells and differentiating neurons. Remarkably, RNA structure is more homogeneous in human embryonic stem cells compared with neurons, with the greatest homogeneity found in coding regions. More extensive heterogeneity is found within 3' untranslated regions and is determined by specific RNA-binding proteins. Overall RNA structure profiles better discriminate cell type identity and differentiation stage than gene expression profiles alone. We further discover a cell-type variable region of 18S ribosomal RNA that is associated with cell cycle and translation control. Our method opens the door to the systematic characterization of RNA structure-function relationships at single-cell resolution.
Affiliation(s)
- Jiaxu Wang
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Yu Zhang
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Tong Zhang
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Wen Ting Tan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Finnlay Lambert
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Division of Biomedical Sciences, Warwick Medical School, University of Warwick, Coventry, UK
- Jefferson Darmawan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Roland Huber
- Bioinformatics Institute, A*STAR, Singapore, Singapore
- Yue Wan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Department of Biochemistry, National University of Singapore, Singapore, Singapore
4
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Universidad Nacional de Colombia, Bogotá, Colombia
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Santa Fe Institute, Santa Fe, NM, USA
5
Badelt S, Lorenz R. A Guide to Computational Cotranscriptional Folding Featuring the SRP RNA. Methods Mol Biol 2024; 2726:315-346. [PMID: 38780737 DOI: 10.1007/978-1-0716-3519-3_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Although RNA molecules are synthesized via transcription, little is known about the general impact of cotranscriptional folding in vivo. We present different computational approaches for the simulation of changing structure ensembles during transcription, including interpretations with respect to experimental data from literature. Specifically, we analyze different mutations of the E. coli SRP RNA, which has been studied comparatively well in previous literature, yet the details of which specific metastable structures form as well as when they form are still under debate. Here, we combine thermodynamic and kinetic, deterministic, and stochastic models with automated and visual inspection of those systems to derive the most likely scenario of which substructures form at which point during transcription. The simulations not only provide explanations for present experimental observations but also suggest previously unnoticed conformations that may be verified through future experimental studies.
Affiliation(s)
- Stefan Badelt
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ronny Lorenz
- Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
6
Greenwood T, Heitsch CE. How Parameters Influence SHAPE-Directed Predictions. Methods Mol Biol 2024; 2726:105-124. [PMID: 38780729 DOI: 10.1007/978-1-0716-3519-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
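The abstract does not spell out the conversion formula. A widely used choice, introduced by Deigan et al. (2009) and offered by tools such as RNAstructure and ViennaRNA, is the linear-log form ΔG(i) = m·ln(S_i + 1) + b, where S_i is the reactivity of nucleotide i. The sketch below assumes that form and the commonly cited defaults m = 1.8 and b = -0.6 kcal/mol; these are assumptions, not values taken from this chapter, and the helper names are hypothetical. Computing the reactivity at which the pseudo-energy crosses zero illustrates the parameter sensitivity described above: changing m and b moves the point at which pairing switches from rewarded to penalized.

```python
import math


def shape_pseudoenergy(reactivity, m=1.8, b=-0.6):
    """Linear-log SHAPE pseudo-free energy (kcal/mol): m * ln(reactivity + 1) + b.

    Negative reactivities are clamped to 0; missing data (None) contributes 0.
    The defaults m=1.8 and b=-0.6 are assumed, commonly used values.
    """
    if reactivity is None:
        return 0.0
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


def breakeven_reactivity(m, b):
    """Reactivity at which the pseudo-energy is zero; above it, pairing is penalized."""
    return math.exp(-b / m) - 1.0


# The same reactivity yields different pseudo-energies under two parameter sets.
for m, b in [(1.8, -0.6), (2.6, -0.8)]:
    print(m, b, round(shape_pseudoenergy(1.0, m, b), 3), round(breakeven_reactivity(m, b), 3))
```

Scanning a grid of (m, b) values with a folding engine and recording which structure is predicted at each grid point is one way to reproduce the kind of parameter-space partition the chapter reports.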
7
Yu B, Li P, Zhang QC, Hou L. Differential analysis of RNA structure probing experiments at nucleotide resolution: uncovering regulatory functions of RNA structure. Nat Commun 2022; 13:4227. [PMID: 35869080 PMCID: PMC9307511 DOI: 10.1038/s41467-022-31875-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2021] [Accepted: 07/05/2022] [Indexed: 11/09/2022]
Abstract
RNAs perform their function by forming specific structures, which can change across cellular conditions. Structure probing experiments combined with next generation sequencing technology have enabled transcriptome-wide analysis of RNA secondary structure in various cellular conditions. Differential analysis of structure probing data in different conditions can reveal the RNA structurally variable regions (SVRs), which is important for understanding RNA functions. Here, we propose DiffScan, a computational framework for normalization and differential analysis of structure probing data in high resolution. DiffScan preprocesses structure probing datasets to remove systematic bias, and then scans the transcripts to identify SVRs and adaptively determines their lengths and locations. The proposed approach is compatible with most structure probing platforms (e.g., icSHAPE, DMS-seq). When evaluated with simulated and benchmark datasets, DiffScan identifies structurally variable regions at nucleotide resolution, with substantial improvement in accuracy compared with existing SVR detection methods. Moreover, the improvement is robust when tested in multiple structure probing platforms. Application of DiffScan in a dataset of multi-subcellular RNA structurome and a subsequent motif enrichment analysis suggest potential links of RNA structural variation and mRNA abundance, possibly mediated by RNA binding proteins such as the serine/arginine rich splicing factors. This work provides an effective tool for differential analysis of RNA secondary structure, reinforcing the power of structure probing experiments in deciphering the dynamic RNA structurome. The authors present DiffScan, an advanced tool for normalization and differential analysis of RNA structure probing experiments, combining their power in deciphering the dynamic RNA structurome and facilitating the discovery of RNA regulatory functions.
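The abstract does not describe DiffScan's normalization or scanning statistics, so the sketch below swaps in two generic stand-ins to show the overall shape of such a pipeline: the widely used 2-8% normalization (discard the top 2% of reactivities as outliers, divide by the mean of the next 8%) and a naive sliding-window mean absolute difference between two conditions. Neither is DiffScan's algorithm; the function names, window width, and example profiles are hypothetical.

```python
def normalize_2_8(reactivities):
    """2-8% normalization: ignore the top 2% as outliers, divide by the mean of the next 8%."""
    vals = sorted((r for r in reactivities if r is not None), reverse=True)
    if not vals:
        return list(reactivities)
    skip = int(len(vals) * 0.02)
    ref = vals[skip:skip + max(1, int(len(vals) * 0.08))]
    scale = sum(ref) / len(ref)
    if scale <= 0:
        raise ValueError("reference reactivities must have a positive mean")
    return [None if r is None else r / scale for r in reactivities]


def window_differences(a, b, width=5):
    """Mean absolute difference of two normalized profiles in each sliding window.

    Windows containing missing data yield None instead of a score.
    """
    scores = []
    for start in range(len(a) - width + 1):
        wa, wb = a[start:start + width], b[start:start + width]
        if any(x is None for x in wa + wb):
            scores.append(None)
        else:
            scores.append(sum(abs(x - y) for x, y in zip(wa, wb)) / width)
    return scores


# Hypothetical profiles of one transcript under two conditions.
cond1 = normalize_2_8([0.1, 0.2, 1.6, 1.8, 0.1, 0.0, 0.3, 2.0])
cond2 = normalize_2_8([0.1, 0.2, 0.2, 0.1, 0.1, 0.0, 0.3, 2.0])
print(window_differences(cond1, cond2, width=3))
```

A fixed window width is exactly the limitation the abstract highlights: DiffScan instead determines region lengths and locations adaptively, and a real analysis would also need a significance test rather than a raw difference score.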
8
Szikszai M, Wise M, Datta A, Ward M, Mathews DH. Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics 2022; 38:3892-3899. [PMID: 35748706 PMCID: PMC9364374 DOI: 10.1093/bioinformatics/btac415] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/28/2022] [Revised: 06/09/2022] [Accepted: 06/21/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem. RESULTS We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization despite the widespread assumption in the literature and provide strong evidence that many existing learning-based models have not generalized inter-family. AVAILABILITY AND IMPLEMENTATION Source code and data are available at https://github.com/marcellszi/dl-rna. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
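The abstract proposes inter-family cross-validation without detailing it here. The core idea, that no RNA family may contribute sequences to both the training and the test split, can be sketched as a leave-one-family-out split. This is a minimal illustration under stated assumptions, not the authors' exact protocol; the record format (dicts with "name" and "family" keys) and the example data are hypothetical.

```python
from collections import defaultdict


def leave_one_family_out(records):
    """Yield (family, train, test) splits in which the test family never appears in training."""
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec["family"]].append(rec)
    for family in sorted(by_family):
        test = by_family[family]
        train = [r for other, recs in by_family.items() if other != family for r in recs]
        yield family, train, test


# Hypothetical dataset: each record names its sequence and its RNA family.
data = [
    {"name": "seq1", "family": "tRNA"},
    {"name": "seq2", "family": "tRNA"},
    {"name": "seq3", "family": "5S rRNA"},
    {"name": "seq4", "family": "SRP RNA"},
]
for family, train, test in leave_one_family_out(data):
    print(family, [r["name"] for r in train], [r["name"] for r in test])
```

By contrast, a random split over individual sequences would place close homologs on both sides, which is the intra-family setting the authors argue overstates generalization.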
Affiliation(s)
- Marcell Szikszai
- Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Michael Wise
- Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Perth, WA 6009, Australia
- Amitava Datta
- Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Max Ward
- Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- David H Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY 14642, USA
9
Aviran S, Incarnato D. Computational approaches for RNA structure ensemble deconvolution from structure probing data. J Mol Biol 2022; 434:167635. [PMID: 35595163 DOI: 10.1016/j.jmb.2022.167635] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/14/2022] [Revised: 04/29/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022]
Abstract
RNA structure probing experiments have emerged over the last decade as a straightforward way to determine the structure of RNA molecules in a number of different contexts. Although powerful, the ability of RNA to dynamically interconvert between, and to simultaneously populate, alternative structural configurations, poses a nontrivial challenge to the interpretation of data derived from these experiments. Recent efforts aimed at developing computational methods for the reconstruction of coexisting alternative RNA conformations from structure probing data are paving the way to the study of RNA structure ensembles, even in the context of living cells. In this review, we critically discuss these methods, their limitations and possible future improvements.
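The methods reviewed here reconstruct coexisting conformations without knowing them in advance. A much simpler special case helps show why a population-averaged profile is ambiguous: if two candidate reactivity profiles are already known, their relative population can be estimated by least squares, assuming the observed profile is a linear mixture observed ≈ w·A + (1 - w)·B. The function name and profiles below are hypothetical, and this is an illustration of the mixture idea, not one of the reviewed algorithms.

```python
def two_state_weight(observed, profile_a, profile_b):
    """Least-squares weight w in observed ~ w * profile_a + (1 - w) * profile_b, clipped to [0, 1].

    Positions with missing data (None) in any profile are skipped.
    """
    num = den = 0.0
    for o, a, b in zip(observed, profile_a, profile_b):
        if None in (o, a, b):
            continue
        diff = a - b
        # Minimizing sum((o - b - w * diff)^2) over w gives w = sum((o - b) * diff) / sum(diff^2).
        num += (o - b) * diff
        den += diff * diff
    if den == 0.0:
        raise ValueError("the two reference profiles are indistinguishable")
    return min(1.0, max(0.0, num / den))


# Hypothetical example: 30% of molecules in conformation A, 70% in conformation B.
profile_a = [1.0, 0.0, 1.0, 0.0]
profile_b = [0.0, 1.0, 0.0, 1.0]
observed = [0.3, 0.7, 0.3, 0.7]
print(two_state_weight(observed, profile_a, profile_b))
```

The hard part addressed by the reviewed methods is that neither the number of conformations nor their individual profiles is known beforehand, which is why single-molecule information or additional constraints are usually needed.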
Affiliation(s)
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
- Danny Incarnato
- Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, the Netherlands
10
Zhou Y, Sotcheff SL, Routh AL. Next-generation sequencing: A new avenue to understand viral RNA-protein interactions. J Biol Chem 2022; 298:101924. [PMID: 35413291 PMCID: PMC8994257 DOI: 10.1016/j.jbc.2022.101924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 04/01/2022] [Accepted: 04/02/2022] [Indexed: 10/25/2022]
Abstract
The genomes of RNA viruses present an astonishing source of both sequence and structural diversity. From intracellular viral RNA-host interfaces to interactions between the RNA genome and structural proteins in virus particles themselves, almost the entire viral lifecycle is accompanied by a myriad of RNA-protein interactions that are required to fulfill their replicative potential. It is therefore important to characterize such rich and dynamic collections of viral RNA-protein interactions to understand virus evolution and their adaptation to their hosts and environment. Recent advances in next-generation sequencing technologies have allowed the characterization of viral RNA-protein interactions, including both transient and conserved interactions, where molecular and structural approaches have fallen short. In this review, we will provide a methodological overview of the high-throughput techniques used to study viral RNA-protein interactions, their biochemical mechanisms, and how they evolved from classical methods as well as one another. We will discuss how different techniques have fueled virus research to characterize how viral RNA and proteins interact, both locally and on a global scale. Finally, we will present examples on how these techniques influence the studies of clinically important pathogens such as HIV-1 and SARS-CoV-2.
Affiliation(s)
- Yiyang Zhou
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
- Stephanea L Sotcheff
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
- Andrew L Routh
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, Texas, USA
- Sealy Center for Structural Biology and Molecular Biophysics, The University of Texas Medical Branch, Galveston, Texas, USA
- Institute for Human Infections and Immunity, University of Texas Medical Branch, Galveston, Texas, USA
11
Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved the prediction accuracy of significant RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because in the detection of conserved base-pairings among RNA homologs, predicting an RNA consensus secondary structure is more convenient than predicting an RNA structural alignment. RESULTS We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment. ConsAlifold considers RNA structural alignments. ConsAlifold achieves moderate running time and the best prediction accuracy of RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Masaki Tagashira
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
12
Radecki P, Uppuluri R, Deshpande K, Aviran S. Accurate detection of RNA stem-loops in structurome data reveals widespread association with protein binding sites. RNA Biol 2021; 18:521-536. [PMID: 34606413 PMCID: PMC8677038 DOI: 10.1080/15476286.2021.1971382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
RNA molecules are known to fold into specific structures which often play a central role in their functions and regulation. In silico folding of RNA transcripts, especially when assisted with structure profiling (SP) data, is capable of accurately elucidating relevant structural conformations. However, such methods scale poorly to the swaths of SP data generated by transcriptome-wide experiments, which are becoming more commonplace and advancing our understanding of RNA structure and its regulation at global and local levels. This has created a need for tools capable of rapidly deriving structural assessments from SP data in a scalable manner. One such tool we previously introduced that aims to process such data is patteRNA, a statistical learning algorithm capable of rapidly mining big SP datasets for structural elements. Here, we present a reformulation of patteRNA's pattern recognition scheme that sees significantly improved precision without major compromises to computational overhead. Specifically, we developed a data-driven logistic classifier which interprets patteRNA's statistical characterizations of SP data in addition to local sequence properties as measured with a nearest neighbour thermodynamic model. Application of the classifier to human structurome data reveals a marked association between detected stem-loops and RNA binding protein (RBP) footprints. The results of our application demonstrate that upwards of 30% of RBP footprints occur within loops of stable stem-loop elements. Overall, our work arrives at a rapid and accurate method for automatically detecting families of RNA structure motifs and demonstrates the functional relevance of identifying them transcriptome-wide.
Affiliation(s)
- Pierce Radecki
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
- Rahul Uppuluri
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
- Kaustubh Deshpande
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA
13
Radecki P, Uppuluri R, Aviran S. Rapid structure-function insights via hairpin-centric analysis of big RNA structure probing datasets. NAR Genom Bioinform 2021; 3:lqab073. [PMID: 34447931 PMCID: PMC8384053 DOI: 10.1093/nargab/lqab073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Revised: 07/14/2021] [Accepted: 08/03/2021] [Indexed: 12/23/2022]
Abstract
The functions of RNA are often tied to its structure, hence analyzing structure is of significant interest when studying cellular processes. Recently, large-scale structure probing (SP) studies have enabled assessment of global structure-function relationships via standard data summarizations or local folding. Here, we approach structure quantification from a hairpin-centric perspective where putative hairpins are identified in SP datasets and used as a means to capture local structural effects. This has the advantage of rapid processing of big (e.g. transcriptome-wide) data as RNA folding is circumvented, yet it captures more information than simple data summarizations. We reformulate a statistical learning algorithm we previously developed to significantly improve precision of hairpin detection, then introduce a novel nucleotide-wise measure, termed the hairpin-derived structure level (HDSL), which captures local structuredness by accounting for the presence of likely hairpin elements. Applying HDSL to data from recent studies recapitulates, strengthens and expands on their findings which were obtained by more comprehensive folding algorithms, yet our analyses are orders of magnitude faster. These results demonstrate that hairpin detection is a promising avenue for global and rapid structure-function analysis, furthering our understanding of RNA biology and the principal features which drive biological insights from SP data.
Affiliation(s)
- Pierce Radecki
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA
- Rahul Uppuluri
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA
14
Cao J, Xue Y. Characteristic chemical probing patterns of loop motifs improve prediction accuracy of RNA secondary structures. Nucleic Acids Res 2021; 49:4294-4307. [PMID: 33849076 PMCID: PMC8096282 DOI: 10.1093/nar/gkab250] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/07/2020] [Revised: 03/24/2021] [Accepted: 04/10/2021] [Indexed: 12/14/2022]
Abstract
RNA structures play a fundamental role in nearly every aspect of cellular physiology and pathology. Gaining insights into the functions of RNA molecules requires accurate predictions of RNA secondary structures. However, the existing thermodynamic folding models remain less accurate than desired, even when chemical probing data, such as selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) reactivities, are used as restraints. Unlike most SHAPE-directed algorithms that only consider SHAPE restraints for base pairing, we extract two-dimensional structural features encoded in SHAPE data and establish robust relationships between characteristic SHAPE patterns and loop motifs of various types (hairpin, internal, and bulge) and lengths (2-11 nucleotides). Such characteristic SHAPE patterns are closely related to the sugar pucker conformations of loop residues. Based on these patterns, we propose a computational method, SHAPELoop, which refines the predicted results of the existing methods, thereby further improving their prediction accuracy. In addition, SHAPELoop can provide information about local or global structural rearrangements (including pseudoknots) and help researchers to easily test their hypothesized secondary structures.
Affiliation(s)
- Jingyi Cao
- School of Life Sciences, Tsinghua-Peking Joint Center for Life Sciences, Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Yi Xue
- School of Life Sciences, Tsinghua-Peking Joint Center for Life Sciences, Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
15
Rivas E. Evolutionary conservation of RNA sequence and structure. Wiley Interdiscip Rev RNA 2021; 12:e1649. [PMID: 33754485 PMCID: PMC8250186 DOI: 10.1002/wrna.1649] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021]
Abstract
An RNA structure prediction from a single‐sequence RNA folding program is not evidence for an RNA whose structure is important for function. Random sequences have plausible and complex predicted structures not easily distinguishable from those of structural RNAs. How to tell when an RNA has a conserved structure is a question that requires looking at the evolutionary signature left by the conserved RNA. This question is important not just for long noncoding RNAs which usually lack an identified function, but also for RNA binding protein motifs which can be single stranded RNAs or structures. Here we review recent advances using sequence and structural analysis to determine when RNA structure is conserved or not. Although covariation measures assess structural RNA conservation, one must distinguish covariation due to RNA structure from covariation due to independent phylogenetic substitutions. We review a statistical test to measure false positives expected under the null hypothesis of phylogenetic covariation alone (specificity). We also review a complementary test that measures power, that is, expected covariation derived from sequence variation alone (sensitivity). Power in the absence of covariation signals the absence of a conserved RNA structure. We analyze artifacts that falsely identify conserved RNA structure such as the misuse of programs that do not assess significance, the use of inappropriate statistics confounded by signals other than covariation, or misalignments that induce spurious covariation. Among artifacts that obscure the signal of a conserved RNA structure, we discuss the inclusion of pseudogenes in alignments which increase power but destroy covariation. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
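As background for the covariation measures this review discusses, the simplest such statistic is the mutual information between two alignment columns. The sketch below is an illustrative assumption, not the statistical test the review describes: it applies no phylogenetic correction, which is exactly the weakness the review warns about. The function name and gap characters are hypothetical choices.

```python
import math
from collections import Counter
from typing import Sequence

GAP_CHARS = {"-", "."}


def column_mutual_information(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """Mutual information (bits) between two alignment columns.

    Each column lists one residue per aligned sequence; sequences with a
    gap in either column are skipped. No phylogenetic correction is applied.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j)
             if a not in GAP_CHARS and b not in GAP_CHARS]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Perfectly covarying Watson-Crick partners across four sequences.
print(column_mutual_information("GCAU", "CGUA"))
# Invariant columns: no variation, hence no covariation signal at all.
print(column_mutual_information("GGGG", "CCCC"))
```

The second call illustrates the abstract's point about power. Invariant columns yield zero covariation whether or not they pair, so the absence of covariation is informative only when the columns actually vary.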
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
16
Madden EA, Plante KS, Morrison CR, Kutchko KM, Sanders W, Long KM, Taft-Benz S, Cruz Cisneros MC, White AM, Sarkar S, Reynolds G, Vincent HA, Laederach A, Moorman NJ, Heise MT. Using SHAPE-MaP To Model RNA Secondary Structure and Identify 3'UTR Variation in Chikungunya Virus. J Virol 2020; 94:e00701-20. [PMID: 32999019 PMCID: PMC7925192 DOI: 10.1128/jvi.00701-20] [Received: 04/21/2020] [Accepted: 09/17/2020]
Abstract
Chikungunya virus (CHIKV) is a mosquito-borne alphavirus associated with debilitating arthralgia in humans. RNA secondary structure in the viral genome plays an important role in the lifecycle of alphaviruses; however, the specific role of RNA structure in regulating CHIKV replication is poorly understood. Our previous studies found little conservation in RNA secondary structure between alphaviruses, and this structural divergence creates unique functional structures in specific alphavirus genomes. Therefore, to understand the impact of RNA structure on CHIKV biology, we used SHAPE-MaP to inform the modeling of RNA secondary structure throughout the genome of a CHIKV isolate from the 2013 Caribbean outbreak. We then analyzed regions of the genome with high levels of structural specificity to identify potentially functional RNA secondary structures and identified 23 regions within the CHIKV genome with higher than average structural stability, including four previously identified, functionally important CHIKV RNA structures. We also analyzed the RNA flexibility and secondary structures of multiple 3'UTR variants of CHIKV that are known to affect virus replication in mosquito cells. This analysis found several novel RNA structures within these 3'UTR variants. A duplication in the 3'UTR that enhances viral replication in mosquito cells led to an overall increase in the amount of unstructured RNA in the 3'UTR. This analysis demonstrates that the CHIKV genome contains a number of unique, specific RNA secondary structures and provides a strategy for testing these secondary structures for functional importance in CHIKV replication and pathogenesis.IMPORTANCE Chikungunya virus (CHIKV) is a mosquito-borne RNA virus that causes febrile illness and debilitating arthralgia in humans. CHIKV causes explosive outbreaks but there are no approved therapies to treat or prevent CHIKV infection. 
The CHIKV genome contains functional RNA secondary structures that are essential for proper virus replication. Since RNA secondary structures have only been defined for a small portion of the CHIKV genome, we used a chemical probing method to define the RNA secondary structures of CHIKV genomic RNA. We identified 23 highly specific structured regions of the genome, and confirmed the functional importance of one structure using mutagenesis. Furthermore, we defined the RNA secondary structure of three CHIKV 3'UTR variants that differ in their ability to replicate in mosquito cells. Our study highlights the complexity of the CHIKV genome and describes new systems for designing compensatory mutations to test the functional relevance of viral RNA secondary structures.
Affiliation(s)
- Emily A Madden
- Department of Microbiology and Immunology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Kenneth S Plante
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Clayton R Morrison
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Katrina M Kutchko
- Biology Department, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Wes Sanders
- Department of Microbiology and Immunology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Kristin M Long
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Sharon Taft-Benz
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Sanjay Sarkar
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Grace Reynolds
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Heather A Vincent
- Department of Microbiology and Immunology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Alain Laederach
- Biology Department, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Nathanial J Moorman
- Department of Microbiology and Immunology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Mark T Heise
- Department of Microbiology and Immunology, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, UNC-Chapel Hill, Chapel Hill, North Carolina, USA
17
Rivas E. RNA structure prediction using positive and negative evolutionary information. PLoS Comput Biol 2020; 16:e1008387. [PMID: 33125376 PMCID: PMC7657543 DOI: 10.1371/journal.pcbi.1008387] [Received: 05/22/2020] [Revised: 11/11/2020] [Accepted: 09/24/2020]
Abstract
Knowing the structure of conserved structural RNAs is important to elucidate their function and mechanism of action. However, predicting a conserved RNA structure remains unreliable, even when using a combination of thermodynamic stability and evolutionary covariation information. Here we present a method to predict a conserved RNA structure that combines the following three features. First, it uses significant covariation due to RNA structure and removes spurious covariation due to phylogeny. Second, it uses negative evolutionary information: basepairs that have variation but no significant covariation are prevented from occurring. Lastly, it uses a battery of probabilistic folding algorithms that incorporate all positive covariation into one structure. The method, named CaCoFold (Cascade variation/covariation Constrained Folding algorithm), predicts a nested structure guided by a maximal subset of positive basepairs, and recursively incorporates all remaining positive basepairs into alternative helices. The alternative helices can be compatible with the nested structure such as pseudoknots, or overlapping such as competing structures, base triplets, or other 3D non-antiparallel interactions. We present evidence that CaCoFold predictions are consistent with structures modeled from crystallography. The availability of deeper comparative sequence alignments and recent advances in statistical analysis of RNA sequence covariation have made it possible to identify a reliable set of conserved base pairs, as well as a reliable set of non-basepairs (positions that vary without covarying). Predicting an overall consensus secondary structure consistent with a set of individual inferred pairs and non-pairs remains a problem. 
Current RNA structure prediction algorithms that predict nested secondary structures cannot use the full set of inferred covarying pairs, because covariation analysis also identifies important non-nested pairing interactions such as pseudoknots, base triples, and alternative structures. Moreover, although algorithms for incorporating negative constraints exist, negative information from covariation analysis (inferred non-pairs) has not been systematically exploited. Here I introduce an efficient approximate RNA structure prediction algorithm that incorporates all inferred pairs and excludes all non-pairs. Using this, and an improved visualization tool, I show that the method correctly identifies many non-nested structures in agreement with known crystal structures, and improves many curated consensus secondary structure annotations in RNA sequence alignment databases.
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
18
Greenwood T, Heitsch CE. On the Problem of Reconstructing a Mixture of RNA Structures. Bull Math Biol 2020; 82:133. [PMID: 33029669 DOI: 10.1007/s11538-020-00804-0] [Received: 01/23/2020] [Accepted: 09/08/2020]
Abstract
A growing number of RNA sequences are now known to exist in some distribution with two or more different stable structures. Recent algorithms attempt to reconstruct such mixtures using the list of nucleotides in a sequence in conjunction with auxiliary experimental footprinting data. In this paper, we demonstrate some challenges which remain in addressing this problem; in particular we consider the difficulty of reconstructing a mixture of two RNA structures across a spectrum of different relative abundances. Although progress has been made in identifying the stable structures present, it remains nontrivial to predict the relative abundance of each within the experimentally sampled mixture. Because the ratio of structures present can change depending on experimental conditions, it is the footprinting data, not the sequence, that must encode information on changes in the relative abundance. Here, we use simulated experimental data to demonstrate that there exist RNA sequences and relative abundance combinations which cannot be recovered by current methods. We then prove that this is not a single exception, but rather part of the rule. In particular, we show, using a Nussinov-Jacobson model, that recovering the relative abundances is difficult for a large proportion of RNA structure pairs. Lastly, we use information theory to establish a framework for quantifying how useful auxiliary data is in predicting the relative abundance of a structure. Together, these results demonstrate that aspects of the problem of reconstructing a mixture of RNA structures from experimental data remain open.
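The Nussinov-Jacobson model named in this abstract scores a secondary structure simply by its number of base pairs, which makes it a convenient simplified setting for such proofs. A minimal sketch of the classic base-pair-maximization dynamic program follows. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three nucleotides are assumptions, and the code only illustrates the model, not the authors' mixture-reconstruction analysis.

```python
from typing import List, Tuple

# Assumed pairing rules: Watson-Crick plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> Tuple[int, List[Tuple[int, int]]]:
    """Maximize the number of nested base pairs (Nussinov-style DP).

    dp[i][j] is the maximum number of pairs in seq[i..j]; a pair (k, j) must
    enclose at least `min_loop` unpaired nucleotides. Returns the optimum and
    one optimal set of 0-based pairs.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, []
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one structure that attains the optimum.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], sorted(pairs)


print(nussinov_max_pairs("GGGAAAUCC"))
```

Because every pair counts equally, many distinct structures can share the optimal score, which is one intuition for why footprinting data, rather than this sequence-only objective, must carry the abundance information.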
19
Li TJX, Reidys CM. On an enhancement of RNA probing data using information theory. Algorithms Mol Biol 2020; 15:15. [PMID: 32782456 PMCID: PMC7413225 DOI: 10.1186/s13015-020-00176-z] [Received: 11/16/2019] [Accepted: 07/31/2020]
Abstract
Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem, via considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble, that is constructed by recursively querying about whether or not a base pair of maximum information entropy is contained in the target. These queries are answered via relating local with global probing data, employing the modularity in RNA secondary structures. We show that the leaves of the tree consist of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than 90%.
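The central query in this abstract asks whether the target contains "a base pair of maximum information entropy". Assuming, for illustration, that this entropy is the binary entropy of the pair's frequency in a sample of structures given as dot-bracket strings, the selection step can be sketched as follows. The function names are hypothetical, and the sketch omits how the authors answer each query from probing data.

```python
import math
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def parse_dot_bracket(db: str) -> Set[Pair]:
    """Return the set of 0-based (i, j) base pairs in a dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for idx, ch in enumerate(db):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {idx}")
            pairs.add((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_entropy_base_pair(sample: Iterable[str]) -> Optional[Tuple[Pair, float]]:
    """Pick the base pair whose presence is most uncertain across the sample.

    Ties are broken by the smallest (i, j). Returns None if no pairs occur.
    """
    structures = [parse_dot_bracket(s) for s in sample]
    counts = Counter(pair for s in structures for pair in s)
    if not counts:
        return None
    n = len(structures)
    best = max(sorted(counts), key=lambda pr: binary_entropy(counts[pr] / n))
    return best, binary_entropy(counts[best] / n)


sample = ["((....))", "((....))", "(......)", "........"]
print(max_entropy_base_pair(sample))
```

Binary entropy peaks at a frequency of 0.5, so the chosen pair is the one whose yes/no answer splits the sample as evenly as possible, which keeps each level of the recursive bi-partition close to balanced.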
20
Twittenhoff C, Brandenburg VB, Righetti F, Nuss AM, Mosig A, Dersch P, Narberhaus F. Lead-seq: transcriptome-wide structure probing in vivo using lead(II) ions. Nucleic Acids Res 2020; 48:e71. [PMID: 32463449 PMCID: PMC7337928 DOI: 10.1093/nar/gkaa404] [Received: 11/22/2019] [Revised: 04/08/2020] [Accepted: 05/06/2020]
Abstract
The dynamic conformation of RNA molecules within living cells is key to their function. Recent advances in probing the RNA structurome in vivo, including the use of SHAPE (Selective 2'-Hydroxyl Acylation analyzed by Primer Extension) or kethoxal reagents or DMS (dimethyl sulfate), provided unprecedented insights into the architecture of RNA molecules in the living cell. Here, we report the establishment of lead probing in a global RNA structuromics approach. In order to elucidate the transcriptome-wide RNA landscape in the enteric pathogen Yersinia pseudotuberculosis, we combined lead(II) acetate-mediated cleavage of single-stranded RNA regions with high-throughput sequencing. This new approach, termed 'Lead-seq', provides structural information independent of base identity. We show that the method recapitulates secondary structures of tRNAs, RNase P RNA, tmRNA, 16S rRNA and the rpsT 5'-untranslated region, and that it reveals global structural features of mRNAs. The application of Lead-seq to Y. pseudotuberculosis cells grown at two different temperatures unveiled the first temperature-responsive in vivo RNA structurome of a bacterial pathogen. The translation of candidate genes derived from this approach was confirmed to be temperature regulated. Overall, this study establishes Lead-seq as a complementary approach to interrogate intracellular RNA structures on a global scale.
Affiliation(s)
- Aaron M Nuss
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
- Axel Mosig
- Department of Biophysics, Ruhr University Bochum, 44780 Bochum, Germany
- Petra Dersch
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
- Institute of Infectiology, Center for Molecular Biology of Inflammation, University of Münster, 48149 Münster, Germany
- Franz Narberhaus
- Microbial Biology, Ruhr University Bochum, 44780 Bochum, Germany
21
Mautner S, Montaseri S, Miladi M, Raden M, Costa F, Backofen R. ShaKer: RNA SHAPE prediction using graph kernel. Bioinformatics 2020; 35:i354-i359. [PMID: 31510707 PMCID: PMC6612843 DOI: 10.1093/bioinformatics/btz395]
Abstract
Summary: SHAPE experiments are used to probe the structure of RNA molecules. We present ShaKer to predict SHAPE data for RNA using a graph-kernel-based machine learning approach that is trained on experimental SHAPE information. While other available methods require a manually curated reference structure, ShaKer predicts reactivity data based on sequence input only and by sampling the ensemble of possible structures. Thus, ShaKer is well placed to enable experiment-driven, transcriptome-wide SHAPE data prediction, supporting the study of RNA structuredness and improving RNA structure and RNA–RNA interaction prediction. For performance evaluation, we compare accuracy and accessibility against experimental SHAPE data and competing methods. We show that ShaKer outperforms its competitors and is able to predict high quality SHAPE annotations even when no reference structure is provided. Availability and implementation: ShaKer is freely available at https://github.com/BackofenLab/ShaKer.
Affiliation(s)
- Stefan Mautner
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Soheila Montaseri
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Milad Miladi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Martin Raden
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Fabrizio Costa
- Department of Computer Science, University of Exeter, Exeter, UK
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
22
Kuksa PP, Li F, Kannan S, Gregory BD, Leung YY, Wang LS. HiPR: High-throughput probabilistic RNA structure inference. Comput Struct Biotechnol J 2020; 18:1539-1547. [PMID: 32637050 PMCID: PMC7327253 DOI: 10.1016/j.csbj.2020.06.004] [Received: 02/28/2020] [Revised: 05/15/2020] [Accepted: 06/01/2020]
Abstract
Recent high-throughput structure-sensitive genome-wide sequencing-based assays have enabled large-scale studies of RNA structure, and robust transcriptome-wide computational prediction of individual RNA structures across RNA classes from these assays has potential to further improve the prediction accuracy. Here, we describe HiPR, a novel method for RNA structure prediction at single-nucleotide resolution that combines high-throughput structure probing data (DMS-seq, DMS-MaPseq) with a novel probabilistic folding algorithm. On validation data spanning a variety of RNA classes, HiPR often increases accuracy for predicting RNA structures, giving researchers new tools to study RNA structure.
Affiliation(s)
- Pavel P. Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Fan Li
- Children’s Hospital Los Angeles, Los Angeles, CA 90027, USA
- Sampath Kannan
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian D. Gregory
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
23
Willmott D, Murrugarra D, Ye Q. Improving RNA secondary structure prediction via state inference with deep recurrent neural networks. Computational and Mathematical Biophysics 2020. [DOI: 10.1515/cmb-2020-0002]
Abstract
The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, exhibit poor performance in RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performances on many different classification problems.
This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that the accuracy of both our state inference and structure inference methods is highly dependent on the similarity of pairing patterns of the sequence to the training dataset. Availability of a large training dataset is critical to the success of this approach. Code available at https://github.com/dwillmott/rna-state-inf.
Affiliation(s)
- David Murrugarra
- Department of Mathematics, University of Kentucky, Lexington, KY 40506-0027, USA
- Qiang Ye
- Department of Mathematics, University of Kentucky, Lexington, KY 40506-0027, USA
24
Xue AY, Yu AM, Lucks JB, Bagheri N. DUETT quantitatively identifies known and novel events in nascent RNA structural dynamics from chemical probing data. Bioinformatics 2019; 35:5103-5112. [PMID: 31389563 PMCID: PMC6954663 DOI: 10.1093/bioinformatics/btz449] [Received: 11/27/2018] [Revised: 04/29/2019] [Accepted: 07/19/2019]
Abstract
MOTIVATION RNA molecules can undergo complex structural dynamics, especially during transcription, which influence their biological functions. Recently developed high-throughput chemical probing experiments that study RNA cotranscriptional folding generate nucleotide-resolution 'reactivities' for each length of a growing nascent RNA that reflect structural dynamics. However, the manual annotation and qualitative interpretation of reactivity across these large datasets can be nuanced, laborious, and difficult for new practitioners. We developed a quantitative and systematic approach to automatically detect RNA folding events from these datasets to reduce human bias/error, standardize event discovery and generate hypotheses about RNA folding trajectories for further analysis and experimental validation. RESULTS Detection of Unknown Events with Tunable Thresholds (DUETT) identifies RNA structural transitions in cotranscriptional RNA chemical probing datasets. DUETT employs a feedback control-inspired method and a linear regression approach and relies on interpretable and independently tunable parameter thresholds to match qualitative user expectations with quantitatively identified folding events. We validate the approach by identifying known RNA structural transitions within the cotranscriptional folding pathways of the Escherichia coli signal recognition particle RNA and the Bacillus cereus crcB fluoride riboswitch. We identify previously overlooked features of these datasets such as heightened reactivity patterns in the signal recognition particle RNA about 12 nt lengths before base-pair rearrangement. We then apply a sensitivity analysis to identify tradeoffs when choosing parameter thresholds. Finally, we show that DUETT is tunable across a wide range of contexts, enabling flexible application to study broad classes of RNA folding mechanisms. AVAILABILITY AND IMPLEMENTATION https://github.com/BagheriLab/DUETT. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Albert Y Xue
- Department of Chemical & Biological Engineering, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Angela M Yu
- Center for Synthetic Biology, Northwestern University, Evanston IL, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Julius B Lucks
- Department of Chemical & Biological Engineering, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston IL, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
- Neda Bagheri
- Department of Chemical & Biological Engineering, Northwestern University, Evanston, IL, USA
- Center for Synthetic Biology, Northwestern University, Evanston IL, USA
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA
- Chemistry of Life Processes Institute, Northwestern University, Evanston, IL, USA
25
Miladi M, Sokhoyan E, Houwaart T, Heyne S, Costa F, Grüning B, Backofen R. GraphClust2: Annotation and discovery of structured RNAs with scalable and accessible integrative clustering. Gigascience 2019; 8:giz150. [PMID: 31808801 PMCID: PMC6897289 DOI: 10.1093/gigascience/giz150] [Received: 03/15/2019] [Revised: 08/23/2019] [Accepted: 11/20/2019]
Abstract
BACKGROUND RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. RESULTS Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. CONCLUSIONS By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery under the presence of biological and methodological biases. Finally, we uncover prominent targets of double-stranded RNA binding protein Roquin-1, such as BCOR's 3' untranslated region that contains multiple binding stem-loops that are evolutionarily conserved.
Affiliation(s)
- Milad Miladi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Eteri Sokhoyan
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, University of Dusseldorf, Universitaetsstr. 1, 40225 Dusseldorf, Germany
- Steffen Heyne
- Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Stuebeweg 51, 79108 Freiburg, Germany
- Fabrizio Costa
- Department of Computer Science, University of Exeter, North Park Road, EX4 4QF Exeter, UK
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- ZBSA Centre for Biological Systems Analysis, University of Freiburg, Hauptstr. 1, 79104 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
26
Wang F, Sun LZ, Sun T, Chang S, Xu X. Helix-Based RNA Landscape Partition and Alternative Secondary Structure Determination. ACS Omega 2019; 4:15407-15413. [PMID: 31572840 PMCID: PMC6761681 DOI: 10.1021/acsomega.9b01430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/16/2019] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
RNA is a versatile macromolecule with the ability to fold into and interconvert between multiple functional conformations. Elucidating the RNA folding landscape, especially its alternative structures, is critical for uncovering the physical mechanisms of RNA function. Here, we introduce a helix-based strategy for partitioning the RNA folding landscape and determining alternative secondary structures. A benchmark test on 27 RNAs with alternative stable structures shows that the model can divide the whole landscape into distinct partitions at the secondary structure level and predict a representative structure for each partition. Furthermore, the predicted structures and equilibrium populations of metastable conformations for the 2'-dG-sensing riboswitch reveal an allosteric conformational switch that depends on transcript length, consistent with the experimental study and indicating the importance of metastable states for RNA-based gene regulation. The model provides a starting point for landscape-based studies of RNA folding mechanisms and function.
Affiliation(s)
- Fengfei Wang
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
- Li-Zhen Sun
- Department of Applied Physics, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China
- Tingting Sun
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310023, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
27
Pons J, Bover P, Bidegaray-Batista L, Arnedo MA. Arm-less mitochondrial tRNAs conserved for over 30 millions of years in spiders. BMC Genomics 2019; 20:665. [PMID: 31438844 PMCID: PMC6706885 DOI: 10.1186/s12864-019-6026-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/22/2019] [Accepted: 08/12/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND: In recent years, Next Generation Sequencing (NGS) has accelerated the generation of full mitogenomes, providing abundant material for studying different aspects of molecular evolution. Some mitogenomes have been observed to harbor atypical sequences with bizarre secondary structures, whose origins and significance can only be fully understood in an evolutionary framework. RESULTS: Here we report and analyze the mitochondrial sequences and gene arrangements of six closely related spiders in the sister genera Parachtes and Harpactocrates, which belong to the nocturnal, ground-dwelling family Dysderidae. Species of both genera have compact mitogenomes with many overlapping genes and strikingly reduced tRNAs that are among the shortest described within metazoans. Thanks to the conservation of gene order and nucleotide identity across close relatives, we were able to predict secondary structures even for arm-less tRNAs, which would otherwise be unattainable for a single species. These tRNAs exhibit aberrant secondary structures lacking either the DHU or the TΨC arm and with many mispairings in the acceptor arm, and the degeneracy goes even further: at least four tRNAs are arm-less in the six spider species studied. CONCLUSIONS: The conservation of at least four arm-less tRNA genes in two sister spider genera for about 30 myr suggests that these genes still encode functional tRNAs, although they may be post-transcriptionally edited to become fully functional, as previously described in other species. We suggest that the presence of overlapping and truncated tRNA genes may be related and may explain why spider mitogenomes are smaller than those of other invertebrates.
Affiliation(s)
- Joan Pons
- Departamento de Biodiversidad y Conservación, Instituto Mediterráneo de Estudios Avanzados (CSIC-UIB), Miquel Marquès, 21, 07190 Esporles, Illes Balears, Spain
- Pere Bover
- ARAID Foundation – IUCA Grupo-Aragosaurus, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain
- Leticia Bidegaray-Batista
- Departamento de Biodiversidad y Genética, Instituto de Investigaciones Biológicas Clemente Estable, Avenida Italia 3318, CP 11600 Montevideo, Uruguay
- Miquel A. Arnedo
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals & Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Av. Diagonal 643, E-08028 Barcelona, Catalonia, Spain
28
Spasic A, Assmann SM, Bevilacqua PC, Mathews DH. Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res 2018; 46:314-323. [PMID: 29177466 PMCID: PMC5758915 DOI: 10.1093/nar/gkx1057] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 05/10/2017] [Accepted: 10/30/2017] [Indexed: 12/22/2022]
Abstract
RNA secondary structure prediction is widely used for developing hypotheses about the structures of RNA sequences, and structure can provide insight into RNA function. The accuracy of structure prediction is known to be improved by experimental mapping data that report on the pairing status of individual nucleotides, and these data can now be acquired for whole transcriptomes using high-throughput sequencing. Prior methods for using these experimental data focused on predicting structures for sequences under the assumption that they populate a single structure. Most RNAs, however, populate multiple structures: the ensemble of strands adopts structures with different sets of canonical base pairs. The focus on modeling single structures has been a bottleneck for accurately modeling RNA structure. In this work, we introduce Rsample, an algorithm that uses experimental data to predict more than one RNA structure for sequences that populate multiple structures at equilibrium. We demonstrate, using SHAPE mapping data, that we can accurately model RNA sequences that populate multiple structures, including the relative probabilities of those structures. This program is freely available as part of the RNAstructure software package.
Affiliation(s)
- Aleksandar Spasic
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Sarah M Assmann
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Philip C Bevilacqua
- Department of Chemistry, Department of Biochemistry & Molecular Biology, Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
29
Choudhary K, Lai YH, Tran EJ, Aviran S. dStruct: identifying differentially reactive regions from RNA structurome profiling data. Genome Biol 2019; 20:40. [PMID: 30791935 PMCID: PMC6385470 DOI: 10.1186/s13059-019-1641-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/22/2018] [Accepted: 01/24/2019] [Indexed: 12/16/2022]
Abstract
RNA biology is being revolutionized by recent developments in diverse high-throughput technologies for transcriptome-wide profiling of RNA structures. RNA structurome profiling data can be used to identify differentially structured regions between groups of samples. Existing methods are limited in scope to specific technologies and/or do not account for biological variation. Here, we present dStruct, the first broadly applicable method for differential analysis of structurome profiling data that accounts for biological variation. dStruct is compatible with diverse profiling technologies, has been validated with experimental data and simulations, and outperforms existing methods.
Affiliation(s)
- Krishna Choudhary
- Department of Biomedical Engineering and Genome Center, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
- Yu-Hsuan Lai
- Department of Biochemistry, Purdue University, BCHM 305, 175 S. University Street, West Lafayette, IN 47907-2063, USA
- Elizabeth J. Tran
- Department of Biochemistry, Purdue University, BCHM 305, 175 S. University Street, West Lafayette, IN 47907-2063, USA
- Purdue University Center for Cancer Research, Purdue University, Hansen Life Sciences Research Building, Room 141, 201 S. University Street, West Lafayette, IN 47907-2064, USA
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
30
Mailler E, Paillart JC, Marquet R, Smyth RP, Vivet-Boudou V. The evolution of RNA structural probing methods: From gels to next-generation sequencing. Wiley Interdiscip Rev RNA 2018; 10:e1518. [PMID: 30485688 DOI: 10.1002/wrna.1518] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 06/04/2018] [Revised: 09/13/2018] [Accepted: 10/17/2018] [Indexed: 01/09/2023]
Abstract
RNA molecules are important players in all domains of life, and the study of the relationship between their multiple flexible states and the associated biological roles has increased in recent years. For several decades, chemical and enzymatic structural probing experiments have been used to determine RNA structure. During this time, there has been a steady improvement in probing reagents and experimental methods, and today the structural biology community has a large range of tools at its disposal to probe the secondary structure of RNAs in vitro and in cells. Early experiments used radioactive labeling and polyacrylamide gel electrophoresis as read-out methods. This was superseded by capillary electrophoresis, and more recently by next-generation sequencing. Today, powerful structural probing methods can characterize RNA structure on a genome-wide scale. In this review, we will provide an overview of RNA structural probing methodologies from a historical and technical perspective. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics, and Chemistry; RNA Methods > RNA Analyses in vitro and In Silico; RNA Methods > RNA Analyses in Cells.
Affiliation(s)
- Elodie Mailler
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Strasbourg, France
- Roland Marquet
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Strasbourg, France
- Redmond P Smyth
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Strasbourg, France
- Valerie Vivet-Boudou
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Strasbourg, France
31
Extracting information from RNA SHAPE data: Kalman filtering approach. PLoS One 2018; 13:e0207029. [PMID: 30462682 PMCID: PMC6248965 DOI: 10.1371/journal.pone.0207029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/07/2018] [Accepted: 10/23/2018] [Indexed: 01/26/2023]
Abstract
RNA SHAPE experiments have become important and successful sources of information for RNA structure prediction. In such experiments, chemical reagents are used to probe RNA backbone flexibility at the nucleotide level, which in turn provides information on base pairing and therefore secondary structure. Little is known, however, about the statistics of such SHAPE data. In this work, we explore different representations of noise in SHAPE data and propose a statistically sound framework for extracting reliable reactivity information from multiple SHAPE replicates. Our analyses of RNA SHAPE experiments underscore that a normal noise model is not adequate to represent their data. We propose instead a log-normal representation of noise and discuss its relevance. Under this assumption, we observe that processing simulated SHAPE data by directly averaging different replicates leads to bias. Such bias can be reduced by analyzing the data following a log transformation, either by log-averaging or Kalman filtering. Application of Kalman filtering has the additional advantage that a prior on the nucleotide reactivities can be introduced. We show that the performance of Kalman filtering is then directly dependent on the quality of that prior. We conclude the paper with guidelines on signal processing of RNA SHAPE data.
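The averaging bias described in this abstract can be illustrated with a small simulation. This is not the authors' code, and Kalman filtering with a prior is not shown. It is a sketch under stated assumptions: purely multiplicative log-normal noise (observed = true × e^ε, with ε ~ N(0, σ²)), hypothetical true reactivities drawn uniformly between 0.1 and 2.0, three replicates per nucleotide, σ = 0.8, and bias measured as the mean log-ratio between each estimate and the true value.

```python
import math
import random
from statistics import fmean


def arithmetic_average(replicates):
    """Direct averaging: the plain mean of replicate reactivities."""
    return fmean(replicates)


def log_average(replicates):
    """Average in log space, then transform back (the geometric mean).
    Only defined for strictly positive reactivities."""
    return math.exp(fmean(math.log(r) for r in replicates))


def simulate_bias(n_sites=2000, n_reps=3, sigma=0.8, seed=1):
    """Simulate replicates under the assumed log-normal noise model and
    return the mean log(estimate / true) for each averaging strategy."""
    rng = random.Random(seed)
    arith_err, log_err = [], []
    for _ in range(n_sites):
        true = rng.uniform(0.1, 2.0)  # hypothetical true reactivity
        # Multiplicative noise whose median is 1, so the median replicate equals `true`.
        reps = [true * rng.lognormvariate(0.0, sigma) for _ in range(n_reps)]
        arith_err.append(math.log(arithmetic_average(reps) / true))
        log_err.append(math.log(log_average(reps) / true))
    return fmean(arith_err), fmean(log_err)


arith_bias, log_bias = simulate_bias()
print(f"direct averaging bias (log scale): {arith_bias:+.3f}")
print(f"log-averaging bias (log scale):    {log_bias:+.3f}")
```

Under these assumptions, direct averaging systematically overestimates reactivities on the log scale, while the log-average is centred on the true value. Because the log-average needs strictly positive inputs, zero or negative reactivities from real experiments would need separate handling.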
32
A Functional riboSNitch in the 3' Untranslated Region of FKBP5 Alters MicroRNA-320a Binding Efficiency and Mediates Vulnerability to Chronic Post-Traumatic Pain. J Neurosci 2018; 38:8407-8420. [PMID: 30150364 DOI: 10.1523/jneurosci.3458-17.2018] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 12/06/2017] [Revised: 07/11/2018] [Accepted: 07/13/2018] [Indexed: 01/30/2023]
Abstract
Previous studies have shown that common variants of the gene coding for FK506-binding protein 51 (FKBP5), a critical regulator of glucocorticoid sensitivity, affect vulnerability to stress-related disorders. In a previous report, FKBP5 rs1360780 was identified as a functional variant because of its effect on gene methylation. Here we report evidence for a novel functional FKBP5 allele, rs3800373. This study assessed the association between rs3800373 and post-traumatic chronic pain in 1607 women and men from two ethnically diverse human cohorts. The molecular mechanism through which rs3800373 affects adverse outcomes was established via in silico, in vivo, and in vitro analyses. The rs3800373 minor allele predicted worse adverse outcomes after trauma exposure, such that individuals with the minor (risk) allele developed more severe post-traumatic chronic musculoskeletal pain. Among these individuals, peritraumatic circulating FKBP5 expression levels increased as cortisol and glucocorticoid receptor (NR3C1) mRNA levels increased, consistent with increased glucocorticoid resistance. Bioinformatic, in vitro, and mutational analyses indicate that the rs3800373 minor allele reduces the binding of a stress- and pain-associated microRNA, miR-320a, to FKBP5 by altering the secondary structure of the FKBP5 mRNA 3'UTR (i.e., it is a riboSNitch). This results in relatively greater FKBP5 translation, unchecked by miR-320a. Overall, these results identify an important gene-miRNA interaction influencing chronic pain risk in vulnerable individuals and suggest that exogenous methods to achieve targeted reduction in poststress FKBP5 mRNA expression may constitute useful therapeutic strategies. SIGNIFICANCE STATEMENT: FKBP5 is a critical regulator of the stress response. Previous studies have shown that dysregulation of the expression of this gene plays a role in the pathogenesis of chronic pain development as well as a number of comorbid neuropsychiatric disorders. 
In the current study, we identified a functional allele (rs3800373) in the 3'UTR of FKBP5 that influences vulnerability to chronic post-traumatic pain in two ethnic cohorts. Using multiple complementary experimental approaches, we show that the FKBP5 rs3800373 minor allele alters the secondary structure of FKBP5 mRNA, decreasing the binding of a stress- and pain-associated microRNA, miR-320a. This results in relatively greater FKBP5 translation, unchecked by miR-320a, increasing glucocorticoid resistance and increasing vulnerability to post-traumatic pain.
33
Automated Recognition of RNA Structure Motifs by Their SHAPE Data Signatures. Genes (Basel) 2018; 9:genes9060300. [PMID: 29904019 PMCID: PMC6027059 DOI: 10.3390/genes9060300] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/27/2018] [Revised: 06/04/2018] [Accepted: 06/13/2018] [Indexed: 02/03/2023]
Abstract
High-throughput structure profiling (SP) experiments that provide information at nucleotide resolution are revolutionizing our ability to study RNA structures. Of particular interest are RNA elements whose underlying structures are necessary for their biological functions. We previously introduced patteRNA, an algorithm for rapidly mining SP data for patterns characteristic of such motifs. This work provided a proof-of-concept for the detection of motifs and the capability of distinguishing structures displaying pronounced conformational changes. Here, we describe several improvements and automation routines to patteRNA. We then consider more elaborate biological scenarios, starting with the comparison or integration of results from searches for distinct motifs and across datasets. To facilitate such analyses, we characterize patteRNA’s outputs and describe a normalization framework that regularizes results. We then demonstrate that our algorithm successfully discerns between highly similar structural variants of the human immunodeficiency virus type 1 (HIV-1) Rev response element (RRE) and readily identifies its exact location in whole-genome structure profiles of HIV-1. This work highlights the breadth of information that can be gleaned from SP data and broadens the utility of data-driven methods as tools for the detection of novel RNA elements.
34
Lotfi M, Zare-Mirakabad F, Montaseri S. RNA design using simulated SHAPE data. Genes Genet Syst 2018; 92:257-265. [PMID: 28757510 DOI: 10.1266/ggs.16-00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
It has long been established that, in addition to its role in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by the higher-order structures of RNA molecules. It is therefore of prime importance to find RNA sequences that fold into structures conferring a particular function desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although several algorithms address this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data have emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this experimental constraint, we report a new method for accurately designing RNA sequences for given secondary structures using SHAPE data as a pseudo-free energy. We compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAiFold 2.0. Our algorithm correctly predicts new sequences for 26 of the 29 structures extracted from the Rfam dataset, whereas the other four algorithms succeed for no more than 22 of 29. On the RNA-SSD datasets, the proposed algorithm is comparable to these algorithms, which can predict appropriate sequences for up to 33 of the 34 RNA secondary structures.
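To illustrate how SHAPE data can enter a structure calculation as a pseudo-free energy, a commonly used form converts each nucleotide's reactivity into an energy term, ΔG_SHAPE = m ln(reactivity + 1) + b, typically applied to nucleotides in stacked base pairs. The abstract does not state which form Lotfi et al. use, so this is only a sketch: the slope m = 2.6 kcal/mol and intercept b = -0.8 kcal/mol are assumed defaults, and treating missing or negative reactivities as contributing nothing is a hypothetical convention.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy term (kcal/mol) for one nucleotide, using the
    log form dG = m * ln(reactivity + 1) + b.

    m and b are assumed default values, not taken from the cited paper.
    Missing (None) or negative reactivities return 0.0, a hypothetical
    convention for unmeasured positions.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Hypothetical reactivity profile: low values, two high values, one missing.
profile = [0.05, 0.1, 1.8, 2.4, None, 0.02]
print([round(shape_pseudo_energy(r), 2) for r in profile])
```

Low reactivities give negative terms that favor pairing, while high reactivities give positive terms that penalize it; the sign change occurs where m ln(reactivity + 1) equals -b.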
Affiliation(s)
- Mohadeseh Lotfi
- Faculty of Mathematics and Computer Science, Amirkabir University of Technology
- Soheila Montaseri
- School of Mathematics, Statistics and Computer Science, College of Science, Enghelab Avenue, University of Tehran
35
Hurst T, Xu X, Zhao P, Chen SJ. Quantitative Understanding of SHAPE Mechanism from RNA Structure and Dynamics Analysis. J Phys Chem B 2018; 122:4771-4783. [PMID: 29659274 DOI: 10.1021/acs.jpcb.8b00575] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]
Abstract
The selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) method probes RNA local structural and dynamic information at single nucleotide resolution. To gain quantitative insights into the relationship between nucleotide flexibility, RNA 3D structure, and SHAPE reactivity, we develop a 3D Structure-SHAPE Relationship model (3DSSR) to rebuild SHAPE profiles from 3D structures. The model starts from RNA structures and combines nucleotide interaction strength and conformational propensity, ligand (SHAPE reagent) accessibility, and base-pairing pattern through a composite function to quantify the correlation between SHAPE reactivity and nucleotide conformational stability. The 3DSSR model shows the relationship between SHAPE reactivity and RNA structure and energetics. Comparisons between the 3DSSR-predicted SHAPE profile and the experimental SHAPE data show correlation, suggesting that the extracted analytical function may have captured the key factors that determine the SHAPE reactivity profile. Furthermore, the theory offers an effective method to sieve RNA 3D models and exclude models that are incompatible with experimental SHAPE data.
Affiliation(s)
- Travis Hurst
- Department of Physics, Department of Biochemistry, and University of Missouri Informatics Institute, University of Missouri, Columbia, Missouri 65211, United States
- Xiaojun Xu
- Department of Physics, Department of Biochemistry, and University of Missouri Informatics Institute, University of Missouri, Columbia, Missouri 65211, United States
- Peinan Zhao
- Department of Physics, Department of Biochemistry, and University of Missouri Informatics Institute, University of Missouri, Columbia, Missouri 65211, United States
- Shi-Jie Chen
- Department of Physics, Department of Biochemistry, and University of Missouri Informatics Institute, University of Missouri, Columbia, Missouri 65211, United States
36
Lackey L, Coria A, Woods C, McArthur E, Laederach A. Allele-specific SHAPE-MaP assessment of the effects of somatic variation and protein binding on mRNA structure. RNA 2018; 24:513-528. [PMID: 29317542 PMCID: PMC5855952 DOI: 10.1261/rna.064469.117] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/06/2017] [Accepted: 01/04/2018] [Indexed: 05/22/2023]
Abstract
The impact of inherited and somatic mutations on messenger RNA (mRNA) structure remains poorly understood. Recent technological advances that leverage next-generation sequencing to obtain experimental structure data, such as SHAPE-MaP, can reveal structural effects of mutations, especially when these data are incorporated into structure modeling. Here, we analyze the ability of SHAPE-MaP to detect the relatively subtle structural changes caused by single-nucleotide mutations. We find that allele-specific sorting greatly improves our detection ability. Thus, we used SHAPE-MaP with a novel combination of clone-free robotic mutagenesis and allele-specific sorting to perform a rapid, comprehensive survey of noncoding somatic and inherited riboSNitches in two cancer-associated mRNAs, TPT1 and LCP1. Using rigorous thermodynamic modeling of the Boltzmann suboptimal ensemble, we identified a subset of mutations that change TPT1 and LCP1 RNA structure, with approximately 14% of all variants identified as riboSNitches. To confirm that these in vitro structures were biologically relevant, we tested how dependent TPT1 and LCP1 mRNA structures were on their environments. We performed SHAPE-MaP on TPT1 and LCP1 mRNAs in the presence or absence of cellular proteins and found that both mRNAs have similar overall folds in all conditions. RiboSNitches identified within these mRNAs in vitro are therefore likely to exist under biological conditions. Overall, these data reveal a robust mRNA structural landscape in which differences in environmental conditions and most sequence variants do not significantly alter RNA structural ensembles. Finally, predicting riboSNitches in mRNAs from sequence alone remains particularly challenging; these data will provide the community with benchmarks for further algorithmic development.
Affiliation(s)
- Lela Lackey
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Aaztli Coria
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Chanin Woods
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Evonne McArthur
- School of Medicine, Vanderbilt University, Nashville, Tennessee 37232, USA
- Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
37
Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023]
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Affiliation(s)
- Mirko Ledda
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, CA 95616, USA
- Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, CA 95616, USA
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, CA 95616, USA
38
Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes. Nat Commun 2018; 9:606. [PMID: 29426922 PMCID: PMC5807309 DOI: 10.1038/s41467-018-02923-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 06/09/2017] [Accepted: 01/09/2018] [Indexed: 11/23/2022]
Abstract
RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies. Different experimental and computational approaches can be used to study RNA structures. Here, the authors present a computational method for data-directed reconstruction of complex RNA structure landscapes, which predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data.
39
Mlýnský V, Bussi G. Molecular Dynamics Simulations Reveal an Interplay between SHAPE Reagent Binding and RNA Flexibility. J Phys Chem Lett 2018; 9:313-318. [PMID: 29265824 PMCID: PMC5830694 DOI: 10.1021/acs.jpclett.7b02921] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 11/03/2017] [Accepted: 12/21/2017] [Indexed: 05/10/2023]
Abstract
The function of RNA molecules usually depends on their overall fold and on the presence of specific structural motifs. Chemical probing methods are routinely used in combination with nearest-neighbor models to determine RNA secondary structure. Among the available methods, SHAPE stands out because it can probe every RNA nucleotide and can be used in vivo. However, the structural determinants of SHAPE reactivity and its reaction mechanism are still unclear. Here, molecular dynamics simulations and enhanced sampling techniques are used to predict the accessibility of nucleotide analogs and larger RNA structural motifs to SHAPE reagents. We show that local RNA reconformations are crucial in allowing reagents to reach the 2'-OH group of a particular nucleotide and that sugar pucker is a major structural factor influencing SHAPE reactivity.
Affiliation(s)
- Vojtěch Mlýnský
- Scuola Internazionale Superiore di Studi Avanzati, SISSA, via Bonomea 265, 34136 Trieste, Italy
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, SISSA, via Bonomea 265, 34136 Trieste, Italy
40
Rife Magalis B, Kosakovsky Pond SL, Summers MF, Salemi M. Evaluation of global HIV/SIV envelope gp120 RNA structure and evolution within and among infected hosts. Virus Evol 2018; 4:vey018. [PMID: 29951250 PMCID: PMC6014367 DOI: 10.1093/ve/vey018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Lentiviral RNA genomes contain structural elements that play critical roles in viral replication. Although structural features of 5'-untranslated regions have been well characterized, attempts to identify important structures in other genomic regions by Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE) have led to conflicting structural and mechanistic conclusions. Previous approaches accounted neither for sequence heterogeneity that is ubiquitous in viral populations, nor for selective constraints operating at the protein level. We developed an approach that augments SHAPE with phylogenetic analyses and applied it to investigate structure in coding regions (cRNA) within the HIV and SIV envelope genes. Analysis of single-genome SHAPE data with phylogenetic information from diverse lentiviral sequences argues against the conservation of a putative global gp120 RNA structure but points to the existence of core RNA sub-structures. Our findings establish a framework for considering sequence heterogeneity and protein function in de novo RNA structure inference approaches.
Affiliation(s)
- Brittany Rife Magalis
- Emerging Pathogens Institute and Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL, USA
- Institute for Genomics and Evolutionary Medicine and Department of Biology, Temple University, Philadelphia, PA, USA
- Sergei L Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine and Department of Biology, Temple University, Philadelphia, PA, USA
- Michael F Summers
- Howard Hughes Medical Institute and Department of Chemistry and Biochemistry, University of Maryland Baltimore County, Baltimore, MD, USA
- Marco Salemi
- Emerging Pathogens Institute and Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL, USA
41
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany; Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany; Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
42
Montaseri S, Zare-Mirakabad F, Ganjtabesh M. Evaluating the quality of SHAPE data simulated by k-mers for RNA structure prediction. J Bioinform Comput Biol 2017; 15:1750023. [PMID: 29113564 DOI: 10.1142/s0219720017500238] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Finding an effective measure to predict RNA secondary structure more accurately is a challenging problem. In the last decade, an experimental method known as selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) was proposed to measure the tendency of almost every nucleotide in an RNA sequence to form a base pair. These SHAPE reactivities are then used to improve the accuracy of RNA structure prediction. Because SHAPE reactivities have a significant impact on prediction accuracy, and in order to reduce experimental costs, we propose a new model called HL-k-mer, which simulates the SHAPE reactivity of each nucleotide in an RNA sequence. It does so by fetching the SHAPE reactivities of all subsequences of length k (k-mers) that appear in helix and loop regions. To evaluate the quality of the simulated SHAPE data, the ESD-Fold method is applied to SHAPE data simulated by the HL-k-mer model ([Formula: see text]), and three further methods are employed for additional evaluation. We also extend this model to simulate SHAPE data for pseudoknotted RNA structures. The results indicate that the average prediction accuracies obtained with SHAPE data simulated by our models (for [Formula: see text]) are higher than those obtained with experimental SHAPE data.
Affiliation(s)
- Soheila Montaseri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
- Fatemeh Zare-Mirakabad
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Mohammad Ganjtabesh
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, P.O. Box 19395-5746, Iran
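The helix/loop k-mer idea described in this abstract can be illustrated with a minimal Python sketch.

The sketch makes these assumptions:
- every training RNA has a known dot-bracket structure;
- k is odd;
- each k-mer is labelled helix ('H') if its central nucleotide is paired and loop ('L') otherwise;
- the simulated reactivity is the average training reactivity for that (context, k-mer) pair.

The function names, the centring rule and the global-mean fallback are all hypothetical. This is not the published HL-k-mer model.

```python
from collections import defaultdict
from statistics import mean

def contexts(structure):
    """Label each position 'H' (paired) or 'L' (unpaired) from dot-bracket."""
    return ["L" if c == "." else "H" for c in structure]

def kmer_windows(seq, k):
    """Yield (centre index, k-mer) for every complete window of odd length k."""
    half = k // 2
    for i in range(half, len(seq) - half):
        yield i, seq[i - half:i + half + 1]

def train(examples, k=3):
    """Hypothetical training step: mean reactivity per (context, k-mer).

    examples: iterable of (sequence, dot-bracket, reactivities) triples.
    """
    table = defaultdict(list)
    for seq, struct, react in examples:
        ctx = contexts(struct)
        for i, kmer in kmer_windows(seq, k):
            table[(ctx[i], kmer)].append(react[i])
    return {key: mean(vals) for key, vals in table.items()}

def simulate(seq, struct, table, k=3, default=None):
    """Simulate a reactivity profile for seq folded as struct.

    Positions without a complete window, or whose (context, k-mer) was never
    seen in training, receive `default` (the mean of all table entries).
    """
    if default is None:
        default = mean(table.values())
    ctx = contexts(struct)
    profile = [default] * len(seq)
    for i, kmer in kmer_windows(seq, k):
        profile[i] = table.get((ctx[i], kmer), default)
    return profile

# Toy training data (invented numbers, for illustration only).
training = [
    ("GGGAAACCC", "(((...)))", [0.1, 0.1, 0.2, 0.9, 1.2, 0.8, 0.2, 0.1, 0.1]),
]
table = train(training, k=3)
print(simulate("GGGAAACCC", "(((...)))", table, k=3))
```

In practice the table would be trained on many RNAs and evaluated on held-out sequences. Using a single toy RNA here only demonstrates the lookup mechanics.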
43
Rogers E, Murrugarra D, Heitsch C. Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations. Biophys J 2017. [PMID: 28629618 DOI: 10.1016/j.bpj.2017.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Abstract
Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, the conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use its tight relationship to define quantitative thresholds for well versus ill conditioning. These resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption of sampling's improved conditioning over the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones.
Affiliation(s)
- Emily Rogers
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia
- David Murrugarra
- Department of Mathematics, University of Kentucky, Lexington, Kentucky
- Christine Heitsch
- School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia
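The contrast this abstract draws between MFE prediction and Boltzmann sampling can be illustrated on a toy ensemble.

The sketch below does the following:
- it takes a hypothetical set of four candidate structures (written as base-pair sets) with made-up free energies in kcal/mol;
- it computes Boltzmann probabilities p_i = exp(-ΔG_i/RT)/Z at 310.15 K;
- it draws a sample and records how often base pair (1, 20) appears;
- it repeats the calculation after a 0.2 kcal/mol perturbation.

Under the perturbation the MFE structure switches, while the sampled frequency of base pair (1, 20) barely changes. This is only an illustration of the phenomenon, not the conditioning measure defined by Rogers et al.

```python
import math
import random
from collections import Counter

RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 310.15 K

# Hypothetical candidate structures (sets of base pairs) and free energies.
structures = {
    "A": frozenset({(1, 20), (2, 19), (3, 18)}),
    "B": frozenset({(1, 20), (2, 19), (5, 15)}),
    "C": frozenset({(4, 12), (5, 11)}),
    "D": frozenset(),
}
energies = {"A": -5.0, "B": -4.9, "C": -3.0, "D": 0.0}

def boltzmann(dg):
    """Normalized Boltzmann probabilities exp(-dG/RT) / Z."""
    weights = {s: math.exp(-g / RT) for s, g in dg.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def sample(probs, n, rng):
    """Draw n structure names according to their Boltzmann probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[s] for s in names], k=n)

def pair_frequencies(draws):
    """Fraction of sampled structures that contain each base pair."""
    counts = Counter(pair for s in draws for pair in structures[s])
    return {pair: c / len(draws) for pair, c in counts.items()}

rng = random.Random(0)
perturbed = {**energies, "B": energies["B"] - 0.2}
for label, dg in [("original", energies), ("perturbed", perturbed)]:
    probs = boltzmann(dg)
    mfe = min(dg, key=dg.get)
    freqs = pair_frequencies(sample(probs, 10_000, rng))
    print(label, "MFE:", mfe, "P(1,20):", round(freqs.get((1, 20), 0.0), 2))
```

Real tools enumerate structures implicitly via partition-function dynamic programming rather than from an explicit list. The explicit dictionary here only keeps the arithmetic visible.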
44
Li J, Xu C, Liang H, Cong W, Wang Y, Luan K, Liu Y. RGRNA: prediction of RNA secondary structure based on replacement and growth of stems. Comput Methods Biomech Biomed Engin 2017; 20:1261-1272. [DOI: 10.1080/10255842.2017.1340460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022]
Affiliation(s)
- Jin Li
- College of Automation, Harbin Engineering University, Harbin, China
- Chengzhen Xu
- College of Automation, Harbin Engineering University, Harbin, China
- Hong Liang
- College of Automation, Harbin Engineering University, Harbin, China
- Wang Cong
- College of Automation, Harbin Engineering University, Harbin, China
- Ying Wang
- College of Automation, Harbin Engineering University, Harbin, China
- Kuan Luan
- College of Automation, Harbin Engineering University, Harbin, China
- Yunlong Liu
- College of Automation, Harbin Engineering University, Harbin, China
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
45
Tan Z, Sharma G, Mathews DH. Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data. Biophys J 2017; 113:330-338. [PMID: 28735622 DOI: 10.1016/j.bpj.2017.06.039] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/01/2017] [Revised: 06/07/2017] [Accepted: 06/19/2017] [Indexed: 10/19/2022]
Abstract
Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data are currently available for only a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we propose a method for integrating experimental mapping data into a comparative sequence analysis algorithm for secondary structure prediction of multiple homologs, whereby the mapping data benefit not only the prediction for the specific sequence that was mapped but also the predictions for other homologs. The proposed method is realized by modifying the TurboFold II algorithm for prediction of RNA secondary structures to use basepairing probabilities guided by SHAPE experimental data when such data are available. The SHAPE-mapping-guided basepairing probabilities are obtained using the RSample method. Results demonstrate that the SHAPE mapping data for one sequence improve the structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package.
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York; Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
- Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, Rochester, New York; Department of Electrical and Computer Engineering, University of Rochester Medical Center, Rochester, New York; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York
- David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York; Center for RNA Biology, University of Rochester Medical Center, Rochester, New York; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York
46
Bell DR, Cheng SY, Salazar H, Ren P. Capturing RNA Folding Free Energy with Coarse-Grained Molecular Dynamics Simulations. Sci Rep 2017; 7:45812. [PMID: 28393861 PMCID: PMC5385882 DOI: 10.1038/srep45812] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/15/2016] [Accepted: 03/06/2017] [Indexed: 01/25/2023]
Abstract
We introduce a coarse-grained RNA model for molecular dynamics simulations, RACER (RnA CoarsE-gRained). RACER achieves accurate native structure prediction for a number of RNAs (average RMSD of 2.93 Å) and the sequence-specific variation of free energy is in excellent agreement with experimentally measured stabilities (R² = 0.93). Using RACER, we identified hydrogen-bonding (or base pairing), base stacking, and electrostatic interactions as essential driving forces for RNA folding. Also, we found that separating pairing vs. stacking interactions allowed RACER to distinguish folded vs. unfolded states. In RACER, base pairing and stacking interactions each provide an approximate stability of 3-4 kcal/mol for an A-form helix. RACER was developed based on PDB structural statistics and experimental thermodynamic data. In contrast with previous work, RACER implements a novel effective vdW potential energy function, which led us to re-parameterize hydrogen bond and electrostatic potential energy functions. Further, RACER is validated and optimized using a simulated annealing protocol to generate potential energy vs. RMSD landscapes. Finally, RACER is tested using extensive equilibrium pulling simulations (0.86 ms total) on eleven RNA sequences (hairpins and duplexes).
Affiliation(s)
- David R. Bell
- Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas 78712, United States
- Sara Y. Cheng
- Department of Physics, University of Texas at Austin, Austin, Texas 78712, United States
- Heber Salazar
- Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas 78712, United States
- Pengyu Ren
- Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas 78712, United States
47
Abstract
The discoveries of myriad non-coding RNA molecules, each transiting through multiple flexible states in cells or virions, present major challenges for structure determination. Advances in high-throughput chemical mapping give new routes for characterizing entire transcriptomes in vivo, but the resulting one-dimensional data generally remain too information-poor to allow accurate de novo structure determination. Multidimensional chemical mapping (MCM) methods seek to address this challenge. Mutate-and-map (M2), RNA interaction groups by mutational profiling (RING-MaP and MaP-2D analysis) and multiplexed •OH cleavage analysis (MOHCA) measure how the chemical reactivities of every nucleotide in an RNA molecule change in response to modifications at every other nucleotide. A growing body of in vitro blind tests and compensatory mutation/rescue experiments indicate that MCM methods give consistently accurate secondary structures and global tertiary structures for ribozymes, ribosomal domains and ligand-bound riboswitch aptamers up to 200 nucleotides in length. Importantly, MCM analyses provide detailed information on structurally heterogeneous RNA states, such as ligand-free riboswitches that are functionally important but difficult to resolve with other approaches. The sequencing requirements of currently available MCM protocols scale at least quadratically with RNA length, precluding general application to transcriptomes or viral genomes at present. We propose a modify-cross-link-map (MXM) expansion to overcome this and other current limitations to resolving the in vivo 'RNA structurome'.
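The quadratic scaling mentioned at the end of this abstract follows from simple counting. A mutate-and-map style experiment on an RNA of N nucleotides reads out about N reactivities for the wild type and for each of about N single-nucleotide mutants, giving N(N + 1) measurements. The counting model below (one mutant per nucleotide, every position read in every variant) is a simplifying assumption. It is a lower bound that ignores replicates and sequencing depth.

```python
def mcm_measurements(length):
    """Reactivity measurements under a simplified mutate-and-map model:
    the wild type plus one mutant per nucleotide, each read at every position."""
    return (length + 1) * length

# Typical structured-RNA sizes versus a viral-genome-scale length.
for length in (200, 2_000, 10_000):
    print(f"{length:>6} nt -> {mcm_measurements(length):>12,} reactivity measurements")

# Doubling the length roughly quadruples the requirement.
print(mcm_measurements(400) / mcm_measurements(200))
```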
48
Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. Quant Biol 2017; 5:3-24. [PMID: 28717530 PMCID: PMC5510538 DOI: 10.1007/s40484-017-0093-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 10/02/2016] [Revised: 12/08/2016] [Accepted: 12/15/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data. RESULTS We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis as well as highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. CONCLUSIONS To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
Affiliation(s)
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, University of California at Davis, Davis, CA 95616, USA
49
Lian DS, Zhao SJ. Capillary electrophoresis based on nucleic acid detection for diagnosing human infectious disease. Clin Chem Lab Med 2017; 54:707-738. [PMID: 26352354 DOI: 10.1515/cclm-2015-0096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/28/2015] [Accepted: 06/17/2015] [Indexed: 01/22/2023]
Abstract
Rapid transmission, high morbidity, and high mortality are features of human infectious diseases caused by microorganisms such as bacteria, fungi, and viruses. These diseases can cause great personal and economic losses within a short period of time, especially in regions where sanitation is poor. Rapid diagnosis is therefore vital for the prevention and treatment of human infectious diseases. Several conventional methods are often used to diagnose infectious diseases, e.g. methods based on culture or morphology, or biochemical tests based on metabonomics. Although these traditional methods are considered gold standards and are used most frequently, they are laborious and time-consuming and cannot meet the demand for rapid diagnosis. Capillary electrophoresis offers high efficiency, high throughput, and high speed. Coupled with different nucleic acid detection strategies, it overcomes the drawbacks of traditional identification methods and precludes many types of false-positive and false-negative results. This review therefore focuses on the application of capillary electrophoresis based on nucleic acid detection to the diagnosis of human infectious diseases and introduces the limitations, advantages, and future developments of this approach.
50
Evolutionary Algorithm for RNA Secondary Structure Prediction Based on Simulated SHAPE Data. PLoS One 2016; 11:e0166965. [PMID: 27893832 PMCID: PMC5125645 DOI: 10.1371/journal.pone.0166965] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/08/2016] [Accepted: 11/06/2016] [Indexed: 11/29/2022]
Abstract
Background Non-coding RNAs perform a wide range of functions inside living cells that are related to their structures. Several algorithms have been proposed to predict RNA secondary structure based on minimum free energy. The low prediction accuracy of these algorithms indicates that free energy alone is not sufficient to predict the functional secondary structure. Recently, adding information obtained from SHAPE experiments to the thermodynamic free energy as a pseudo-free energy has been shown to greatly improve the accuracy of RNA secondary structure prediction. Method In this paper, a new method is proposed to predict RNA secondary structure based on both free energy and SHAPE pseudo-free energy. For each RNA sequence, a population of secondary structures is constructed and their SHAPE data are simulated. Then, an evolutionary algorithm is used to improve each structure based on both free and pseudo-free energies. Finally, the structure with the minimum sum of free and pseudo-free energies is taken as the predicted RNA secondary structure. Results and Conclusions Computationally simulating the SHAPE data for a given RNA sequence requires its secondary structure. Here, we overcome this limitation by employing a population of secondary structures. This allows SHAPE data to be simulated for any RNA sequence and consequently improves the accuracy of RNA secondary structure prediction, as confirmed by our experiments. The source code and web server of our proposed method are freely available at http://mostafa.ut.ac.ir/ESD-Fold/.
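The scoring step described above adds a SHAPE pseudo-free energy to the thermodynamic free energy and selects the candidate with the lowest sum. It can be illustrated with the widely used pseudo-energy form of Deigan et al. (2009), ΔG_SHAPE(i) = m·ln(S_i + 1) + b.

The sketch makes these assumptions:
- the form is applied once to every paired nucleotide of a dot-bracket candidate (a simplification);
- the slope and intercept are set to m = 2.6 and b = -0.8 kcal/mol;
- the candidate structures, their free energies and the reactivity profile are invented.

It does not reproduce ESD-Fold's SHAPE simulation or its evolutionary search.

```python
import math

SLOPE = 2.6       # m in kcal/mol (assumed value)
INTERCEPT = -0.8  # b in kcal/mol (assumed value)

def shape_pseudo_energy(structure, reactivities, m=SLOPE, b=INTERCEPT):
    """Sum of m*ln(S_i + 1) + b over paired nucleotides (simplified scheme).

    Negative reactivities are clipped to 0; None marks missing data and is skipped.
    """
    total = 0.0
    for symbol, s in zip(structure, reactivities):
        if symbol in "()" and s is not None:
            total += m * math.log(max(s, 0.0) + 1.0) + b
    return total

def total_energy(structure, free_energy, reactivities):
    """Thermodynamic free energy plus SHAPE pseudo-free energy (kcal/mol)."""
    return free_energy + shape_pseudo_energy(structure, reactivities)

# Hypothetical candidates: dot-bracket structure -> free energy (kcal/mol).
candidates = {
    "(((...)))": -2.0,
    "((.....))": -1.5,
}
reactivity = [0.1, 0.05, 0.2, 1.1, 1.4, 0.9, 0.15, 0.05, 0.1]

for s, dg in candidates.items():
    print(s, round(total_energy(s, dg, reactivity), 2))
best = min(candidates, key=lambda s: total_energy(s, candidates[s], reactivity))
print("selected:", best)
```

A low reactivity at a paired nucleotide makes ln(S_i + 1) small, so the negative intercept dominates and rewards the pairing. A high reactivity raises the penalty, which is the intuition behind using SHAPE data as a pseudo-energy.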