1
Greenwood T, Heitsch CE. How Parameters Influence SHAPE-Directed Predictions. Methods Mol Biol 2024; 2726:105-124. [PMID: 38780729 DOI: 10.1007/978-1-0716-3519-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
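As an illustration of the pseudoenergy conversion mentioned in the abstract above, the following minimal Python sketch assumes the widely used log-linear form, in which a normalized SHAPE reactivity r contributes m*ln(r + 1) + b kcal/mol for each stacked pair the nucleotide takes part in. The abstract does not state the functional form or parameter values; the function name, the treatment of missing and negative reactivities, and the defaults m = 2.6 and b = -0.8 kcal/mol (commonly cited slope and intercept values) are assumptions made here for illustration.

```python
import math

def shape_pseudoenergy(reactivity, m=2.6, b=-0.8):
    """Return a pseudoenergy in kcal/mol for one nucleotide.

    Assumed log-linear form: m * ln(reactivity + 1) + b.
    m (slope) and b (intercept) are the tunable parameters.
    """
    if reactivity is None:
        # Assumption: nucleotides without data receive no bonus or penalty.
        return 0.0
    # Assumption: slightly negative normalized reactivities are clamped to zero.
    r = max(reactivity, 0.0)
    return m * math.log(r + 1.0) + b

# Low reactivity (likely paired) gives a bonus; high reactivity gives a penalty.
for r in (0.0, 0.2, 0.85, 2.0):
    print(r, round(shape_pseudoenergy(r), 3))
```

Sweeping m and b over a grid and re-running a folding program with each pair is one way to probe how sensitive a prediction is to these parameters; this sketch covers only the per-nucleotide term.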
2
Mahadeshwar G, Tavares RDCA, Wan H, Perry ZR, Pyle AM. RSCanner: rapid assessment and visualization of RNA structure content. Bioinformatics 2023; 39:7066915. [PMID: 36857576 PMCID: PMC10017096 DOI: 10.1093/bioinformatics/btad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 02/06/2023] [Accepted: 02/27/2023] [Indexed: 03/03/2023]
Abstract
MOTIVATION: The increasing availability of RNA structural information that spans many kilobases of transcript sequence imposes a need for tools that can rapidly screen, identify, and prioritize structural modules of interest. RESULTS: We describe RNA Structural Content Scanner (RSCanner), an automated tool that scans RNA transcripts for regions that contain high levels of secondary structure and then classifies each region for its relative propensity to adopt stable or dynamic structures. RSCanner then generates an intuitive heatmap enabling users to rapidly pinpoint regions likely to contain a high or low density of discrete RNA structures, thereby informing downstream functional or structural investigation. AVAILABILITY AND IMPLEMENTATION: RSCanner is freely available as both R script and R Markdown files, along with full documentation and test data (https://github.com/pylelab/RSCanner).
Affiliation(s)
- Han Wan
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511, United States
- Zion R Perry
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511, United States
- Anna Marie Pyle
- Corresponding author. Department of Molecular, Cellular and Developmental Biology, Yale University, 266 Whitney Avenue, Yale Science Building Room 306, New Haven, CT, 06511, United States. E-mail:
3
Aviran S, Incarnato D. Computational approaches for RNA structure ensemble deconvolution from structure probing data. J Mol Biol 2022; 434:167635. [PMID: 35595163 DOI: 10.1016/j.jmb.2022.167635] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/14/2022] [Revised: 04/29/2022] [Accepted: 05/05/2022] [Indexed: 12/15/2022]
Abstract
RNA structure probing experiments have emerged over the last decade as a straightforward way to determine the structure of RNA molecules in a number of different contexts. Although powerful, the ability of RNA to dynamically interconvert between, and to simultaneously populate, alternative structural configurations, poses a nontrivial challenge to the interpretation of data derived from these experiments. Recent efforts aimed at developing computational methods for the reconstruction of coexisting alternative RNA conformations from structure probing data are paving the way to the study of RNA structure ensembles, even in the context of living cells. In this review, we critically discuss these methods, their limitations and possible future improvements.
Affiliation(s)
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California, Davis, CA, USA.
- Danny Incarnato
- Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, the Netherlands.
4
Yu AM, Gasper PM, Cheng L, Lai LB, Kaur S, Gopalan V, Chen AA, Lucks JB. Computationally reconstructing cotranscriptional RNA folding from experimental data reveals rearrangement of non-native folding intermediates. Mol Cell 2021; 81:870-883.e10. [PMID: 33453165 DOI: 10.1016/j.molcel.2020.12.017] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 05/29/2020] [Revised: 12/08/2020] [Accepted: 12/10/2020] [Indexed: 11/16/2022]
Abstract
The series of RNA folding events that occur during transcription can critically influence cellular RNA function. Here, we present reconstructing RNA dynamics from data (R2D2), a method to uncover details of cotranscriptional RNA folding. We model the folding of the Escherichia coli signal recognition particle (SRP) RNA and show that it requires specific local structural fluctuations within a key hairpin to engender efficient cotranscriptional conformational rearrangement into the functional structure. All-atom molecular dynamics simulations suggest that this rearrangement proceeds through an internal toehold-mediated strand-displacement mechanism, which can be disrupted with a point mutation that limits local structural fluctuations and rescued with compensating mutations that restore these fluctuations. Moreover, a cotranscriptional folding intermediate could be cleaved in vitro by recombinant E. coli RNase P, suggesting potential cotranscriptional processing. These results from experiment-guided multi-scale modeling demonstrate that even an RNA with a simple functional structure can undergo complex folding and processing during synthesis.
Affiliation(s)
- Angela M Yu
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY 10065, USA; Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60201, USA
- Paul M Gasper
- Department of Chemistry and the RNA Institute, University at Albany, Albany, NY 12222, USA
- Luyi Cheng
- Interdisciplinary Biological Sciences Graduate Program, Northwestern University, Evanston, IL 60201, USA
- Lien B Lai
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA; Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA
- Simi Kaur
- Department of Chemistry and the RNA Institute, University at Albany, Albany, NY 12222, USA
- Venkat Gopalan
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH 43210, USA; Center for RNA Biology, The Ohio State University, Columbus, OH 43210, USA
- Alan A Chen
- Department of Chemistry and the RNA Institute, University at Albany, Albany, NY 12222, USA.
- Julius B Lucks
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60201, USA.
5
Greenwood T, Heitsch CE. On the Problem of Reconstructing a Mixture of RNA Structures. Bull Math Biol 2020; 82:133. [PMID: 33029669 DOI: 10.1007/s11538-020-00804-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/23/2020] [Accepted: 09/08/2020] [Indexed: 01/02/2023]
Abstract
A growing number of RNA sequences are now known to exist in some distribution with two or more different stable structures. Recent algorithms attempt to reconstruct such mixtures using the list of nucleotides in a sequence in conjunction with auxiliary experimental footprinting data. In this paper, we demonstrate some challenges which remain in addressing this problem; in particular we consider the difficulty of reconstructing a mixture of two RNA structures across a spectrum of different relative abundances. Although progress has been made in identifying the stable structures present, it remains nontrivial to predict the relative abundance of each within the experimentally sampled mixture. Because the ratio of structures present can change depending on experimental conditions, it is the footprinting data, and not the sequence, which must encode information on changes in the relative abundance. Here, we use simulated experimental data to demonstrate that there exist RNA sequences and relative abundance combinations which cannot be recovered by current methods. We then prove that this is not a single exception, but rather part of the rule. In particular, we show, using a Nussinov-Jacobson model, that recovering the relative abundances is difficult for a large proportion of RNA structure pairs. Lastly, we use information theory to establish a framework for quantifying how useful auxiliary data is in predicting the relative abundance of a structure. Together, these results demonstrate that aspects of the problem of reconstructing a mixture of RNA structures from experimental data remain open.
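For orientation, the Nussinov-Jacobson model referred to in the abstract above scores a structure by its number of base pairs. The sketch below is the textbook base-pair maximization recursion in Python, not the authors' analysis; the function name, the inclusion of G-U pairs, and the `min_loop` parameter (minimum number of unpaired nucleotides enclosed by a pair) are assumptions for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion)."""
    # Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_max_pairs("GGGAAACCC"))
```

Setting `min_loop=0` recovers the unconstrained pair-counting model; larger values forbid sterically impossible hairpins.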
6
Kuksa PP, Li F, Kannan S, Gregory BD, Leung YY, Wang LS. HiPR: High-throughput probabilistic RNA structure inference. Comput Struct Biotechnol J 2020; 18:1539-1547. [PMID: 32637050 PMCID: PMC7327253 DOI: 10.1016/j.csbj.2020.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2020] [Revised: 05/15/2020] [Accepted: 06/01/2020] [Indexed: 11/20/2022]
Abstract
Recent high-throughput structure-sensitive genome-wide sequencing-based assays have enabled large-scale studies of RNA structure, and robust transcriptome-wide computational prediction of individual RNA structures across RNA classes from these assays has potential to further improve the prediction accuracy. Here, we describe HiPR, a novel method for RNA structure prediction at single-nucleotide resolution that combines high-throughput structure probing data (DMS-seq, DMS-MaPseq) with a novel probabilistic folding algorithm. On validation data spanning a variety of RNA classes, HiPR often increases accuracy for predicting RNA structures, giving researchers new tools to study RNA structure.
Affiliation(s)
- Pavel P. Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Fan Li
- Children’s Hospital Los Angeles, Los Angeles, CA 90027, USA
- Sampath Kannan
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian D. Gregory
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA
7
Liu XR, Zhang MM, Gross ML. Mass Spectrometry-Based Protein Footprinting for Higher-Order Structure Analysis: Fundamentals and Applications. Chem Rev 2020; 120:4355-4454. [PMID: 32319757 PMCID: PMC7531764 DOI: 10.1021/acs.chemrev.9b00815] [Citation(s) in RCA: 130] [Impact Index Per Article: 32.5] [Indexed: 01/07/2023]
Abstract
Proteins adopt different higher-order structures (HOS) to enable their unique biological functions. Understanding the complexities of protein higher-order structures and dynamics requires integrated approaches, where mass spectrometry (MS) is now positioned to play a key role. One of those approaches is protein footprinting. Although the initial demonstration of footprinting was for the HOS determination of protein/nucleic acid binding, the concept was later adapted to MS-based protein HOS analysis, through which different covalent labeling approaches "mark" the solvent accessible surface area (SASA) of proteins to reflect protein HOS. Hydrogen-deuterium exchange (HDX), where deuterium in D2O replaces hydrogen of the backbone amides, is the most common example of footprinting. Its advantage is that the footprint reflects SASA and hydrogen bonding, whereas one drawback is the labeling is reversible. Another example of footprinting is slow irreversible labeling of functional groups on amino acid side chains by targeted reagents with high specificity, probing structural changes at selected sites. A third footprinting approach is by reactions with fast, irreversible labeling species that are highly reactive and footprint broadly several amino acid residue side chains on the time scale of submilliseconds. All of these covalent labeling approaches combine to constitute a problem-solving toolbox that enables mass spectrometry as a valuable tool for HOS elucidation. As there has been a growing need for MS-based protein footprinting in both academia and industry owing to its high throughput capability, prompt availability, and high spatial resolution, we present a summary of the history, descriptions, principles, mechanisms, and applications of these covalent labeling approaches. Moreover, their applications are highlighted according to the biological questions they can answer. 
This review is intended as a tutorial for MS-based protein HOS elucidation and as a reference for investigators seeking a MS-based tool to address structural questions in protein science.
Affiliation(s)
- Michael L. Gross
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO, USA, 63130
8
Spasic A, Assmann SM, Bevilacqua PC, Mathews DH. Modeling RNA secondary structure folding ensembles using SHAPE mapping data. Nucleic Acids Res 2018; 46:314-323. [PMID: 29177466 PMCID: PMC5758915 DOI: 10.1093/nar/gkx1057] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 05/10/2017] [Accepted: 10/30/2017] [Indexed: 12/22/2022]
Abstract
RNA secondary structure prediction is widely used for developing hypotheses about the structures of RNA sequences, and structure can provide insight about RNA function. The accuracy of structure prediction is known to be improved using experimental mapping data that provide information about the pairing status of single nucleotides, and these data can now be acquired for whole transcriptomes using high-throughput sequencing. Prior methods for using these experimental data focused on predicting structures for sequences assuming that they populate a single structure. Most RNAs populate multiple structures, however, where the ensemble of strands populates structures with different sets of canonical base pairs. The focus on modeling single structures has been a bottleneck for accurately modeling RNA structure. In this work, we introduce Rsample, an algorithm for using experimental data to predict more than one RNA structure for sequences that populate multiple structures at equilibrium. We demonstrate, using SHAPE mapping data, that we can accurately model RNA sequences that populate multiple structures, including the relative probabilities of those structures. This program is freely available as part of the RNAstructure software package.
Affiliation(s)
- Aleksandar Spasic
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Sarah M Assmann
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Philip C Bevilacqua
- Department of Chemistry, Department of Biochemistry & Molecular Biology, Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA; Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
9
Schroeder SJ. Challenges and approaches to predicting RNA with multiple functional structures. RNA 2018; 24:1615-1624. [PMID: 30143552 PMCID: PMC6239171 DOI: 10.1261/rna.067827.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 06/08/2023]
Abstract
The revolution in sequencing technology demands new tools to interpret the genetic code. As in vivo transcriptome-wide chemical probing techniques advance, new challenges emerge in the RNA folding problem. The emphasis on one sequence folding into a single minimum free energy structure is fading as a new focus develops on generating RNA structural ensembles and identifying functional structural features in ensembles. This review describes an efficient combinatorially complete method and three free energy minimization approaches to predicting RNA structures with more than one functional fold, as well as two methods for analysis of a thermodynamics-based Boltzmann ensemble of structures. The review then highlights two examples of viral RNA 3'-UTR regions that fold into more than one conformation and have been characterized by single molecule fluorescence energy resonance transfer or NMR spectroscopy. These examples highlight the different approaches and challenges in predicting structure and function from sequence for RNA with multiple biological roles and folds. More well-defined examples and new metrics for measuring differences in RNA structures will guide future improvements in prediction of RNA structure and function from sequence.
Affiliation(s)
- Susan J Schroeder
- Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA
10
Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes. Nat Commun 2018; 9:606. [PMID: 29426922 PMCID: PMC5807309 DOI: 10.1038/s41467-018-02923-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 06/09/2017] [Accepted: 01/09/2018] [Indexed: 11/23/2022]
Abstract
RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies. Different experimental and computational approaches can be used to study RNA structures. Here, the authors present a computational method for data-directed reconstruction of complex RNA structure landscapes, which predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data.
11
Martens L, Rühle F, Stoll M. LncRNA secondary structure in the cardiovascular system. Noncoding RNA Res 2017; 2:137-142. [PMID: 30159432 PMCID: PMC6084829 DOI: 10.1016/j.ncrna.2017.12.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 10/30/2017] [Revised: 11/28/2017] [Accepted: 12/08/2017] [Indexed: 01/27/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been increasingly studied during the past decade. This led to an immense number of annotated transcripts, out of which many were linked to a diverse range of biological mechanisms and diseases. Due to the variety of their regulatory potential, they are seen as an important link in understanding complex epigenetic mechanisms. Prominent examples of lncRNAs in the cardiovascular system are ANRIL, Braveheart, MALAT1 and HOTAIR which have been excessively studied. But despite the impressive number of described transcripts, only a few examples are characterized functionally. One way to do this is to identify accessible structural domains in the RNA secondary structure which have the ability to bind to DNA, RNA or proteins. Through recent improvements in computational as well as experimental methods, this exploration of secondary structure became not only more efficient than traditional methods like crystallization, but also feasible to investigate whole genome RNA structures. The purpose of this review is to highlight the recent advances in secondary structure probing methods and how these can be applied in order to investigate the functional roles of lncRNAs in the cardiovascular system.
Affiliation(s)
- Leonie Martens
- Department of Genetic Epidemiology, Institute of Human Genetics, University of Münster, Münster, Germany
- Frank Rühle
- Department of Genetic Epidemiology, Institute of Human Genetics, University of Münster, Münster, Germany
- Monika Stoll
- Department of Genetic Epidemiology, Institute of Human Genetics, University of Münster, Münster, Germany
- Department of Biochemistry, Genetic Epidemiology and Statistical Genetics, CARIM School for Cardiovascular Diseases, Maastricht Center for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
12
Tan Z, Sharma G, Mathews DH. Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data. Biophys J 2017; 113:330-338. [PMID: 28735622 DOI: 10.1016/j.bpj.2017.06.039] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/01/2017] [Revised: 06/07/2017] [Accepted: 06/19/2017] [Indexed: 10/19/2022]
Abstract
Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data is currently available only for a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we propose a method for integrating experimental mapping data into a comparative sequence analysis algorithm for secondary structure prediction of multiple homologs, whereby the mapping data benefits not only the prediction for the specific sequence that was mapped but also other homologs. The proposed method is realized by modifying the TurboFold II algorithm for prediction of RNA secondary structures to utilize basepairing probabilities guided by SHAPE experimental data when such data are available. The SHAPE-mapping-guided basepairing probabilities are obtained using the RSample method. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package.
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York; Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
- Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, Rochester, New York; Department of Electrical and Computer Engineering, University of Rochester Medical Center, Rochester, New York; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York.
- David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York; Center for RNA Biology, University of Rochester Medical Center, Rochester, New York; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York.
13
Woods CT, Lackey L, Williams B, Dokholyan NV, Gotz D, Laederach A. Comparative Visualization of the RNA Suboptimal Conformational Ensemble In Vivo. Biophys J 2017. [PMID: 28625696 PMCID: PMC5529173 DOI: 10.1016/j.bpj.2017.05.031] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/20/2022]
Abstract
When a ribonucleic acid (RNA) molecule folds, it often does not adopt a single, well-defined conformation. The folding energy landscape of an RNA is highly dependent on its nucleotide sequence and molecular environment. Cellular molecules sometimes alter the energy landscape, thereby changing the ensemble of likely low-energy conformations. The effects of these energy landscape changes on the conformational ensemble are particularly challenging to visualize for large RNAs. We have created a robust approach for visualizing the conformational ensemble of RNAs that is well suited for in vitro versus in vivo comparisons. Our method creates a stable map of conformational space for a given RNA sequence. We first identify single point mutations in the RNA that maximally sample suboptimal conformational space based on the ensemble’s partition function. Then, we cluster these diverse ensembles to identify the most diverse partition functions for Boltzmann stochastic sampling. By using, to our knowledge, a novel nestedness distance metric, we iteratively add mutant suboptimal ensembles to converge on a stable 2D map of conformational space. We then compute the selective 2′ hydroxyl acylation by primer extension (SHAPE)-directed ensemble for the RNA folding under different conditions, and we project these ensembles on the map to visualize. To validate our approach, we established a conformational map of the Vibrio vulnificus add adenine riboswitch that reveals five classes of structures. In the presence of adenine, projection of the SHAPE-directed sampling correctly identified the on-conformation; without the ligand, only off-conformations were visualized. We also collected the whole-transcript in vitro and in vivo SHAPE-MaP for human β-actin messenger RNA that revealed similar global folds in both conditions. 
Nonetheless, a comparison of in vitro and in vivo data revealed that specific regions exhibited significantly different SHAPE-MaP profiles indicative of structural rearrangements, including rearrangement consistent with binding of the zipcode protein in a region distal to the stop codon.
Affiliation(s)
- Chanin T Woods
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Lela Lackey
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Benfeard Williams
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- David Gotz
- Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Alain Laederach
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina.
14
Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. Quant Biol 2017; 5:3-24. [PMID: 28717530 PMCID: PMC5510538 DOI: 10.1007/s40484-017-0093-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/02/2016] [Revised: 12/08/2016] [Accepted: 12/15/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data. RESULTS: We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis as well as highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. CONCLUSIONS: To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
Affiliation(s)
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, University of California at Davis, Davis, CA 95616, USA
|
15
|
Kutchko KM, Laederach A. Transcending the prediction paradigm: novel applications of SHAPE to RNA function and evolution. WILEY INTERDISCIPLINARY REVIEWS-RNA 2016; 8. [PMID: 27396578 PMCID: PMC5179297 DOI: 10.1002/wrna.1374] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 03/04/2016] [Revised: 04/29/2016] [Accepted: 05/23/2016] [Indexed: 12/31/2022]
Abstract
Selective 2′‐hydroxyl acylation analyzed by primer extension (SHAPE) provides information on RNA structure at single‐nucleotide resolution. It is most often used in conjunction with RNA secondary structure prediction algorithms as a probabilistic or thermodynamic restraint. With the recent advent of ultra‐high‐throughput approaches for collecting SHAPE data, the applications of this technology are extending beyond structure prediction. In this review, we discuss recent applications of SHAPE data in the transcriptomic context and how this new experimental paradigm is changing our understanding of these experiments and RNA folding in general. SHAPE experiments probe both the secondary and tertiary structure of an RNA, suggesting that model‐free approaches for within and comparative RNA structure analysis can provide significant structural insight without the need for a full structural model. New methods incorporating SHAPE at different nucleotide resolutions are required to parse these transcriptomic data sets to transcend secondary structure modeling with global structural metrics. These ‘multiscale’ approaches provide deeper insights into RNA global structure, evolution, and function in the cell. WIREs RNA 2017, 8:e1374. doi: 10.1002/wrna.1374
Affiliation(s)
- Katrina M Kutchko
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
16
|
Hamada M, Ono Y, Kiryu H, Sato K, Kato Y, Fukunaga T, Mori R, Asai K. Rtools: a web server for various secondary structural analyses on single RNA sequences. Nucleic Acids Res 2016; 44:W302-7. [PMID: 27131356 PMCID: PMC4987903 DOI: 10.1093/nar/gkw337] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/12/2016] [Accepted: 04/15/2016] [Indexed: 11/12/2022] Open
Abstract
The secondary structures, as well as the nucleotide sequences, are the important features of RNA molecules to characterize their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools to compensate for the imperfect predictions by calculating and visualizing the secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get the different types of solutions of the secondary structures, the marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes by arbitrary base mutations as well as the measures for validations of the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, which integrates software tools, CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD.
Affiliation(s)
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, 135-0064 Tokyo, Japan
- Yukiteru Ono
- IMSBIO Co., Ltd, 4-21-1-601 Higashi-Ikebukuro, Toshima-ku, Tokyo 170-0013, Japan
- Hisanori Kiryu
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Yuki Kato
- Center for iPS Cell Research and Application (CiRA), Kyoto University, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan
- Tsukasa Fukunaga
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Ryota Mori
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
- Kiyoshi Asai
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, 135-0064 Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
|
17
|
Sahoo S, Świtnicki MP, Pedersen JS. ProbFold: a probabilistic method for integration of probing data in RNA secondary structure prediction. Bioinformatics 2016; 32:2626-35. [PMID: 27153612 DOI: 10.1093/bioinformatics/btw175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/22/2015] [Accepted: 03/28/2016] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION Recently, new RNA secondary structure probing techniques have been developed, including Next Generation Sequencing based methods capable of probing transcriptome-wide. These techniques hold great promise for improving structure prediction accuracy. However, each new data type comes with its own signal properties and biases, which may even be experiment specific. There is therefore a growing need for RNA structure prediction methods that can be automatically trained on new data types and readily extended to integrate and fully exploit multiple types of data. RESULTS Here, we develop and explore a modular probabilistic approach for integrating probing data in RNA structure prediction. It can be automatically trained given a set of known structures with probing data. The approach is demonstrated on SHAPE datasets, where we evaluate and selectively model specific correlations. The approach often makes superior use of the probing data signal compared to other methods. We illustrate the use of ProbFold on multiple data types using both simulations and a small set of structures with SHAPE, DMS and CMCT data. Technically, the approach combines stochastic context-free grammars (SCFGs) with probabilistic graphical models. This approach allows rapid adaptation and integration of new probing data types. AVAILABILITY AND IMPLEMENTATION ProbFold is implemented in C++. Models are specified using simple textual formats. Data reformatting is done using separate C++ programs. Source code, statically compiled binaries for x86 Linux machines, C++ programs, example datasets and a tutorial are available from http://moma.ki.au.dk/prj/probfold/. CONTACT jakob.skou@clin.au.dk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sudhakar Sahoo
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus N 8200, Denmark
- Michał P Świtnicki
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus N 8200, Denmark
- Jakob Skou Pedersen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus N 8200, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus C DK-8000, Denmark
|
18
|
Vinogradova SV, Sutormin RA, Mironov AA, Soldatov RA. Probing-directed identification of novel structured RNAs. RNA Biol 2016; 13:232-42. [PMID: 26732206 DOI: 10.1080/15476286.2015.1132140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022] Open
Abstract
Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure, thus it is important to determine the structure as well as to scan genomes for structured elements. State of the art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use 2 major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes indicating their propensity to be in a paired or unpaired state. There exist several strategies to integrate probing data into RNA secondary structure prediction algorithms that substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.
Affiliation(s)
- Svetlana V Vinogradova
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Vorobievy Gory, Moscow, 119991, Russia; Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per, Moscow, 127994, Russia
- Roman A Sutormin
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Vorobievy Gory, Moscow, 119991, Russia; Lawrence Berkeley National Laboratory, Berkeley, 94710, CA, USA
- Andrey A Mironov
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Vorobievy Gory, Moscow, 119991, Russia; Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per, Moscow, 127994, Russia
- Ruslan A Soldatov
- Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73 Vorobievy Gory, Moscow, 119991, Russia; Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per, Moscow, 127994, Russia
|
19
|
Abstract
Experimental probing data can be used to improve the accuracy of RNA secondary structure prediction. The software package RNAstructure can take advantage of enzymatic cleavage data, FMN cleavage data, traditional chemical modification reactivity data, and SHAPE reactivity data for secondary structure modeling. This chapter provides protocols for using experimental probing data with RNAstructure to restrain or constrain RNA secondary structure prediction.
Affiliation(s)
- Zhenjiang Zech Xu
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
|
20
|
Wu Y, Shi B, Ding X, Liu T, Hu X, Yip KY, Yang ZR, Mathews DH, Lu ZJ. Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data. Nucleic Acids Res 2015; 43:7247-59. [PMID: 26170232 PMCID: PMC4551937 DOI: 10.1093/nar/gkv706] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 02/04/2015] [Accepted: 06/30/2015] [Indexed: 12/30/2022] Open
Abstract
Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.
Affiliation(s)
- Yang Wu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xinqiang Ding
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tong Liu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xihao Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Zheng Rong Yang
- School of Biosciences, University of Exeter, Exeter EX4 4QD, UK
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
|
21
|
Attatippaholkun W, Pankhong P, Nisalak A, Kalayanarooj S. Evolutionary relationship of 5'-untranslated regions among Thai dengue-3 viruses, Bangkok isolates, during 24 year-evolution. ASIAN PAC J TROP MED 2015; 8:176-84. [PMID: 25902157 DOI: 10.1016/s1995-7645(14)60311-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2014] [Revised: 01/20/2015] [Accepted: 02/15/2015] [Indexed: 10/23/2022] Open
Abstract
OBJECTIVE To study the evolutionary relationship of the 5' untranslated regions (5'UTRs) in low-passage dengue-3 viruses (DEN3) isolated from hospitalized children with different clinical manifestations in Bangkok over 24 years of evolution (1977-2000), compared with the DEN3 prototype (H87). METHODS The 5'UTRs of these Thai DEN3 isolates and the H87 prototype were amplified by RT-PCR and sequenced. Multiple sequence alignments were performed with Codon Code Aligner v 4.0.4 software, and RNA secondary structures were predicted with MFOLD software. Replication of five Thai DEN3 candidates was compared with that of the H87 prototype in human (HepG2) and mosquito (C6/36) cell lines. RESULTS Among these Thai DEN3 isolates, completely identical sequences of the first 89 nucleotides, a common high-order secondary structure of the 5'UTRs, and three SNPs (the predominant C90T and two minor SNPs, A109G and A112G) were found. The C90T SNP of the Thai DEN3 Bangkok isolates was shown to predominate before 1977. Five Thai DEN3 candidates with the predominant C90T replicated better than the H87 prototype in human (HepG2) and mosquito (C6/36) cell lines. However, neither the highly conserved sequences nor the SNPs of the 5'UTR appeared to correlate with disease severity in humans. CONCLUSIONS Our findings highlight the evolutionary relationship of the completely identical 89-nucleotide sequence, the high-order secondary structure and the predominant C90T of the 5'UTR of these Thai DEN3 isolates over 24 years of evolution, suggesting that these features may serve as genetic markers and as targets for future research on antiviral therapy and vaccine approaches for Thai DEN3.
Affiliation(s)
- Watcharee Attatippaholkun
- Department of Clinical Chemistry, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Panyupa Pankhong
- Department of Clinical Chemistry, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Ananda Nisalak
- Department of Virology, US Army Medical Component, Armed Forces Research Institute of Medical Sciences, Bangkok 10400, Thailand
|
22
|
The RNA structurome: transcriptome-wide structure probing with next-generation sequencing. Trends Biochem Sci 2015; 40:221-32. [DOI: 10.1016/j.tibs.2015.02.005] [Citation(s) in RCA: 122] [Impact Index Per Article: 13.6] [Received: 12/24/2014] [Revised: 02/16/2015] [Accepted: 02/17/2015] [Indexed: 01/16/2023]
|
23
|
Corley M, Solem A, Qu K, Chang HY, Laederach A. Detecting riboSNitches with RNA folding algorithms: a genome-wide benchmark. Nucleic Acids Res 2015; 43:1859-68. [PMID: 25618847 PMCID: PMC4330374 DOI: 10.1093/nar/gkv010] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Indexed: 11/23/2022] Open
Abstract
Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, in particular when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect RNAs with large structural disparities caused by a single nucleotide variant (SNV) or riboSNitches. A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate 11 different RNA folding algorithms’ riboSNitch prediction performance on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) have overall better performance if base pairing probabilities are considered rather than minimum free energy calculations. Although overall aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest confidence predictions are evaluated independently.
Affiliation(s)
- Meredith Corley
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Amanda Solem
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kun Qu
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Y Chang
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
24
|
Aviran S, Pachter L. Rational experiment design for sequencing-based RNA structure mapping. RNA (NEW YORK, N.Y.) 2014; 20:1864-1877. [PMID: 25332375 PMCID: PMC4238353 DOI: 10.1261/rna.043844.113] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 12/08/2013] [Accepted: 09/07/2014] [Indexed: 05/30/2023]
Abstract
Structure mapping is a classic experimental approach for determining nucleic acid structure that has gained renewed interest in recent years following advances in chemistry, genomics, and informatics. The approach encompasses numerous techniques that use different means to introduce nucleotide-level modifications in a structure-dependent manner. Modifications are assayed via cDNA fragment analysis, using electrophoresis or next-generation sequencing (NGS). The recent advent of NGS has dramatically increased the throughput, multiplexing capacity, and scope of RNA structure mapping assays, thereby opening new possibilities for genome-scale, de novo, and in vivo studies. From an informatics standpoint, NGS is more informative than prior technologies by virtue of delivering direct molecular measurements in the form of digital sequence counts. Motivated by these new capabilities, we introduce a novel model-based in silico approach for quantitative design of large-scale multiplexed NGS structure mapping assays, which takes advantage of the direct and digital nature of NGS readouts. We use it to characterize the relationship between controllable experimental parameters and the precision of mapping measurements. Our results highlight the complexity of these dependencies and shed light on relevant tradeoffs and pitfalls, which can be difficult to discern by intuition alone. We demonstrate our approach by quantitatively assessing the robustness of SHAPE-Seq measurements, obtained by multiplexing SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) chemistry in conjunction with NGS. We then utilize it to elucidate design considerations in advanced genome-wide approaches for probing the transcriptome, which recently obtained in vivo information using dimethyl sulfate (DMS) chemistry.
Affiliation(s)
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, California 95616, USA
- Lior Pachter
- Center for Computational Biology and Departments of Molecular and Cell Biology and Mathematics, University of California at Berkeley, Berkeley, California 94720, USA
|
25
|
Tian S, Cordero P, Kladwang W, Das R. High-throughput mutate-map-rescue evaluates SHAPE-directed RNA structure and uncovers excited states. RNA (NEW YORK, N.Y.) 2014; 20:1815-26. [PMID: 25183835 PMCID: PMC4201832 DOI: 10.1261/rna.044321.114] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 05/04/2023]
Abstract
The three-dimensional conformations of noncoding RNAs underpin their biochemical functions but have largely eluded experimental characterization. Here, we report that integrating a classic mutation/rescue strategy with high-throughput chemical mapping enables rapid RNA structure inference with unusually strong validation. We revisit a 16S rRNA domain for which SHAPE (selective 2'-hydroxyl acylation with primer extension) and limited mutational analysis suggested a conformational change between apo- and holo-ribosome conformations. Computational support estimates, data from alternative chemical probes, and mutate-and-map (M(2)) experiments highlight issues of prior methodology and instead give a near-crystallographic secondary structure. Systematic interrogation of single base pairs via a high-throughput mutation/rescue approach then permits incisive validation and refinement of the M(2)-based secondary structure. The data further uncover the functional conformation as an excited state (20 ± 10% population) accessible via a single-nucleotide register shift. These results correct an erroneous SHAPE inference of a ribosomal conformational change, expose critical limitations of conventional structure mapping methods, and illustrate practical steps for more incisively dissecting RNA dynamic structure landscapes.
Affiliation(s)
- Siqi Tian
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Pablo Cordero
- Biomedical Informatics Program, Stanford University, Stanford, California 94305, USA
- Wipapat Kladwang
- Department of Physics, Stanford University, Stanford, California 94305, USA
- Rhiju Das
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA; Biomedical Informatics Program, Stanford University, Stanford, California 94305, USA; Department of Physics, Stanford University, Stanford, California 94305, USA
|
26
|
|
27
|
Abstract
Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models, even at the secondary structure level, hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies, including several previously unrecognized negative design rules, were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-year testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.
|
28
|
Abstract
Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute Janelia Farm Research Campus, Ashburn, Virginia 20147
|
29
|
Li X, Kazan H, Lipshitz HD, Morris QD. Finding the target sites of RNA-binding proteins. WILEY INTERDISCIPLINARY REVIEWS-RNA 2013; 5:111-30. [PMID: 24217996 PMCID: PMC4253089 DOI: 10.1002/wrna.1201] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 05/15/2013] [Revised: 09/27/2013] [Accepted: 10/01/2013] [Indexed: 12/15/2022]
Abstract
RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201
Affiliation(s)
- Xiao Li
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
|
30
|
Evolutionary evidence for alternative structure in RNA sequence co-variation. PLoS Comput Biol 2013; 9:e1003152. [PMID: 23935473 PMCID: PMC3723493 DOI: 10.1371/journal.pcbi.1003152] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 03/12/2013] [Accepted: 06/05/2013] [Indexed: 02/06/2023]
Abstract
Sequence conservation and co-variation of base pairs are hallmarks of structured RNAs. For certain RNAs (e.g. riboswitches), a single sequence must adopt at least two alternative secondary structures to effectively regulate the message. If alternative secondary structures are important to the function of an RNA, we expect to observe evolutionary co-variation supporting multiple conformations. We set out to characterize the evolutionary co-variation supporting alternative conformations in riboswitches to determine the extent to which alternative secondary structures are conserved. We found strong co-variation support for the terminator, P1, and anti-terminator stems in the purine riboswitch by extending alignments to include terminator sequences. When we performed Boltzmann suboptimal sampling on purine riboswitch sequences with terminators we found that these sequences appear to have evolved to favor specific alternative conformations. We extended our analysis of co-variation to classic alignments of group I/II introns, tRNA, and other classes of riboswitches. In a majority of these RNAs, we found evolutionary evidence for alternative conformations that are compatible with the Boltzmann suboptimal ensemble. Our analyses suggest that alternative conformations are selected for and thus likely play functional roles in even the most structured of RNAs.
RNA (Ribonucleic Acid) is a messenger of genetic information, master regulator, and catalyst in the cell. To carry out its function, RNA can fold into complex three-dimensional structures. Certain classes of RNAs, called riboswitches, adopt at least two alternative structures to act as a switch. We set out to detect the evolutionary signal for alternative structures in riboswitches as we hypothesize that these RNA sequences must have evolved to allow both conformations. We find that indeed such signals exist when we compare the sequences of riboswitches from multiple species. When we extend this analysis to other RNA regulators in the cell that are not thought of as switches, we detect equivalent evolutionary support for alternative structures. Viewed through the lens of evolutionary structure conservation RNA sequences appear to have adapted to adopt multiple conformations.
|
31
|
Hamada M. Direct updating of an RNA base-pairing probability matrix with marginal probability constraints. J Comput Biol 2013; 19:1265-76. [PMID: 23210474 DOI: 10.1089/cmb.2012.0215] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022]
Abstract
A base-pairing probability matrix (BPPM) stores the probabilities for every possible base pair in an RNA sequence and has been used in many algorithms in RNA informatics (e.g., RNA secondary structure prediction and motif search). In this study, we propose a novel algorithm to perform iterative updates of a given BPPM, satisfying marginal probability constraints that are (approximately) given by recently developed biochemical experiments, such as SHAPE, PAR, and FragSeq. The method is easily implemented and is applicable to common models for RNA secondary structures, such as energy-based or machine-learning-based models. In this article, we focus mainly on the details of the algorithms, although preliminary computational experiments will also be presented.
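As an illustration of the marginal quantities this abstract refers to, the sketch below derives per-nucleotide unpaired probabilities from a BPPM by subtracting each row sum from 1. It is a minimal sketch under stated assumptions, not the paper's update algorithm: the function name `unpaired_probabilities` and the toy matrix are hypothetical, and the matrix is assumed to be symmetric with each nucleotide pairing with at most one partner.

```python
import numpy as np

def unpaired_probabilities(bppm):
    """Return 1 - (row sum) for each nucleotide of a symmetric BPPM."""
    bppm = np.asarray(bppm, dtype=float)
    if bppm.ndim != 2 or bppm.shape[0] != bppm.shape[1]:
        raise ValueError("BPPM must be a square matrix")
    if not np.allclose(bppm, bppm.T):
        raise ValueError("BPPM must be symmetric")
    # Row i sums the probabilities that nucleotide i pairs with any partner j.
    paired = bppm.sum(axis=1)
    return 1.0 - paired

# Hypothetical 4-nucleotide example: pair (0,3) with p=0.7, pair (1,2) with p=0.4.
toy = np.zeros((4, 4))
toy[0, 3] = toy[3, 0] = 0.7
toy[1, 2] = toy[2, 1] = 0.4
unpaired = unpaired_probabilities(toy)
```

For the toy matrix the result is approximately 0.3, 0.6, 0.6, 0.3. Marginals of this kind are what probing data such as SHAPE are taken to constrain, approximately, in the abstract above.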
Affiliation(s)
- Michiaki Hamada
- The University of Tokyo, Graduate School of Frontier Science, Kashiwa, Japan.
|
32
|
Chen C, Mitra S, Jonikas M, Martin J, Brenowitz M, Laederach A. Understanding the role of three-dimensional topology in determining the folding intermediates of group I introns. Biophys J 2013; 104:1326-37. [PMID: 23528092 DOI: 10.1016/j.bpj.2013.02.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 12/08/2012] [Revised: 01/28/2013] [Accepted: 02/07/2013] [Indexed: 11/30/2022]
Abstract
Many RNA molecules exert their biological function only after folding to unique three-dimensional structures. For long, noncoding RNA molecules, the complexity of finding the native topology can be a major impediment to correct folding to the biologically active structure. An RNA molecule may fold to a near-native structure but not be able to continue to the correct structure due to a topological barrier such as crossed strands or incorrectly stacked helices. Achieving the native conformation thus requires unfolding and refolding, resulting in a long-lived intermediate. We investigate the role of topology in the folding of two phylogenetically related catalytic group I introns, the Twort and Azoarcus group I ribozymes. The kinetic models describing the Mg²⁺-mediated folding of these ribozymes were previously determined by time-resolved hydroxyl (∙OH) radical footprinting. Two intermediates formed by parallel pathways were resolved for each RNA. These data and analytical ultracentrifugation compaction analyses are used herein to constrain coarse-grained models of these folding intermediates as we investigate the role of nonnative topology in dictating the lifetime of the intermediates. Starting from an ensemble of unfolded conformations, we folded the RNA molecules by progressively adding native constraints to subdomains of the RNA defined by the ∙OH time-progress curves to simulate folding through the different kinetic pathways. We find that nonnative topologies (arrangement of helices) occur frequently in the folding simulations despite using only native constraints to drive the reaction, and that the initial conformation, rather than the folding pathway, is the major determinant of whether the RNA adopts nonnative topology during folding. From these analyses we conclude that biases in the initial conformation likely determine the relative flux through parallel RNA folding pathways.
Affiliation(s)
- Chunxia Chen
- Department of Biology, University of North Carolina, Chapel Hill, NC, USA
|
33
|
Sükösd Z, Swenson MS, Kjems J, Heitsch CE. Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions. Nucleic Acids Res 2013; 41:2807-16. [PMID: 23325843 PMCID: PMC3597644 DOI: 10.1093/nar/gks1283] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 01/05/2023]
Abstract
Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure. However, the amount of improvement varies with the sequence, exhibiting a correlation with MFE accuracy. Further analysis of this correlation shows that accurate MFE base pairs are typically preserved in a data-directed prediction, whereas inaccurate ones are not. Thus, the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy. Finally, we confirm sequence dependencies in the directability of thermodynamic predictions and investigate the potential for greater accuracy improvements in the worst performing test sequence.
Affiliation(s)
- Zsuzsanna Sükösd
- Interdisciplinary Nanoscience Center, Aarhus University, Ny Munkegade 120, Aarhus C DK-8000, Denmark
|
34
|
Integrating chemical footprinting data into RNA secondary structure prediction. PLoS One 2012; 7:e45160. [PMID: 23091593 PMCID: PMC3473038 DOI: 10.1371/journal.pone.0045160] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 04/17/2012] [Accepted: 08/16/2012] [Indexed: 01/20/2023]
Abstract
Chemical and enzymatic footprinting experiments, such as SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension), yield important information about RNA secondary structure. Indeed, since the 2′-hydroxyl is reactive at flexible (loop) regions, but unreactive at base-paired regions, SHAPE yields quantitative data about which RNA nucleotides are base-paired. Recently, low error rates in secondary structure prediction have been reported for three RNAs of moderate size, by including base stacking pseudo-energy terms derived from SHAPE data into the computation of minimum free energy secondary structure. Here, we describe a novel method, RNAsc (RNA soft constraints), which includes pseudo-energy terms for each nucleotide position, rather than only for base stacking positions. We prove that RNAsc is self-consistent, in the sense that the nucleotide-specific probabilities of being unpaired in the low energy Boltzmann ensemble always become more closely correlated with the input SHAPE data after application of RNAsc. From this mathematical perspective, the secondary structure predicted by RNAsc should be ‘correct’, in as much as the SHAPE data is ‘correct’. We benchmark RNAsc against the previously mentioned method for eight RNAs, for which both SHAPE data and native structures are known, to find the same accuracy in 7 out of 8 cases, and an improvement of 25% in one case. Furthermore, we present what appears to be the first direct comparison of SHAPE data and in-line probing data, by comparing yeast asp-tRNA SHAPE data from the literature with data from in-line probing experiments we have recently performed. With respect to several criteria, we find that SHAPE data appear to be more robust than in-line probing data, at least in the case of asp-tRNA.
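For context, one widely used way to turn a probing reactivity into a base stacking pseudo-energy term is a logarithmic form, ΔG = m · ln(reactivity + 1) + b. The abstract does not state this formula; it is assumed here purely for illustration, with m = 2.6 and b = -0.8 kcal/mol as commonly quoted default values. The function name `shape_pseudo_energy` is hypothetical, and the sketch is not an implementation of RNAsc.

```python
import math

def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy (kcal/mol) for one nucleotide's reactivity.

    Assumed form: m * ln(reactivity + 1) + b. The default slope and
    intercept are illustrative values, not taken from the abstract.
    """
    if reactivity <= -1.0:
        raise ValueError("reactivity must be greater than -1")
    return m * math.log(reactivity + 1.0) + b

# Hypothetical normalized reactivities, from unreactive to highly reactive.
reactivities = [0.0, 0.3, 1.0, 2.5]
energies = [shape_pseudo_energy(r) for r in reactivities]
```

Under these assumptions, low reactivities give a negative (stabilizing) term and high reactivities a positive (penalizing) one; with the default values the sign changes near a reactivity of 0.36.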
|
35
|
Ouyang Z, Snyder MP, Chang HY. SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res 2012; 23:377-87. [PMID: 23064747 PMCID: PMC3561878 DOI: 10.1101/gr.138545.112] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Indexed: 02/05/2023]
Abstract
We present an integrative approach, SeqFold, that combines high-throughput RNA structure profiling data with computational prediction for genome-scale reconstruction of RNA secondary structures. SeqFold transforms experimental RNA structure information into a structure preference profile (SPP) and uses it to select stable RNA structure candidates representing the structure ensemble. Under a high-dimensional classification framework, SeqFold efficiently matches a given SPP to the most likely cluster of structures sampled from the Boltzmann-weighted ensemble. SeqFold is able to incorporate diverse types of RNA structure profiling data, including parallel analysis of RNA structure (PARS), selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq), fragmentation sequencing (FragSeq) data generated by deep sequencing, and conventional SHAPE data. Using the known structures of a wide range of mRNAs and noncoding RNAs as benchmarks, we demonstrate that SeqFold outperforms or matches existing approaches in accuracy and is more robust to noise in experimental data. Application of SeqFold to reconstruct the secondary structures of the yeast transcriptome reveals the diverse impact of RNA secondary structure on gene regulation, including translation efficiency, transcription initiation, and protein-RNA interactions. SeqFold can be easily adapted to incorporate any new types of high-throughput RNA structure profiling data and is widely applicable to analyze RNA structures in any transcriptome.
Affiliation(s)
- Zhengqing Ouyang
- Howard Hughes Medical Institute and Program in Epithelial Biology, Stanford University School of Medicine, Stanford, California 94305, USA.
|
36
|
Bida JP, Das R. Squaring theory with practice in RNA design. Curr Opin Struct Biol 2012; 22:457-66. [PMID: 22832174 DOI: 10.1016/j.sbi.2012.06.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 04/25/2012] [Accepted: 06/20/2012] [Indexed: 11/26/2022]
Abstract
Ribonucleic acid (RNA) design offers unique opportunities for engineering genetic networks and nanostructures that self-assemble within living cells. Recent years have seen the creation of increasingly complex RNA devices, including proof-of-concept applications for in vivo three-dimensional scaffolding, imaging, computing, and control of biological behaviors. Expert intuition and simple design rules--the stability of double helices, the modularity of noncanonical RNA motifs, and geometric closure--have enabled these successful applications. Going beyond heuristics, emerging algorithms may enable automated design of RNAs with nucleotide-level accuracy but, as illustrated on a recent RNA square design, are not yet fully predictive. Looking ahead, technological advances in RNA synthesis and interrogation are poised to radically accelerate the discovery and stringent testing of design methods.
Affiliation(s)
- J P Bida
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
|
37
|
Bleckley S, Schroeder SJ. Incorporating global features of RNA motifs in predictions for an ensemble of secondary structures for encapsidated MS2 bacteriophage RNA. RNA 2012; 18:1309-1318. [PMID: 22645379 PMCID: PMC3383962 DOI: 10.1261/rna.032326.112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 01/07/2012] [Accepted: 05/02/2012] [Indexed: 06/01/2023]
Abstract
The secondary structure of encapsidated MS2 genomic RNA poses an interesting RNA folding challenge. Cryoelectron microscopy has demonstrated that encapsidated MS2 RNA is well-ordered. Models of MS2 assembly suggest that the RNA hairpin-protein interactions and the appropriate placement of hairpins in the MS2 RNA secondary structure can guide the formation of the correct icosahedral particle. The RNA hairpin motif that is recognized by the MS2 capsid protein dimers, however, is energetically unfavorable, and thus free energy predictions are biased against this motif. Computer programs called Crumple, Sliding Windows, and Assembly provide useful tools for prediction of viral RNA secondary structures when the traditional assumptions of RNA structure prediction by free energy minimization may not apply. These methods allow incorporation of global features of the RNA fold and motifs that are difficult to include directly in minimum free energy predictions. For example, with MS2 RNA the experimental data from SELEX experiments, crystallography, and theoretical calculations of the path for the series of hairpins can be incorporated in the RNA structure prediction, and thus the influence of free energy considerations can be modulated. This approach thoroughly explores conformational space and generates an ensemble of secondary structures. The predictions from this new approach can test hypotheses and models of viral assembly and guide construction of complete three-dimensional models of virus particles.
Affiliation(s)
- Samuel Bleckley
- Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA
- Susan J. Schroeder
- Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA
|
38
|
Ritz J, Martin JS, Laederach A. Evaluating our ability to predict the structural disruption of RNA by SNPs. BMC Genomics 2012; 13 Suppl 4:S6. [PMID: 22759654 PMCID: PMC3303743 DOI: 10.1186/1471-2164-13-s4-s6] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 01/08/2023]
Abstract
The structure of RiboNucleic Acid (RNA) has the potential to be altered by a Single Nucleotide Polymorphism (SNP). Disease-associated SNPs mapping to non-coding regions of the genome that are transcribed into RiboNucleic Acid (RNA) can potentially affect cellular regulation (and cause disease) by altering the structure of the transcript. We performed a large-scale meta-analysis of Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE) data, which probes the structure of RNA. We found that several single point mutations exist that significantly disrupt RNA secondary structure in the five transcripts we analyzed. Thus, every RNA that is transcribed has the potential to be a “RiboSNitch;” where a SNP causes a large conformational change that alters regulatory function. Predicting the SNPs that will have the largest effect on RNA structure remains a contemporary computational challenge. We therefore benchmarked the most popular RNA structure prediction algorithms for their ability to identify mutations that maximally affect structure. We also evaluated metrics for rank ordering the extent of the structural change. Although no single algorithm/metric combination dramatically outperformed the others, small differences in AUC (Area Under the Curve) values reveal that certain approaches do provide better agreement with experiment. The experimental data we analyzed nonetheless show that multiple single point mutations exist in all RNA transcripts that significantly disrupt structure in agreement with the predictions.
Affiliation(s)
- Justin Ritz
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
|
39
|
Huang W, Kim J, Jha S, Aboul-Ela F. Conformational heterogeneity of the SAM-I riboswitch transcriptional ON state: a chaperone-like role for S-adenosyl methionine. J Mol Biol 2012; 418:331-49. [PMID: 22425639 DOI: 10.1016/j.jmb.2012.02.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/20/2011] [Revised: 02/09/2012] [Accepted: 02/15/2012] [Indexed: 10/28/2022]
Abstract
Riboswitches are promising targets for the design of novel antibiotics and engineering of portable genetic regulatory elements. There is evidence that variability in riboswitch properties allows tuning of expression for genes involved in different stages of biosynthetic pathways by mechanisms that are not currently understood. Here, we explore the mechanism for tuning of S-adenosyl methionine (SAM)-I riboswitch folding. Most SAM-I riboswitches function at the transcriptional level by sensing the cognate ligand SAM. SAM-I riboswitches orchestrate the biosynthetic pathways of cysteine, methionine, SAM, and so forth. We use base-pair probability predictions to examine the secondary-structure folding landscape of several SAM-I riboswitch sequences. We predict different folding behaviors for different SAM-I riboswitch sequences. We identify several "decoy" base-pairing interactions involving 5' riboswitch residues that can compete with the formation of a P1 helix, a component of the ligand-bound "transcription OFF" state, in the absence of SAM. We hypothesize that blockage of these interactions through SAM contacts contributes to stabilization of the OFF state in the presence of ligand. We also probe folding patterns for a SAM-I riboswitch RNA using constructs with different 3' truncation points experimentally. Folding was monitored through fluorescence, susceptibility to base-catalyzed cleavage, nuclear magnetic resonance, and indirectly through SAM binding. We identify key decision windows at which SAM can affect the folding pathway towards the OFF state. The presence of decoy conformations and differential sensitivities to SAM at different transcript lengths is crucial for SAM-I riboswitches to modulate gene expression in the context of global cellular metabolism.
Affiliation(s)
- Wei Huang
- Department of Biological Science, Louisiana State University, Baton Rouge, LA 70803, USA
|
40
|
Washietl S, Hofacker IL, Stadler PF, Kellis M. RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Res 2012; 40:4261-72. [PMID: 22287623 PMCID: PMC3378861 DOI: 10.1093/nar/gks009] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Indexed: 12/19/2022]
Abstract
Thermodynamic folding algorithms and structure probing experiments are commonly used to determine the secondary structure of RNAs. Here we propose a formal framework to reconcile information from both prediction algorithms and probing experiments. The thermodynamic energy parameters are adjusted using ‘pseudo-energies’ to minimize the discrepancy between prediction and experiment. Our framework differs from related approaches that used pseudo-energies in several key aspects. (i) The energy model is only changed when necessary and no adjustments are made if prediction and experiment are consistent. (ii) Pseudo-energies remain biophysically interpretable and hold positional information where experiment and model disagree. (iii) The whole thermodynamic ensemble of structures is considered thus allowing to reconstruct mixtures of suboptimal structures from seemingly contradicting data. (iv) The noise of the energy model and the experimental data is explicitly modeled leading to an intuitive weighting factor through which the problem can be seen as folding with ‘soft’ constraints of different strength. We present an efficient algorithm to iteratively calculate pseudo-energies within this framework and demonstrate how this approach can be used in combination with SHAPE chemical probing data to improve secondary structure prediction. We further demonstrate that the pseudo-energies correlate with biophysical effects that are known to affect RNA folding such as chemical nucleotide modifications and protein binding.
Affiliation(s)
- Stefan Washietl
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA.
|
41
|
Martin JS, Halvorsen M, Davis-Neulander L, Ritz J, Gopinath C, Beauregard A, Laederach A. Structural effects of linkage disequilibrium on the transcriptome. RNA 2012; 18:77-87. [PMID: 22109839 PMCID: PMC3261746 DOI: 10.1261/rna.029900.111] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 05/14/2023]
Abstract
A majority of SNPs (single nucleotide polymorphisms) map to noncoding and intergenic regions of the genome. Noncoding SNPs are often identified in genome-wide association studies (GWAS) as strongly associated with human disease. Two such disease-associated SNPs in the 5' UTR of the human FTL (Ferritin Light Chain) gene are predicted to alter the ensemble of structures adopted by the mRNA. High-accuracy single nucleotide resolution chemical mapping reveals that these SNPs result in substantial changes in the structural ensemble in agreement with the computational prediction. Furthermore six rescue mutations are correctly predicted to restore the mRNA to its wild-type ensemble. Our data confirm that the FTL 5' UTR is a "RiboSNitch," an RNA that changes structure if a particular disease-associated SNP is present. The structural change observed is analogous to that of a bacterial Riboswitch in that it likely regulates translation. These data further suggest that specific pairs of SNPs in high linkage disequilibrium (LD) will form RNA structure-stabilizing haplotypes (SSHs). We identified 484 SNP pairs that form SSHs in UTRs of the human genome, and in eight of the 10 SSH-containing transcripts, SNP pairs stabilize RNA protein binding sites. The ubiquitous nature of SSHs in the transcriptome suggests that certain haplotypes are conserved to avoid RiboSNitch formation.
Affiliation(s)
- Joshua S. Martin
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Matthew Halvorsen
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Lauren Davis-Neulander
- Developmental Genetics and Bioinformatics, Wadsworth Center, Albany, New York 12208, USA
- Justin Ritz
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Chetna Gopinath
- Biomedical Sciences Department, University at Albany, Albany, New York 12208, USA
- Arthur Beauregard
- Biomedical Sciences Department, University at Albany, Albany, New York 12208, USA
- Alain Laederach
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Corresponding author
|
42
|
Kladwang W, VanLang CC, Cordero P, Das R. A two-dimensional mutate-and-map strategy for non-coding RNA structure. Nat Chem 2011; 3:954-62. [PMID: 22109276 PMCID: PMC3725140 DOI: 10.1038/nchem.1176] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 04/18/2011] [Accepted: 09/15/2011] [Indexed: 12/24/2022]
Abstract
Non-coding RNAs fold into precise base-pairing patterns to carry out critical roles in genetic regulation and protein synthesis, but determining RNA structure remains difficult. Here, we show that coupling systematic mutagenesis with high-throughput chemical mapping enables accurate base-pair inference of domains from ribosomal RNA, ribozymes and riboswitches. For a six-RNA benchmark that has challenged previous chemical/computational methods, this 'mutate-and-map' strategy gives secondary structures that are in agreement with crystallography (helix error rates, 2%), including a blind test on a double-glycine riboswitch. Through modelling of partially ordered states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the data report on tertiary contacts within non-coding RNAs, and coupling to the Rosetta/FARFAR algorithm gives nucleotide-resolution three-dimensional models (helix root-mean-squared deviation, 5.7 Å) of an adenine riboswitch. These results establish a promising two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behaviour.
Affiliation(s)
- Wipapat Kladwang
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Christopher C. VanLang
- Department of Chemical Engineering, Stanford University, Stanford, California 94305, USA
- Pablo Cordero
- Program in Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Rhiju Das
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Program in Biomedical Informatics, Stanford University, Stanford, California 94305, USA
- Department of Physics, Stanford University, Stanford, California 94305, USA
|
43
|
Schroeder SJ, Stone JW, Bleckley S, Gibbons T, Mathews DM. Ensemble of secondary structures for encapsidated satellite tobacco mosaic virus RNA consistent with chemical probing and crystallography constraints. Biophys J 2011; 101:167-75. [PMID: 21723827 DOI: 10.1016/j.bpj.2011.05.053] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 04/08/2011] [Revised: 05/16/2011] [Accepted: 05/19/2011] [Indexed: 11/15/2022]
Abstract
Viral genomic RNA adopts many conformations during its life cycle as the genome is replicated, translated, and encapsidated. The high-resolution crystallographic structure of the satellite tobacco mosaic virus (STMV) particle reveals 30 helices of well-ordered RNA. The crystallographic data provide global constraints on the possible secondary structures for the encapsidated RNA. Traditional free energy minimization methods of RNA secondary structure prediction do not generate structures consistent with the crystallographic data, and to date no complete STMV RNA basepaired secondary structure has been generated. RNA-protein interactions and tertiary interactions may contribute a significant degree of stability, and the kinetics of viral assembly may dominate the folding process. The computational tools, Helix Find & Combine, Crumple, and Sliding Windows and Assembly, evaluate and explore the possible secondary structures for encapsidated STMV RNA. All possible hairpins consistent with the experimental data and a cotranscriptional folding and assembly hypothesis were generated, and the combination of hairpins that was most consistent with experimental data is presented as the best representative structure of the ensemble. Multiple solutions to the genome packaging problem could be an evolutionary advantage for viruses. In such cases, an ensemble of structures that share favorable global features best represents the RNA fold.
Affiliation(s)
- Susan J Schroeder
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma, USA.
|
44
|
Wan Y, Kertesz M, Spitale RC, Segal E, Chang HY. Understanding the transcriptome through RNA structure. Nat Rev Genet 2011; 12:641-55. [PMID: 21850044 DOI: 10.1038/nrg3049] [Citation(s) in RCA: 325] [Impact Index Per Article: 25.0] [Indexed: 12/13/2022]
Abstract
RNA structure is crucial for gene regulation and function. In the past, transcriptomes have largely been parsed by primary sequences and expression levels, but it is now becoming feasible to annotate and compare transcriptomes based on RNA structure. In addition to computational prediction methods, the recent advent of experimental techniques to probe RNA structure by high-throughput sequencing has enabled genome-wide measurements of RNA structure and has provided the first picture of the structural organization of a eukaryotic transcriptome - the 'RNA structurome'. With additional advances in method refinement and interpretation, structural views of the transcriptome should help to identify and validate regulatory RNA motifs that are involved in diverse cellular processes and thereby increase understanding of RNA function.
Affiliation(s)
- Yue Wan
- Howard Hughes Medical Institute and Program in Epithelial Biology, Stanford University School of Medicine, Stanford, California 94305, USA
|
45
|
Rocca-Serra P, Bellaousov S, Birmingham A, Chen C, Cordero P, Das R, Davis-Neulander L, Duncan CD, Halvorsen M, Knight R, Leontis NB, Mathews DH, Ritz J, Stombaugh J, Weeks KM, Zirbel CL, Laederach A. Sharing and archiving nucleic acid structure mapping data. RNA (NEW YORK, N.Y.) 2011; 17:1204-12. [PMID: 21610212 PMCID: PMC3138558 DOI: 10.1261/rna.2753211] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 05/04/2023]
Abstract
Nucleic acids are particularly amenable to structural characterization using chemical and enzymatic probes. Each individual structure mapping experiment reveals specific information about the structure and/or dynamics of the nucleic acid. Currently, there is no simple approach for making these data publicly available in a standardized format. We therefore developed a standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments, or SNRNASMs. We propose a schema for sharing nucleic acid chemical probing data that uses generic public servers for storing, retrieving, and searching the data. We have also developed a consistent nomenclature (ontology) within the Ontology of Biomedical Investigations (OBI), which provides unique identifiers (termed persistent URLs, or PURLs) for classifying the data. Links to standardized data sets shared using our proposed format along with a tutorial and links to templates can be found at http://snrnasm.bio.unc.edu.
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester, Rochester, New York 14642, USA
- Chunxia Chen
- Biology Department, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Pablo Cordero
- Biochemistry Department, Stanford University, Stanford, California 94305, USA
- Rhiju Das
- Biochemistry Department, Stanford University, Stanford, California 94305, USA
- Lauren Davis-Neulander
- Biology Department, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Caia D.S. Duncan
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Matthew Halvorsen
- Biology Department, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Rob Knight
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA
- Howard Hughes Medical Institute, Boulder, Colorado 80309, USA
- Neocles B. Leontis
- Department of Chemistry, Bowling Green State University, Bowling Green, Ohio 43403, USA
- David H. Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester, Rochester, New York 14642, USA
- Justin Ritz
- Biology Department, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Jesse Stombaugh
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA
- Kevin M. Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Craig L. Zirbel
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, Ohio 43403, USA
- Alain Laederach
- Biology Department, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Corresponding author.
|
46
|
Kladwang W, Cordero P, Das R. A mutate-and-map strategy accurately infers the base pairs of a 35-nucleotide model RNA. RNA (NEW YORK, N.Y.) 2011; 17:522-34. [PMID: 21239468 PMCID: PMC3039151 DOI: 10.1261/rna.2516311] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 10/27/2010] [Accepted: 12/13/2010] [Indexed: 05/21/2023]
Abstract
We present a rapid experimental strategy for inferring base pairs in structured RNAs via an information-rich extension of classic chemical mapping approaches. The mutate-and-map method, previously applied to a DNA/RNA helix, systematically searches for single mutations that enhance the chemical accessibility of base-pairing partners distant in sequence. To test this strategy for structured RNAs, we have carried out mutate-and-map measurements for a 35-nt hairpin, called the MedLoop RNA, embedded within an 80-nt sequence. We demonstrate the synthesis of all 105 single mutants of the MedLoop RNA sequence and present high-throughput DMS, CMCT, and SHAPE modification measurements for this library at single-nucleotide resolution. The resulting two-dimensional data reveal visually clear, punctate features corresponding to RNA base pair interactions as well as more complex features; these signals can be qualitatively rationalized by comparison to secondary structure predictions. Finally, we present an automated, sequence-blind analysis that permits the confident identification of nine of the 10 MedLoop RNA base pairs at single-nucleotide resolution, while discriminating against all 1460 false-positive base pairs. These results establish the accuracy and information content of the mutate-and-map strategy and support its feasibility for rapidly characterizing the base-pairing patterns of larger and more complex RNA systems.
Affiliation(s)
- Wipapat Kladwang
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
|
47
|
Halvorsen M, Martin JS, Broadaway S, Laederach A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet 2010; 6:e1001074. [PMID: 20808897 PMCID: PMC2924325 DOI: 10.1371/journal.pgen.1001074] [Citation(s) in RCA: 244] [Impact Index Per Article: 17.4] [Received: 02/24/2010] [Accepted: 07/15/2010] [Indexed: 12/28/2022] Open
Abstract
Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g., mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified disease-associated SNPs in the human genome that significantly alter the global conformation of the UTR to which they map. For six disease-states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5′ UTRs of FTL and RB1) SNP-induced conformational changes analogous to those observed in bacterial regulatory riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a “RiboSNitch,” that is, a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble.
Genome-wide association studies identify mutations in the human genome that correlate with a particular disease. It is common to find mutations associated with disease in the non-coding region of the genome. These non-coding mutations are more difficult to interpret at a molecular level, because they do not affect the protein sequence. In this study, we analyze disease-associated mutations in non-coding regions of our genome in the context of their structural effect on the message of genetic information in our cells, Ribonucleic Acid (RNA). We focus in particular on the regulatory parts of our genes known as untranslated regions. We find that certain disease-associated mutations in these regulatory untranslated regions have a significant effect on the structure of the RNA message. We call these elements “RiboSNitches,” because they act like switches turning on and off genes, but are caused by Single Nucleotide Polymorphisms (SNPs), which are single point mutations in our genome. The RiboSNitches we identify are potentially a new class of pharmaceutical targets, as it is possible to change the structure of RNA with small drug-like molecules.
Affiliation(s)
- Matthew Halvorsen
- Biomedical Sciences Department, University at Albany, Albany, New York, United States of America
- Joshua S. Martin
- Developmental Genetics and Bioinformatics, Wadsworth Center, Albany, New York, United States of America
- Sam Broadaway
- Developmental Genetics and Bioinformatics, Wadsworth Center, Albany, New York, United States of America
- Alain Laederach
- Biomedical Sciences Department, University at Albany, Albany, New York, United States of America
- Developmental Genetics and Bioinformatics, Wadsworth Center, Albany, New York, United States of America
|