1. Gray M, Will S, Jabbari H. SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration. Algorithms Mol Biol 2024; 19:9. [PMID: 38433200 PMCID: PMC11289965 DOI: 10.1186/s13015-024-00256-4] [Received: 11/29/2023] [Accepted: 02/13/2024] [Indexed: 03/05/2024]
Abstract
MOTIVATION: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity (O(n^3) time and O(n^2) space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions, can be difficult and costly to implement, particularly when optimizing for time and space requirements.
RESULTS: Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. We also compare our algorithm to LinearFold, a linear time and space algorithm, and find that in practice SparseRNAFolD has lower memory consumption across all sequence lengths and a faster runtime for lengths up to 1000 bases.
CONCLUSION: Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of the result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.
Affiliation(s)
- Mateo Gray
  - Department of Biomedical Engineering, University of Alberta, Street, Edmonton, T6G2R3, AB, Canada
- Sebastian Will
  - Department of Computer Science CNRS/LIX (UMR 7161), Institut Polytechnique de Paris, Street, Paris, 10587, France
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Street, Edmonton, T6G2R3, AB, Canada
2. Zuber J, Mathews DH. Estimating RNA Secondary Structure Folding Free Energy Changes with efn2. Methods Mol Biol 2024; 2726:1-13. [PMID: 38780725 DOI: 10.1007/978-1-0716-3519-3_1] [Indexed: 05/25/2024]
Abstract
A number of analyses require estimates of the folding free energy changes of specific RNA secondary structures. These predictions are often based on a set of nearest neighbor parameters that models the folding stability of an RNA secondary structure as the sum of the folding stabilities of the structural elements that comprise the secondary structure. In the software suite RNAstructure, the free energy change calculation is implemented in the program efn2. The efn2 program estimates the folding free energy change and the experimental uncertainty in the folding free energy change. It can be run through the graphical user interface for RNAstructure, from the command line, or through a web server. This chapter provides detailed protocols for using efn2.
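The additive idea in this abstract can be illustrated with a minimal Python sketch. It is not the efn2 implementation: it assumes the structure has already been decomposed into named elements, and the element names and kcal/mol values are hypothetical placeholders, not RNAstructure parameters.

```python
from typing import Iterable, Mapping

# Hypothetical per-element free energy changes in kcal/mol.
# Illustrative placeholders only, NOT published nearest neighbor parameters.
EXAMPLE_TERMS: Mapping[str, float] = {
    "stack_GC_CG": -3.0,
    "stack_AU_UA": -1.0,
    "hairpin_loop_4": 5.0,
}


def total_free_energy(elements: Iterable[str], terms: Mapping[str, float]) -> float:
    """Sum the free energy terms of the structural elements of one structure."""
    return sum(terms[name] for name in elements)


# A toy hairpin: two stacked pairs closing a four-nucleotide loop.
toy_structure = ["stack_GC_CG", "stack_AU_UA", "hairpin_loop_4"]
toy_dG = total_free_energy(toy_structure, EXAMPLE_TERMS)
```

A real calculation would also need the decomposition step itself and sequence-dependent lookups, which this sketch leaves out.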
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
3. Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. ACS Synth Biol 2023; 12:2750-2763. [PMID: 37671922 PMCID: PMC10510751 DOI: 10.1021/acssynbio.3c00358] [Received: 06/11/2023] [Indexed: 09/07/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-d-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-d-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, abbreviated as P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find G-Z pairs have stability comparable to that of A-T pairs and should therefore be included as base pairs in structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include the P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
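For readers unfamiliar with the metric, the normalized ensemble defect mentioned in this abstract is the expected fraction of nucleotides whose pairing state differs from the target structure. The Python below is a minimal sketch of that definition, assuming a base-pair probability matrix is already available from some partition function calculation; the function name, argument layout, and toy numbers are illustrative assumptions, not the RNAstructure Design code.

```python
from typing import List, Optional


def normalized_ensemble_defect(pair_probs: List[List[float]],
                               target: List[Optional[int]]) -> float:
    """Average probability that a nucleotide is NOT in its target pairing state.

    pair_probs[i][j] is the probability that nucleotide i pairs with j.
    target[i] is the index of i's partner in the target, or None if unpaired.
    """
    n = len(target)
    correct = 0.0
    for i, partner in enumerate(target):
        if partner is None:
            # Probability that nucleotide i is unpaired in the ensemble.
            correct += 1.0 - sum(pair_probs[i])
        else:
            correct += pair_probs[i][partner]
    return (n - correct) / n


# Toy example (made-up probabilities): 5 nucleotides, target pairs 0 with 4.
toy_target = [4, None, None, None, 0]
toy_probs = [[0.0] * 5 for _ in range(5)]
toy_probs[0][4] = toy_probs[4][0] = 0.8
toy_ned = normalized_ensemble_defect(toy_probs, toy_target)
```

A value of 0 means every nucleotide is always in its target state; larger values indicate more probability mass on off-target structures.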
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, New York 14642, United States
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Shuichi Hoshika
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Steven A. Benner
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
4. Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. bioRxiv 2023:2023.06.06.543917. [PMID: 37333404 PMCID: PMC10274641 DOI: 10.1101/2023.06.06.543917] [Indexed: 06/20/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, simply P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit a new set of free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find that G-Z pairs have stability comparable to A-T pairs and therefore should be considered quantitatively by structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases where Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, NY
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
5. Szabat M, Prochota M, Kierzek R, Kierzek E, Mathews DH. A Test and Refinement of Folding Free Energy Nearest Neighbor Parameters for RNA Including N6-Methyladenosine. J Mol Biol 2022; 434:167632. [PMID: 35588868 PMCID: PMC11235186 DOI: 10.1016/j.jmb.2022.167632] [Received: 02/01/2022] [Revised: 04/29/2022] [Accepted: 05/07/2022] [Indexed: 12/26/2022]
Abstract
RNA folding free energy change parameters are widely used to predict RNA secondary structure and to design RNA sequences. These parameters include terms for the folding free energies of helices and loops. Although the full set of parameters has only been traditionally available for the four common bases and backbone, it is well known that covalent modifications of nucleotides are widespread in natural RNAs. Covalent modifications are also widely used in engineered sequences. We recently derived a full set of nearest neighbor terms for RNA that includes N6-methyladenosine (m6A). In this work, we test the model using 98 optical melting experiments, matching duplexes with or without N6-methylation of A. Most experiments place RRACH, the consensus site of N6-methylation, in a variety of contexts, including helices, bulge loops, internal loops, dangling ends, and terminal mismatches. For matched sets of experiments that include either A or m6A in the same context, we find that the parameters for m6A are as accurate as those for A. Across all experiments, the root mean squared deviation between estimated and experimental free energy changes is 0.67 kcal/mol. We used the new experimental data to refine the set of nearest neighbor parameter terms for m6A. These parameters enable prediction of RNA secondary structures including m6A, which can be used to model how N6-methylation of A affects RNA structure.
Affiliation(s)
- Marta Szabat
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Martina Prochota
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Ryszard Kierzek
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Elzbieta Kierzek
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, 601 Elmwood Avenue, Box 712, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, United States
6. Secondary structure prediction for RNA sequences including N6-methyladenosine. Nat Commun 2022; 13:1271. [PMID: 35277476 PMCID: PMC8917230 DOI: 10.1038/s41467-022-28817-4] [Received: 05/12/2021] [Accepted: 02/10/2022] [Indexed: 01/22/2023]
Abstract
There is increasing interest in the roles of covalently modified nucleotides in RNA. There has been, however, an inability to account for modifications in secondary structure prediction because of a lack of software and thermodynamic parameters. We report the solution for these issues for N6-methyladenosine (m6A), allowing secondary structure prediction for an alphabet of A, C, G, U, and m6A. The RNAstructure software now works with user-defined nucleotide alphabets of any size. We also report a set of nearest neighbor parameters for helices and loops containing m6A, using experiments. Interestingly, N6-methylation decreases folding stability for adenosines in the middle of a helix, has little effect on folding stability for adenosines at the ends of helices, and increases folding stability for unpaired adenosines stacked on a helix. We demonstrate predictions for an N6-methylation-activated protein recognition site from MALAT1 and human transcriptome-wide effects of N6-methylation on the probability of adenosine being buried in a helix. RNA folding free energy nearest neighbor parameters were determined for sequences with the nucleotide m6A. The RNAstructure software package can accommodate modified nucleotides, enabling secondary structure prediction of sequences with m6A.
7. Yang TH. An Aggregation Method to Identify the RNA Meta-Stable Secondary Structure and its Functionally Interpretable Structure Ensemble. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:75-86. [PMID: 34014829 DOI: 10.1109/tcbb.2021.3082396] [Indexed: 06/12/2023]
Abstract
RNA can provide vital cellular functions through its secondary or tertiary structure. Due to the low-throughput nature of experimental approaches, studies on RNA structures mainly resort to computational methods. However, existing tools fail to consider RNA structure ensembles and do not provide ways to derive functional hypotheses from new predictions. In this research, a novel method is proposed to identify the functionally interpretable structure ensemble of a given RNA sequence and to provide the meta-stable structure, or the most frequently observed functional RNA cellular conformation, based on the ensemble. In the prediction of meta-stable structures, the proposed method outperformed existing tools on a yeast test set. The inferred functional aspects were then manually checked and demonstrated a micro-averaged F1 value of 0.92. Further, a biological example of the yeast ASH1-E1 element is discussed to show that these functional aspects can also suggest testable hypotheses. The proposed method was then verified to be applicable to other species through a human test set. Finally, the proposed method was shown to resist sequence length-dependent performance deterioration.
8. Zhang H, Zhang L, Li S, Mathews DH, Huang L. LazySampling and LinearSampling: Fast Stochastic Sampling of RNA Secondary Structure with Applications to SARS-CoV-2. bioRxiv 2021:2020.12.29.424617. [PMID: 33398265 PMCID: PMC7781300 DOI: 10.1101/2020.12.29.424617] [Indexed: 01/28/2023]
Abstract
Many RNAs fold into multiple structures at equilibrium. The classical stochastic sampling algorithm can sample secondary structures according to their probabilities in the Boltzmann ensemble, and is widely used. However, this algorithm, consisting of a bottom-up partition function phase followed by a top-down sampling phase, suffers from three limitations: (a) the formulation and implementation of the sampling phase are unnecessarily complicated; (b) the sampling phase repeatedly recalculates many redundant recursions already done during the partition function phase; (c) the partition function runtime scales cubically with the sequence length. These issues prevent stochastic sampling from being used for very long RNAs such as the full genomes of SARS-CoV-2. To address these problems, we first adopt a hypergraph framework under which the sampling algorithm can be greatly simplified. We then present three sampling algorithms under this framework, among which the LazySampling algorithm is the fastest by eliminating redundant work in the sampling phase via on-demand caching. Based on LazySampling, we further replace the cubic-time partition function by a linear-time approximate one, and derive LinearSampling, an end-to-end linear-time sampling algorithm that is orders of magnitude faster than the standard one. For instance, LinearSampling is 176× faster (38.9 s vs. 1.9 h) than Vienna RNAsubopt on the full genome of Ebola virus (18,959 nt). More importantly, LinearSampling is the first RNA structure sampling algorithm to scale up to the full genome of SARS-CoV-2 without local window constraints, taking only 69.2 seconds on its reference sequence (29,903 nt). The resulting sample correlates well with the experimentally guided structures. On the SARS-CoV-2 genome, LinearSampling finds 23 regions of 15 nt with high accessibilities, which are potential targets for COVID-19 diagnostics and drug design. See code: https://github.com/LinearFold/LinearSampling.
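To make "sampling according to probabilities in the Boltzmann ensemble" concrete, the Python below is a minimal sketch that samples from an explicitly enumerated set of structures. This is not LazySampling or LinearSampling, which avoid enumeration through dynamic programming; the toy structures and their energies are made-up placeholders, and `RT` is an approximate value for 37 °C.

```python
import math
import random
from typing import Dict, List

# Approximate RT in kcal/mol at 37 °C (R ≈ 0.001987 kcal/(mol·K), T ≈ 310.15 K).
RT = 0.616


def boltzmann_probabilities(energies: Dict[str, float], rt: float = RT) -> Dict[str, float]:
    """Map each structure to exp(-E/RT) / Z, where Z is the partition function."""
    weights = {s: math.exp(-e / rt) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def sample_structures(energies: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k structures with probability proportional to their Boltzmann weight."""
    probs = boltzmann_probabilities(energies)
    structures = list(probs)
    rng = random.Random(seed)
    return rng.choices(structures, weights=[probs[s] for s in structures], k=k)


# Hypothetical free energies (kcal/mol) for three dot-bracket structures.
toy_energies = {"((...))": -2.0, "(.....)": -0.5, ".......": 0.0}
toy_probs = boltzmann_probabilities(toy_energies)
toy_samples = sample_structures(toy_energies, k=5)
```

Explicit enumeration is only feasible for tiny examples because the number of structures grows exponentially with sequence length, which is why practical samplers work from partition function recursions instead.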
9. Cheng Y, Zhang S, Xu X, Chen SJ. Vfold2D-MC: A Physics-Based Hybrid Model for Predicting RNA Secondary Structure Folding. J Phys Chem B 2021; 125:10108-10118. [PMID: 34473508 DOI: 10.1021/acs.jpcb.1c04731] [Indexed: 11/28/2022]
Abstract
Accurate prediction of RNA structure and folding stability has a far-reaching impact on our understanding of RNA functions. Here we develop Vfold2D-MC, a new physics-based model, to predict RNA structure and folding thermodynamics from the sequence. The model employs virtual bond-based coarse-graining of RNA backbone conformation and generates RNA conformations through Monte Carlo sampling of the bond angles and torsional angles of the virtual bonds. Using a coarse-grained statistical potential derived from the known structures, we assign each conformation with a statistical weight. The weighted average over the conformational ensemble gives the entropy and free energy parameters for the hairpin, bulge, and internal loops, and multiway junctions. From the thermodynamic parameters, we predict RNA structures, melting curves, and structural changes from the sequence. Theory-experiment comparisons indicate that Vfold2D-MC not only gives improved structure predictions but also enables the interpretation of thermodynamic results for different RNA structures, including multibranched junctions. This new model sets a promising framework to treat more complicated RNA structures, such as pseudoknotted and intramolecular kissing loops, for which experimental thermodynamic parameters are often unavailable.
Affiliation(s)
- Yi Cheng
  - Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
- Sicheng Zhang
  - Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
- Shi-Jie Chen
  - Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
10. Sakuraba S, Iwakiri J, Hamada M, Kameda T, Tsuji G, Kimura Y, Abe H, Asai K. Free-Energy Calculation of Ribonucleic Inosines and Its Application to Nearest-Neighbor Parameters. J Chem Theory Comput 2020; 16:5923-5935. [PMID: 32786906 DOI: 10.1021/acs.jctc.0c00270] [Indexed: 12/20/2022]
Abstract
Can current simulations quantitatively predict the stability of ribonucleic acids (RNAs)? In this research, we apply a free-energy perturbation simulation of RNAs containing inosine, a modified ribonucleic base, to the derivation of RNA nearest-neighbor parameters. A parameter set derived solely from 30 simulations was used to predict the free-energy difference of the RNA duplex with a mean unbiased error of 0.70 kcal/mol, which is a level of accuracy comparable to that obtained with parameters derived from 25 experiments. We further show that the error can be lowered to 0.60 kcal/mol by combining the simulation-derived free-energy differences with experimentally measured differences. This protocol can be used as a versatile method for deriving nearest-neighbor parameters of RNAs with various modified bases.
Affiliation(s)
- Shun Sakuraba
  - Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Kyoto 619-0215, Japan
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Junichi Iwakiri
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Michiaki Hamada
  - Faculty of Science and Engineering, Waseda University, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Shinjuku-ku, Tokyo 169-8555, Japan
- Tomoshi Kameda
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Genichiro Tsuji
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furo, Chikusa, Nagoya 464-8602, Japan
  - Division of Organic Chemistry, National Institute of Health Sciences, 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki, Kanagawa 210-9501, Japan
- Yasuaki Kimura
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furo, Chikusa, Nagoya 464-8602, Japan
- Hiroshi Abe
  - Department of Chemistry, Graduate School of Science, Nagoya University, Furo, Chikusa, Nagoya 464-8602, Japan
- Kiyoshi Asai
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
  - Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
11. Zhang H, Zhang L, Mathews DH, Huang L. LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities. Bioinformatics 2020; 36:i258-i267. [PMID: 32657379 PMCID: PMC7355276 DOI: 10.1093/bioinformatics/btaa460] [Indexed: 01/05/2023]
Abstract
MOTIVATION: RNA secondary structure prediction is widely used to understand RNA function. Recently, there has been a shift away from the classical minimum free energy methods to partition function-based methods that account for folding ensembles and can therefore estimate structure and base pair probabilities. However, the classical partition function algorithm scales cubically with sequence length, and is therefore prohibitively slow for long sequences. This slowness is even more severe than cubic-time free energy minimization due to a substantially larger constant factor in runtime.
RESULTS: Inspired by the success of our recent LinearFold algorithm that predicts the approximate minimum free energy structure in linear time, we design a similar linear-time heuristic algorithm, LinearPartition, to approximate the partition function and base-pairing probabilities, which is shown to be orders of magnitude faster than Vienna RNAfold and CONTRAfold (e.g. 2.5 days versus 1.3 min on a sequence with length 32 753 nt). More interestingly, the resulting base-pairing probabilities are even better correlated with the ground-truth structures. LinearPartition also leads to a small accuracy improvement when used for downstream structure prediction on families with the longest length sequences (16S and 23S rRNAs), as well as a substantial improvement on long-distance base pairs (500+ nt apart).
AVAILABILITY AND IMPLEMENTATION: Code: http://github.com/LinearFold/LinearPartition; Server: http://linearfold.org/partition.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- He Zhang
  - Baidu Research, Sunnyvale, CA 94089, USA
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
- Liang Zhang
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 48306, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 48306, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 48306, USA
- Liang Huang
  - Baidu Research, Sunnyvale, CA 94089, USA
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, USA
12. Ward M, Sun H, Datta A, Wise M, Mathews DH. Determining parameters for non-linear models of multi-loop free energy change. Bioinformatics 2020; 35:4298-4306. [PMID: 30923811 DOI: 10.1093/bioinformatics/btz222] [Received: 09/24/2018] [Revised: 02/10/2019] [Accepted: 03/27/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Predicting the secondary structure of RNA is a fundamental task in bioinformatics. Algorithms that predict secondary structure given only the primary sequence, and a model to evaluate the quality of a structure, are an integral part of this. These algorithms have been updated as our model of RNA thermodynamics changed and expanded. An exception to this has been the treatment of multi-loops. Although more advanced models of multi-loop free energy change have been suggested, a simple, linear model has been used since the 1980s. However, recently, new dynamic programming algorithms for secondary structure prediction that could incorporate these models were presented. Unfortunately, these models appear to have lower accuracy for secondary structure prediction.
RESULTS: We apply linear regression and a new parameter optimization algorithm to find better parameters for the existing linear model and advanced non-linear multi-loop models. These include the Jacobson-Stockmayer and Aalberts & Nandagopal models. We find that the current linear model parameters may be near optimal for the linear model, and that no advanced model performs better than the existing linear model parameters even after parameter optimization.
AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/maxhwardg/advanced_multiloops.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Max Ward
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
- Hongying Sun
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, NY, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY, USA
- Amitava Datta
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
- Michael Wise
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
  - The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Crawley, WA, Australia
- David H Mathews
  - Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA
13. Shi S, Zhang XL, Yang L, Du W, Zhao XL, Wang YJ. Prediction of RNA Secondary Structure Using Quantum-inspired Genetic Algorithms. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190916154103] [Indexed: 11/22/2022]
Abstract
Background:
The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on DPAs, the performance of EAs is far from satisfactory.
Objective:
Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures.
Methods:
The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool with size N is encoded as a population of QGA, which is represented by N quantum bits rather than classical bits. The updating of populations is accomplished by so-called quantum crossover operations, quantum mutation operations and quantum rotation operations.
Results:
The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard to not only prediction accuracy and sensitivity but also complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on the state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity.
Conclusion:
This work sheds an interesting light on the applications of quantum computing to RNA structure prediction.
Affiliation(s)
- Sha Shi
  - Engineering Research Centre of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi’an, China
- Xin-Li Zhang
  - Xinxiang Medical University, Xinxiang, Henan, China
- Le Yang
  - The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
- Wei Du
  - The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xian-Li Zhao
  - Northwestern Women and Children’s Hospital, Xi’an, China
- Yun-Jiang Wang
  - State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China

14

Ghosh S, Takahashi S, Endoh T, Tateishi-Karimata H, Hazra S, Sugimoto N. Validation of the nearest-neighbor model for Watson-Crick self-complementary DNA duplexes in molecular crowding condition. Nucleic Acids Res 2019; 47:3284-3294. [PMID: 30753582 PMCID: PMC6468326 DOI: 10.1093/nar/gkz071] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/13/2018] [Revised: 01/21/2019] [Accepted: 01/29/2019] [Indexed: 01/03/2023] Open
Abstract
Recent advancement in nucleic acid techniques inside cells demands the knowledge of the stability of nucleic acid structures in molecular crowding. The nearest-neighbor model has been successfully used to predict thermodynamic parameters for the formation of nucleic acid duplexes, with significant accuracy in a dilute solution. However, knowledge about the applicability of the model in molecular crowding is still limited. To determine and predict the stabilities of DNA duplexes in a cell-like crowded environment, we systematically investigated the validity of the nearest-neighbor model for Watson–Crick self-complementary DNA duplexes in molecular crowding. The thermodynamic parameters for the duplex formation were measured in the presence of 40 wt% poly(ethylene glycol)200 for different self-complementary DNA oligonucleotides consisting of identical nearest-neighbors in a physiological buffer containing 0.1 M NaCl. The thermodynamic parameters as well as the melting temperatures (Tm) obtained from the UV melting studies revealed similar values for the oligonucleotides having identical nearest-neighbors, suggesting the validity of the nearest-neighbor model in the crowding condition. Linear relationships between the measured ΔG°37 and Tm in crowding condition and those predicted in dilute solutions allowed us to predict ΔG°37, Tm and nearest-neighbor parameters in molecular crowding using existing parameters in the dilute condition, which provides useful information about the thermostability of the self-complementary DNA duplexes in molecular crowding.
Affiliation(s)
- Saptarshi Ghosh
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan
- Shuntaro Takahashi
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan
- Tamaki Endoh
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan
- Hisae Tateishi-Karimata
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan
- Soumitra Hazra
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan
- Naoki Sugimoto
  - Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan; Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe, 650-0047, Japan

15

Nishida S, Sakuraba S, Asai K, Hamada M. Estimating Energy Parameters for RNA Secondary Structure Predictions Using Both Experimental and Computational Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1645-1655. [PMID: 29994069 DOI: 10.1109/tcbb.2018.2813388] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Computational RNA secondary structure prediction depends on a large number of nearest-neighbor free-energy parameters, including 10 parameters for Watson-Crick stacked base pairs that were estimated from experimental measurements of the free energies of 90 RNA duplexes. These experimental data are provided by time-consuming and cost-intensive experiments. In contrast, various modified nucleotides in RNAs, which would affect not only their structures but also functions, have been found, and rapid determination of energy parameters for such modified nucleotides is needed. To reduce the high cost of determining energy parameters, we propose a novel method to estimate energy parameters from both experimental and computational data, where the computational data are provided by a recently developed molecular dynamics simulation protocol. We evaluate our method for Watson-Crick stacked base pairs, and show that parameters estimated from 10 experimental data items and 10 computational data items can predict RNA secondary structures with accuracy comparable to that using conventional parameters. The results indicate that the combination of experimental free-energy measurements and molecular dynamics simulations is capable of estimating the thermodynamic properties of RNA secondary structures at lower cost.

16

Shi S, Zhang XL, Zhao XL, Yang L, Du W, Wang YJ. Prediction of the RNA Secondary Structure Using a Multi-Population Assisted Quantum Genetic Algorithm. Hum Hered 2019; 84:1-8. [PMID: 31461710 DOI: 10.1159/000501480] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 12/15/2022] Open
Abstract
Quantum-inspired genetic algorithms (QGAs) were recently introduced for the prediction of RNA secondary structures, and they showed some superiority over the existing popular strategies. In this paper, for RNA secondary structure prediction, we introduce a new QGA named multi-population assisted quantum genetic algorithm (MAQGA). In contrast to the existing QGAs, our strategy involves multiple populations that evolve together cooperatively in each iteration, and the genetic exchange between populations is performed by an operator transfer operation. The numerical results show that the performance of existing genetic algorithms (evolutionary algorithms [EAs]), including traditional EAs and QGAs, can be significantly improved by using our approach. Moreover, for RNA sequences of short to medium length, the MAQGA even improves on state-of-the-art software in terms of both prediction accuracy and sensitivity.
Affiliation(s)
- Sha Shi
  - Engineering Research Center of Molecular and Neuroimaging, Ministry of Education of China, and School of Life Science and Technology, Xidian University, Xi'an, China
- Xian-Li Zhao
  - Northwestern Women and Children's Hospital, Xi'an, China
- Le Yang
  - The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an Jiaotong University, Xi'an, China
- Wei Du
  - The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yun-Jiang Wang
  - The State Key Laboratory of Integrated Services Network (ISN), Xidian University, Xi'an, China

17

Zuber J, Mathews DH. Estimating uncertainty in predicted folding free energy changes of RNA secondary structures. RNA (NEW YORK, N.Y.) 2019; 25:747-754. [PMID: 30952689 PMCID: PMC6521603 DOI: 10.1261/rna.069203.118] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/08/2018] [Accepted: 04/02/2019] [Indexed: 06/09/2023]
Abstract
Nearest neighbor parameters for estimating the folding stability of RNA are commonly used in secondary structure prediction, for generating folding ensembles of structures, and for analyzing RNA function. Previously, we demonstrated that we could quantify the uncertainties in each nearest neighbor parameter by perturbing the underlying optical melting data within experimental error and rederiving the parameters, which accounts for the substantial correlations that exist between the parameters. In this contribution, we describe a method to estimate uncertainty in the estimated folding stabilities of RNA structures, accounting for correlations in the nearest neighbor parameters. This method is incorporated in the RNAstructure software package.
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
- David H Mathews
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA

18

Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022] Open
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
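As a rough illustration of the accuracy metrics and pair flexibility mentioned in this abstract, the sketch below computes sensitivity and positive predictive value (PPV) over sets of base pairs. It is not taken from the review: the function names, the representation of a structure as a set of `(i, j)` index tuples, and the rule that a pair counts as matched when one of its ends is shifted by at most one position are all assumptions made for the example.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (i, j) indices of two paired nucleotides, with i < j


def is_match(pair: Pair, reference: Set[Pair], slip: int = 1) -> bool:
    """Return True if `pair` is in `reference`, allowing one end to shift by up to `slip`."""
    i, j = pair
    if pair in reference:
        return True
    for d in range(1, slip + 1):
        for candidate in ((i - d, j), (i + d, j), (i, j - d), (i, j + d)):
            if candidate in reference:
                return True
    return False


def sensitivity_ppv(predicted: Set[Pair], accepted: Set[Pair], slip: int = 1):
    """Sensitivity and PPV of a predicted structure against an accepted one."""
    # Sensitivity: fraction of accepted pairs recovered by the prediction.
    found = sum(1 for p in accepted if is_match(p, predicted, slip))
    # PPV: fraction of predicted pairs that are present in the accepted structure.
    correct = sum(1 for p in predicted if is_match(p, accepted, slip))
    sens = found / len(accepted) if accepted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sens, ppv


if __name__ == "__main__":
    accepted = {(1, 20), (2, 19), (3, 18), (5, 15)}
    predicted = {(1, 20), (2, 18), (7, 12)}
    print(sensitivity_ppv(predicted, accepted))
```

Setting `slip=0` in this sketch gives exact-match scoring, which makes it easy to see how much the flexibility rule changes the reported numbers.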
Affiliation(s)
- David H Mathews
  - Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States

19

Jiang G, Chen K, Sun J. Accurate prediction of secondary structure of tRNAs. Biochem Biophys Res Commun 2019; 509:64-68. [DOI: 10.1016/j.bbrc.2018.12.042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/26/2018] [Accepted: 12/05/2018] [Indexed: 11/28/2022]

20

Schroeder SJ. Challenges and approaches to predicting RNA with multiple functional structures. RNA (NEW YORK, N.Y.) 2018; 24:1615-1624. [PMID: 30143552 PMCID: PMC6239171 DOI: 10.1261/rna.067827.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 06/08/2023]
Abstract
The revolution in sequencing technology demands new tools to interpret the genetic code. As in vivo transcriptome-wide chemical probing techniques advance, new challenges emerge in the RNA folding problem. The emphasis on one sequence folding into a single minimum free energy structure is fading as a new focus develops on generating RNA structural ensembles and identifying functional structural features in ensembles. This review describes an efficient combinatorially complete method and three free energy minimization approaches to predicting RNA structures with more than one functional fold, as well as two methods for analysis of a thermodynamics-based Boltzmann ensemble of structures. The review then highlights two examples of viral RNA 3'-UTR regions that fold into more than one conformation and have been characterized by single molecule fluorescence energy resonance transfer or NMR spectroscopy. These examples highlight the different approaches and challenges in predicting structure and function from sequence for RNA with multiple biological roles and folds. More well-defined examples and new metrics for measuring differences in RNA structures will guide future improvements in prediction of RNA structure and function from sequence.
Affiliation(s)
- Susan J Schroeder
  - Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA

21

Smith LG, Tan Z, Spasic A, Dutta D, Salas-Estrada LA, Grossfield A, Mathews DH. Chemically Accurate Relative Folding Stability of RNA Hairpins from Molecular Simulations. J Chem Theory Comput 2018; 14:6598-6612. [PMID: 30375860 DOI: 10.1021/acs.jctc.8b00633] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
To benchmark RNA force fields, we compared the folding stabilities of three 12-nucleotide hairpin stem loops estimated by simulation to stabilities determined by experiment. We used umbrella sampling and a reaction coordinate of end-to-end (5' to 3' hydroxyl oxygen) distance to estimate the free energy change of the transition from the native conformation to a fully extended conformation with no hydrogen bonds between non-neighboring bases. Each simulation was performed four times using the AMBER FF99+bsc0+χOL3 force field, and each window, spaced at 1 Å intervals, was sampled for 1 μs, for a total of 552 μs of simulation. We compared differences in the simulated free energy changes to analogous differences in free energies from optical melting experiments using thermodynamic cycles where the free energy change between stretched and random coil sequences is assumed to be sequence-independent. The differences between experimental and simulated ΔΔG° are, on average, 0.98 ± 0.66 kcal/mol, which is chemically accurate and suggests that analogous simulations could be used predictively. We also report a novel method to identify where replica free energies diverge along a reaction coordinate, thus indicating where additional sampling would most improve convergence. We conclude by discussing methods to more economically perform these simulations.
Affiliation(s)
- Louis G Smith
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States; Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
- Zhen Tan
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States; Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
- Aleksandar Spasic
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States; Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
- Debapratim Dutta
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States; Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States
- Leslie A Salas-Estrada
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States
- Alan Grossfield
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, New York 14642, United States; Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York 14642, United States; Center for RNA Biology, University of Rochester, Rochester, New York 14642, United States

22

Zuber J, Cabral BJ, McFadyen I, Mauger DM, Mathews DH. Analysis of RNA nearest neighbor parameters reveals interdependencies and quantifies the uncertainty in RNA secondary structure prediction. RNA (NEW YORK, N.Y.) 2018; 24:1568-1582. [PMID: 30104207 PMCID: PMC6191722 DOI: 10.1261/rna.065102.117] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/28/2017] [Accepted: 08/07/2018] [Indexed: 05/08/2023]
Abstract
RNA secondary structure prediction is often used to develop hypotheses about structure-function relationships for newly discovered RNA sequences, to identify unknown functional RNAs, and to design sequences. Secondary structure prediction methods typically use a thermodynamic model that estimates the free energy change of possible structures based on a set of nearest neighbor parameters. These parameters were derived from optical melting experiments of small model oligonucleotides. This work aims to better understand the precision of structure prediction. Here, the experimental errors in optical melting experiments were propagated to errors in the derived nearest neighbor parameter values and then to errors in RNA secondary structure prediction. To perform this analysis, the optical melting experimental values were systematically perturbed within the estimates of experimental error and alternative sets of nearest neighbor parameters were then derived from these error-bounded values. Secondary structure predictions using either the perturbed or reference parameter sets were then compared. This work demonstrated that the precision of RNA secondary structure prediction is more robust than suggested by previous work based on perturbation of the nearest neighbor parameters. This robustness is due to correlations between parameters. Additionally, this work identified weaknesses in the parameter derivation that make accurate assessment of parameter uncertainty difficult. Considerations for experimental design are provided to mitigate these weaknesses.
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- B Joseph Cabral
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- Iain McFadyen
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- David M Mauger
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- David H Mathews
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA

23

Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023] Open
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Affiliation(s)
- Mirko Ledda
  - Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
  - Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Sharon Aviran
  - Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA

24

Rogers E, Murrugarra D, Heitsch C. Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations. Biophys J 2017. [PMID: 28629618 DOI: 10.1016/j.bpj.2017.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/23/2023] Open
Abstract
Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, the conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use its tight relationship to define quantitative thresholds for well versus ill conditioning. These resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption of sampling's improved conditioning over the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones.
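For readers unfamiliar with the Boltzmann distribution referred to in this abstract, the sketch below shows how a set of structure free energies maps to probabilities proportional to exp(-ΔG/RT). It is a generic illustration under stated assumptions, not the conditioning model of the paper: the function name, the default temperature of 310.15 K (37 °C), and the example energies are hypothetical, and a real folding package computes the partition function by dynamic programming rather than by enumerating structures.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature_k=310.15):
    """Map free energies (kcal/mol) of an explicit list of structures to probabilities."""
    rt = R_KCAL * temperature_k
    # Subtract the minimum energy before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # partition function over the listed structures only
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical energies: two near-degenerate structures and one much less stable.
    for p in boltzmann_probabilities([-10.0, -9.9, -5.0]):
        print(round(p, 4))
```

With two nearly equal energies, a small perturbation can swap which structure is the minimum, while the probabilities change only gradually. This is the intuition behind the ill conditioning of MFE prediction relative to sampling described above.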
Affiliation(s)
- Emily Rogers
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia
- David Murrugarra
  - Department of Mathematics, University of Kentucky, Lexington, Kentucky
- Christine Heitsch
  - School of Mathematics, Georgia Institute of Technology, Atlanta, Georgia