1
Malik A, Zhang L, Gautam M, Dai N, Li S, Zhang H, Mathews DH, Huang L. LinearAlifold: Linear-Time Consensus Structure Prediction for RNA Alignments. J Mol Biol 2024:168694. [PMID: 38971557 DOI: 10.1016/j.jmb.2024.168694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/28/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024]
Abstract
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences, due to a cubic scaling with the sequence length, taking over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, based on our work LinearFold that folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 hours on the above 400 genomes, or ∼36× speedup) and achieves higher accuracies when compared to a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is at: https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
Affiliation(s)
- Apoorv Malik
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Liang Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Milan Gautam
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Ning Dai
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Sizhen Li
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- He Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- David H Mathews
  - Dept. of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA; Dept. of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Liang Huang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA; Dept. of Biochemistry & Biophysics, Oregon State University, Corvallis, OR 97330, USA
2
Newman T, Chang HFK, Jabbari H. DinoKnot: Duplex Interaction of Nucleic Acids With PseudoKnots. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:348-359. [PMID: 38345958 DOI: 10.1109/tcbb.2024.3362308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
Interaction of nucleic acid molecules is essential for their functional roles in the cell and their applications in biotechnology. While simple duplex interactions have been studied before, the problem of efficiently predicting the minimum free energy structure of more complex interactions with possibly pseudoknotted structures remains a challenge. In this work, we introduce DinoKnot, a novel and efficient algorithm for prediction of Duplex Interaction of Nucleic acids with pseudoKnots. DinoKnot follows the hierarchical folding hypothesis to predict the secondary structure of two interacting nucleic acid strands (both homo- and hetero-dimers). DinoKnot utilizes the structure of molecules before interaction as a guide to find their duplex structure allowing for possible base pair competitions. To showcase DinoKnot's capabilities, we evaluated its predicted structures against (1) experimental results for the SARS-CoV-2 genome and nine primer-probe sets, (2) a clinically verified example of a mutation affecting detection, and (3) a known nucleic acid interaction involving a pseudoknot. In addition, we compared our results against our closest competition, RNAcofold, further highlighting DinoKnot's strengths. We believe DinoKnot can be utilized for various applications including screening new variants for potential detection issues and supporting existing applications involving DNA/RNA interactions, adding structural considerations to the interaction to elicit functional information.
3
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024:168552. [PMID: 38552946 DOI: 10.1016/j.jmb.2024.168552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source-code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
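The idea of splitting by whole Components can be illustrated with a minimal sketch. It assumes the Components have already been computed and simply assigns each one, undivided, to the training or test side until roughly 70% of chains are in training. The function name, the greedy strategy, and the identifiers below are hypothetical and are not taken from the RNA3DB code.

```python
from typing import Dict, List, Tuple


def split_components(components: Dict[str, List[str]],
                     train_fraction: float = 0.7) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test; a component is never divided."""
    total = sum(len(chains) for chains in components.values())
    target = train_fraction * total
    train: List[str] = []
    test: List[str] = []
    # Visit the largest components first so the greedy fill lands near the target.
    for name in sorted(components, key=lambda n: (-len(components[n]), n)):
        chains = components[name]
        if len(train) + len(chains) <= target:
            train.extend(chains)
        else:
            test.extend(chains)
    return train, test


# Hypothetical component and chain identifiers, for illustration only.
example = {
    "component_a": ["a1", "a2", "a3", "a4"],
    "component_b": ["b1", "b2", "b3"],
    "component_c": ["c1", "c2"],
    "component_d": ["d1", "d2"],
}
train_set, test_set = split_components(example)
print(len(train_set), len(test_set))
```

Because chains from one Component always travel together, the resulting ratio is only approximately 70/30, which matches the "approximate" wording in the abstract.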
Affiliation(s)
- Marcell Szikszai
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
4
Mittal A, Turner DH, Mathews DH. NNDB: An Expanded Database of Nearest Neighbor Parameters for Predicting Stability of Nucleic Acid Secondary Structures. J Mol Biol 2024:168549. [PMID: 38522645 DOI: 10.1016/j.jmb.2024.168549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/18/2024] [Accepted: 03/19/2024] [Indexed: 03/26/2024]
Abstract
Nearest neighbor thermodynamic parameters are widely used for RNA and DNA secondary structure prediction and to model thermodynamic ensembles of secondary structures. The Nearest Neighbor Database (NNDB) is a freely available web resource (https://rna.urmc.rochester.edu/NNDB) that provides the functional forms, parameter values, and example calculations. The NNDB provides the 1999 and 2004 sets of RNA folding nearest neighbor parameters. We expanded the database to include a set of DNA parameters and a set of RNA parameters that includes m6A in addition to the canonical RNA nucleobases. The site was redesigned using the Quarto open-source publishing system. A downloadable PDF version of the complete resource and downloadable sets of nearest neighbor parameters are available.
Affiliation(s)
- Abhinav Mittal
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Douglas H Turner
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA; Department of Chemistry, University of Rochester, Rochester, NY 14627, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
5
Zuber J, Mathews DH. Estimating RNA Secondary Structure Folding Free Energy Changes with efn2. Methods Mol Biol 2024; 2726:1-13. [PMID: 38780725 DOI: 10.1007/978-1-0716-3519-3_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
A number of analyses require estimates of the folding free energy changes of specific RNA secondary structures. These predictions are often based on a set of nearest neighbor parameters that models the folding stability of an RNA secondary structure as the sum of folding stabilities of the structural elements that comprise the secondary structure. In the software suite RNAstructure, the free energy change calculation is implemented in the program efn2. The efn2 program estimates the folding free energy change and the experimental uncertainty in the folding free energy change. It can be run through the graphical user interface for RNAstructure, from the command line, or a web server. This chapter provides detailed protocols for using efn2.
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
6
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023] Open
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
  - School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
7
Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hunter Bursey
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
8
Fast RNA-RNA Interaction Prediction Methods for Interaction Analysis of Transcriptome-Scale Large Datasets. Methods Mol Biol 2023; 2586:163-173. [PMID: 36705904 DOI: 10.1007/978-1-0716-2768-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
The computational prediction of RNA-RNA interactions has long been studied in RNA informatics. Most of the existing approaches focused on the interaction prediction of short RNAs in small datasets. However, in recent years, two fast prediction methods, RIsearch2 and RIblast, have been developed to predict transcriptome-scale interactions or long RNA interactions. The key idea of the software acceleration of these tools was the integration of a seed-and-extend method, which is used in fast sequence alignment tools, into RNA-RNA interaction prediction. As a result, the two software programs were ten to a thousand times faster than the existing tools; because of this acceleration, detection of genome-wide microRNA target sites or interaction partners of function-unknown long noncoding RNAs has become possible. In this review, we describe the basic concept of the algorithm, its applications, and the future perspectives of the fast RNA-RNA interaction prediction tools.
9
Genome-Wide RNA Secondary Structure Prediction. Methods Mol Biol 2023; 2586:35-48. [PMID: 36705897 DOI: 10.1007/978-1-0716-2768-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Information on RNA secondary structure has been widely applied to the inference of RNA function. However, classical prediction methods are not feasible for long RNAs such as mRNAs because of problems with computational time and numerical errors. To overcome those problems, sliding window methods have been applied, although their results are not directly comparable to global RNA structure prediction. In this chapter, we introduce ParasoR, a method designed for parallel computation of genome-wide RNA secondary structures. To enable genome-wide prediction, ParasoR distributes the dynamic programming (DP) matrices required for structure prediction to multiple computational nodes. Using a database that stores not the original DP variables but the ratios of variables, ParasoR can locally compute structure scores such as stem probability or accessibility on demand. A comprehensive analysis of local secondary structures by ParasoR is expected to be a promising way to detect the statistical constraints on long RNAs.
10
RNA Secondary Structure Prediction Based on Energy Models. Methods Mol Biol 2023; 2586:89-105. [PMID: 36705900 DOI: 10.1007/978-1-0716-2768-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
This chapter introduces RNA secondary structure prediction based on the nearest neighbor energy model, which is one of the most popular architectures for modeling RNA secondary structure without pseudoknots. We discuss the parameterization and the parameter determination by experimental and machine learning-based approaches, as well as an integrated approach in which each compensates for the other's shortcomings. Then, folding algorithms for the minimum free energy and the maximum expected accuracy using the dynamic programming technique are introduced. Finally, we compare the prediction accuracy of the methods described so far on benchmark datasets.
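As a rough illustration of the dynamic programming idea behind such folding algorithms, the sketch below uses the classic Nussinov-style recursion, which maximizes the number of nested base pairs. This is a deliberately simplified stand-in: it is not the nearest neighbor energy model and not the chapter's own algorithm, and the minimum hairpin loop size of 3 is an assumed parameter.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov-style DP: maximum number of nested (pseudoknot-free) base pairs."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: j pairs with some k, which splits the interval in two.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))
```

An energy-based minimum free energy algorithm has the same cubic-time interval structure, but replaces the "+1 per pair" score with loop-dependent free energy terms and minimizes instead of maximizing.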
11
Paloncýová M, Pykal M, Kührová P, Banáš P, Šponer J, Otyepka M. Computer Aided Development of Nucleic Acid Applications in Nanotechnologies. Small (Weinheim an der Bergstrasse, Germany) 2022; 18:e2204408. [PMID: 36216589 DOI: 10.1002/smll.202204408] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/23/2022] [Revised: 09/12/2022] [Indexed: 06/16/2023]
Abstract
Utilization of nucleic acids (NAs) in nanotechnologies and nanotechnology-related applications is a growing field with broad application potential, ranging from biosensing up to targeted cell delivery. Computer simulations are useful techniques that can aid design and speed up development in this field. This review focuses on computer simulations of hybrid nanomaterials composed of NAs and other components. Current state-of-the-art molecular dynamics simulations, empirical force fields (FFs), and coarse-grained approaches for the description of deoxyribonucleic acid and ribonucleic acid are critically discussed. Challenges in combining biomacromolecular and nanomaterial FFs are emphasized. Recent applications of simulations for modeling NAs and their interactions with nano- and biomaterials are overviewed in the fields of sensing applications, targeted delivery, and NA templated materials. Future perspectives of development are also highlighted.
Affiliation(s)
- Markéta Paloncýová
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Martin Pykal
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Petra Kührová
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Pavel Banáš
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
- Jiří Šponer
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
  - Institute of Biophysics of the Czech Academy of Sciences, v. v. i., Královopolská 135, Brno, 612 65, Czech Republic
- Michal Otyepka
  - Regional Center of Advanced Technologies and Materials, The Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, Olomouc, 779 00, Czech Republic
  - IT4Innovations, VŠB - Technical University of Ostrava, 17. listopadu 2172/15, Ostrava-Poruba, 708 00, Czech Republic
12
Fukunaga T, Hamada M. LinAliFold and CentroidLinAliFold: fast RNA consensus secondary structure prediction for aligned sequences using beam search methods. Bioinformatics Advances 2022; 2:vbac078. [PMID: 36699418 PMCID: PMC9710674 DOI: 10.1093/bioadv/vbac078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 10/13/2022] [Accepted: 10/21/2022] [Indexed: 11/05/2022]
Abstract
Motivation: RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time. Results: In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results. Availability and implementation: The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 1698555, Japan; Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo 1698555, Japan
13
Zhang J, Fei Y, Sun L, Zhang QC. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat Methods 2022; 19:1193-1207. [PMID: 36203019 DOI: 10.1038/s41592-022-01623-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 06/10/2022] [Accepted: 08/23/2022] [Indexed: 11/09/2022]
Abstract
Beyond transferring genetic information, RNAs are molecules with diverse functions that include catalyzing biochemical reactions and regulating gene expression. Most of these activities depend on RNAs' specific structures. Therefore, accurately determining RNA structure is integral to advancing our understanding of RNA functions. Here, we summarize the state-of-the-art experimental and computational technologies developed to evaluate RNA secondary and tertiary structures. We also highlight how the rapid increase of experimental data facilitates the integrative modeling approaches for better resolving RNA structures. Finally, we provide our thoughts on the latest advances and challenges in RNA structure determination methods, as well as on future directions for both experimental approaches and artificial intelligence-based computational tools to model RNA structure. Ultimately, we hope the technological advances will deepen our understanding of RNA biology and facilitate RNA structure-based biomedical research such as designing specific RNA structures for therapeutics and deploying RNA-targeting small-molecule drugs.
Affiliation(s)
- Jinsong Zhang
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Yuhan Fei
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Lei Sun
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Qiangfeng Cliff Zhang
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
14
RNA secondary structure packages evaluated and improved by high-throughput experiments. Nat Methods 2022; 19:1234-1242. [PMID: 36192461 PMCID: PMC9839360 DOI: 10.1038/s41592-022-01605-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/29/2020] [Accepted: 08/10/2022] [Indexed: 01/17/2023]
Abstract
Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.
15
Szabat M, Prochota M, Kierzek R, Kierzek E, Mathews DH. A Test and Refinement of Folding Free Energy Nearest Neighbor Parameters for RNA Including N6-Methyladenosine. J Mol Biol 2022; 434:167632. [PMID: 35588868 PMCID: PMC11235186 DOI: 10.1016/j.jmb.2022.167632] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/01/2022] [Revised: 04/29/2022] [Accepted: 05/07/2022] [Indexed: 12/26/2022]
Abstract
RNA folding free energy change parameters are widely used to predict RNA secondary structure and to design RNA sequences. These parameters include terms for the folding free energies of helices and loops. Although the full set of parameters has only been traditionally available for the four common bases and backbone, it is well known that covalent modifications of nucleotides are widespread in natural RNAs. Covalent modifications are also widely used in engineered sequences. We recently derived a full set of nearest neighbor terms for RNA that includes N6-methyladenosine (m6A). In this work, we test the model using 98 optical melting experiments, matching duplexes with or without N6-methylation of A. Most experiments place RRACH, the consensus site of N6-methylation, in a variety of contexts, including helices, bulge loops, internal loops, dangling ends, and terminal mismatches. For matched sets of experiments that include either A or m6A in the same context, we find that the parameters for m6A are as accurate as those for A. Across all experiments, the root mean squared deviation between estimated and experimental free energy changes is 0.67 kcal/mol. We used the new experimental data to refine the set of nearest neighbor parameter terms for m6A. These parameters enable prediction of RNA secondary structures including m6A, which can be used to model how N6-methylation of A affects RNA structure.
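For reference, the root mean squared deviation quoted above is conventionally defined as follows, where $N$ is the number of experiments compared and the subscripts "est" and "exp" (notation assumed here, not taken from the paper) mark the estimated and experimental folding free energy changes:

```latex
\mathrm{RMSD}
  = \sqrt{\frac{1}{N} \sum_{i=1}^{N}
      \left( \Delta G^{\circ}_{\mathrm{est},\,i} - \Delta G^{\circ}_{\mathrm{exp},\,i} \right)^{2}}
```

With this definition, the reported value of 0.67 kcal/mol is a typical per-experiment deviation between the nearest neighbor estimate and the optical melting measurement.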
Affiliation(s)
- Marta Szabat
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Martina Prochota
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Ryszard Kierzek
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Elzbieta Kierzek
  - Institute of Bioorganic Chemistry Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, 601 Elmwood Avenue, Box 712, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, United States
16
|
Szikszai M, Wise M, Datta A, Ward M, Mathews DH. Deep learning models for RNA secondary structure prediction (probably) do not generalize across families. Bioinformatics 2022; 38:3892-3899. [PMID: 35748706 PMCID: PMC9364374 DOI: 10.1093/bioinformatics/btac415] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/28/2022] [Revised: 06/09/2022] [Accepted: 06/21/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The secondary structure of RNA is of importance to its function. Over the last few years, several papers attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions but seldom address the much more difficult (and practical) inter-family problem. RESULTS: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modelled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalization despite the widespread assumption in the literature and provide strong evidence that many existing learning-based models have not generalized inter-family. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/marcellszi/dl-rna. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
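The inter-family idea in this abstract can be illustrated with a generic family-wise split, in which every sequence of a given RNA family lands in the same fold, so no family appears in both training and test data. This is only a minimal sketch under stated assumptions: the function names, the `(sequence_id, family)` record layout, and the greedy balancing are hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def family_folds(records, n_folds):
    """Assign whole families to folds (hypothetical helper, not the paper's code).

    records: list of (sequence_id, family) tuples.
    Returns a list of n_folds lists of sequence ids.
    """
    by_family = defaultdict(list)
    for seq_id, family in records:
        by_family[family].append(seq_id)

    folds = [[] for _ in range(n_folds)]
    # Place the largest families first, each into the currently smallest fold.
    for family in sorted(by_family, key=lambda f: (-len(by_family[f]), f)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_family[family])
    return folds


def inter_family_splits(records, n_folds):
    """Yield (train_ids, test_ids) pairs; no family is shared between the two."""
    folds = family_folds(records, n_folds)
    for k in range(n_folds):
        test = list(folds[k])
        train = [s for m, fold in enumerate(folds) if m != k for s in fold]
        yield train, test


if __name__ == "__main__":
    demo = [("a1", "tRNA"), ("a2", "tRNA"), ("a3", "tRNA"),
            ("b1", "5S"), ("b2", "5S"), ("c1", "RNaseP")]
    for train, test in inter_family_splits(demo, 2):
        print("train:", train, "test:", test)
```

An intra-family split, by contrast, would shuffle individual sequences, so close homologs of test sequences could appear in the training set.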
Affiliation(s)
- Marcell Szikszai
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Michael Wise
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
  - The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Perth, WA 6009, Australia
- Amitava Datta
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
- Max Ward
  - Department of Computer Science & Software Engineering, The University of Western Australia, Perth, WA 6009, Australia
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY 14642, USA

17
Flamm C, Wielach J, Wolfinger MT, Badelt S, Lorenz R, Hofacker IL. Caveats to Deep Learning Approaches to RNA Secondary Structure Prediction. FRONTIERS IN BIOINFORMATICS 2022; 2:835422. [PMID: 36304289 PMCID: PMC9580944 DOI: 10.3389/fbinf.2022.835422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 06/09/2022] [Indexed: 11/18/2022] Open
Abstract
Machine learning (ML) and in particular deep learning techniques have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well established biophysics based methods exist. The accuracy of these classical methods is limited due to lack of experimental parameters and certain simplifying assumptions and has seen little improvement over the last decade. This makes RNA folding an attractive target for machine learning and consequently several deep learning models have been proposed in recent years. However, for ML approaches to be competitive for de-novo structure prediction, the models must not just demonstrate good phenomenological fits, but be able to learn a (complex) biophysical model. In this contribution we discuss limitations of current approaches, in particular due to biases in the training data. Furthermore, we propose to study capabilities and limitations of ML models by first applying them on synthetic data (obtained from a simplified biophysical model) that can be generated in arbitrary amounts and where all biases can be controlled. We assume that a deep learning model that performs well on these synthetic data would also perform well on real data, and vice versa. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
Affiliation(s)
- Christoph Flamm
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Julia Wielach
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Michael T. Wolfinger
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
  - Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
- Stefan Badelt
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ronny Lorenz
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ivo L. Hofacker
  - Department of Theoretical Chemistry, University of Vienna, Vienna, Austria
  - Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria
  - *Correspondence: Ivo L. Hofacker,

18
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y. Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 2021; 17:e1009291. [PMID: 34437528 PMCID: PMC8389396 DOI: 10.1371/journal.pcbi.1009291] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 12/05/2022] Open
Abstract
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
Affiliation(s)
- Qi Zhao
  - College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Zheng Zhao
  - School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning, China
- Xiaoya Fan
  - School of Software, Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian University of Technology, Dalian, Liaoning, China
- Zhengwei Yuan
  - Key Laboratory of Health Ministry for Congenital Malformation, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
- Qian Mao
  - College of Light Industry, Liaoning University, Shenyang, Liaoning, China
  - Key Laboratory of Agroproducts Processing Technology, Changchun University, Changchun, Jilin, China
- Yudong Yao
  - Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey, United States of America

19
Fernandez–Steel Skew Normal Conditional Autoregressive (FSSN CAR) Model in Stan for Spatial Data. Symmetry (Basel) 2021. [DOI: 10.3390/sym13040545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
In spatial data analysis, the prior conditional autoregressive (CAR) model is used to express the spatial dependence on random effects from adjacent regions. This paper provides a new proposed approach regarding the development of the existing normal CAR model into a more flexible, Fernandez–Steel skew normal (FSSN) CAR model. This approach is able to capture spatial random effects that have both symmetrical and asymmetrical patterns. The FSSN CAR model is built on the basis of the normal CAR with an additional skew parameter. The FSSN distribution is able to provide good estimates for symmetry with heavy- or light-tailed and skewed-right and skewed-left data. The effects of this approach are demonstrated by establishing the FSSN distribution and FSSN CAR model in spatial data using Stan language. On the basis of the plot of the estimation results and histogram of the model error, the FSSN CAR model was shown to behave better than both models without a spatial effect and with the normal CAR model. Moreover, the smallest widely applicable information criterion (WAIC) and leave-one-out (LOO) statistical values also validate the model, as FSSN CAR is shown to be the best model used.

20
Sato K, Akiyama M, Sakakibara Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat Commun 2021; 12:941. [PMID: 33574226 PMCID: PMC7878809 DOI: 10.1038/s41467-021-21194-4] [Citation(s) in RCA: 121] [Impact Index Per Article: 40.3] [Received: 09/22/2020] [Accepted: 01/15/2021] [Indexed: 12/23/2022] Open
Abstract
Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure. Accurately predicting the secondary structure of non-coding RNAs can help unravel their function. Here the authors propose a method integrating thermodynamic information and deep learning to improve the robustness of RNA secondary structure prediction compared to several existing algorithms.
Affiliation(s)
- Kengo Sato
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan
- Manato Akiyama
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan
- Yasubumi Sakakibara
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan

21
Reis AC, Salis HM. An Automated Model Test System for Systematic Development and Improvement of Gene Expression Models. ACS Synth Biol 2020; 9:3145-3156. [PMID: 33054181 DOI: 10.1021/acssynbio.0c00394] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 01/08/2023]
Abstract
Gene expression models greatly accelerate the engineering of synthetic metabolic pathways and genetic circuits by predicting sequence-function relationships and reducing trial-and-error experimentation. However, developing models with more accurate predictions remains a significant challenge. Here we present a model test system that combines advanced statistics, machine learning, and a database of 9862 characterized genetic systems to automatically quantify model accuracies, accept or reject mechanistic hypotheses, and identify areas for model improvement. We also introduce model capacity, a new information theoretic metric for correct cross-data-set comparisons. We demonstrate the model test system by comparing six models of translation initiation rate, evaluating 100 mechanistic hypotheses, and uncovering new sequence determinants that control protein expression levels. We then applied these results to develop a biophysical model of translation initiation rate with significant improvements in accuracy. Automated model test systems will dramatically accelerate the development of gene expression models, and thereby transition synthetic biology into a mature engineering discipline.

22

23
Ward M, Sun H, Datta A, Wise M, Mathews DH. Determining parameters for non-linear models of multi-loop free energy change. Bioinformatics 2020; 35:4298-4306. [PMID: 30923811 DOI: 10.1093/bioinformatics/btz222] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2018] [Revised: 02/10/2019] [Accepted: 03/27/2019] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION: Predicting the secondary structure of RNA is a fundamental task in bioinformatics. Algorithms that predict secondary structure given only the primary sequence, and a model to evaluate the quality of a structure, are an integral part of this. These algorithms have been updated as our model of RNA thermodynamics changed and expanded. An exception to this has been the treatment of multi-loops. Although more advanced models of multi-loop free energy change have been suggested, a simple, linear model has been used since the 1980s. However, recently, new dynamic programming algorithms for secondary structure prediction that could incorporate these models were presented. Unfortunately, these models appear to have lower accuracy for secondary structure prediction. RESULTS: We apply linear regression and a new parameter optimization algorithm to find better parameters for the existing linear model and advanced non-linear multi-loop models. These include the Jacobson-Stockmayer and Aalberts & Nandagopal models. We find that the current linear model parameters may be near optimal for the linear model, and that no advanced model performs better than the existing linear model parameters even after parameter optimization. AVAILABILITY AND IMPLEMENTATION: Source code and data are available at https://github.com/maxhwardg/advanced_multiloops. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Max Ward
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
- Hongying Sun
  - Department of Biochemistry & Biophysics, University of Rochester, Rochester, NY, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY, USA
- Amitava Datta
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
- Michael Wise
  - Computer Science & Software Engineering, The University of Western Australia, Crawley, WA, Australia
  - The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Crawley, WA, Australia
- David H Mathews
  - Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, USA

24
Abstract
Several problems in RNA structure prediction are NP-hard, and predicting the folded structure of an RNA nucleotide sequence remains an unsolved challenge. We investigate algorithms for RNA folding structure prediction, including pseudoknots, based on extended structures and the basin hopping graph. This study presents a prediction algorithm based on extended structures and proposes an improved algorithm based on the barrier tree and basin hopping graph, both of which are attractive approaches to RNA folding structure prediction. Experiments on the Rfam14.1 and PseudoBase databases show that our two algorithms are more efficient and accurate than other existing algorithms.
Affiliation(s)
- Zhendong Liu
  - School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, P. R. China
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles 90095, USA
  - Department of Statistics, Harvard University, Cambridge, MA 02138, USA
- Gang Li
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles 90095, USA
- Jun S. Liu
  - Department of Statistics, Harvard University, Cambridge, MA 02138, USA

25
Spasic A, Berger KD, Chen JL, Seetin MG, Turner DH, Mathews DH. Improving RNA nearest neighbor parameters for helices by going beyond the two-state model. Nucleic Acids Res 2019; 46:4883-4892. [PMID: 29718397 PMCID: PMC6007268 DOI: 10.1093/nar/gky270] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/13/2018] [Accepted: 04/22/2018] [Indexed: 12/31/2022] Open
Abstract
RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson–Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson–Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.
Affiliation(s)
- Aleksandar Spasic
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Kyle D Berger
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Jonathan L Chen
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Department of Chemistry, University of Rochester, Rochester, NY 14627, USA
- Matthew G Seetin
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Douglas H Turner
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Department of Chemistry, University of Rochester, Rochester, NY 14627, USA
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA

26
Danaee P, Rouches M, Wiley M, Deng D, Huang L, Hendrix D. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic Acids Res 2019; 46:5381-5394. [PMID: 29746666 PMCID: PMC6009582 DOI: 10.1093/nar/gky285] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 02/07/2018] [Accepted: 04/11/2018] [Indexed: 01/04/2023] Open
Abstract
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, ‘bpRNA-1m’, of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
Affiliation(s)
- Dezhong Deng
  - School of Electrical Engineering and Computer Science
- Liang Huang
  - School of Electrical Engineering and Computer Science
- David Hendrix
  - School of Electrical Engineering and Computer Science
  - Department of Biochemistry and Biophysics

27
Zuber J, Mathews DH. Estimating uncertainty in predicted folding free energy changes of RNA secondary structures. RNA (NEW YORK, N.Y.) 2019; 25:747-754. [PMID: 30952689 PMCID: PMC6521603 DOI: 10.1261/rna.069203.118] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/08/2018] [Accepted: 04/02/2019] [Indexed: 06/09/2023]
Abstract
Nearest neighbor parameters for estimating the folding stability of RNA are commonly used in secondary structure prediction, for generating folding ensembles of structures, and for analyzing RNA function. Previously, we demonstrated that we could quantify the uncertainties in each nearest neighbor parameter by perturbing the underlying optical melting data within experimental error and rederiving the parameters, which accounts for the substantial correlations that exist between the parameters. In this contribution, we describe a method to estimate uncertainty in the estimated folding stabilities of RNA structures, accounting for correlations in the nearest neighbor parameters. This method is incorporated in the RNAstructure software package.
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
- David H Mathews
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, 14642, USA

28
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019] [Indexed: 11/18/2022] Open
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
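The accuracy metrics this abstract refers to are commonly reported as sensitivity (the fraction of accepted pairs that are predicted) and positive predictive value (PPV, the fraction of predicted pairs that are in the accepted structure). The sketch below computes both, with an optional tolerance that counts a pair i-j as matching when a pair shifted by one position on either side is present. The one-nucleotide slip rule and all function names are assumptions chosen for illustration; the abstract itself does not spell out the exact rule.

```python
def _normalize(pairs):
    """Store each base pair as (smaller index, larger index)."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def _matches(pair, reference, allow_slip):
    """True if pair is in reference, or (optionally) shifted by one position."""
    i, j = pair
    if (i, j) in reference:
        return True
    if not allow_slip:
        return False
    neighbours = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
    return any(n in reference for n in neighbours)


def score(predicted, accepted, allow_slip=True):
    """Return (sensitivity, ppv, f1) for two lists of base pairs."""
    pred, acc = _normalize(predicted), _normalize(accepted)
    found = sum(_matches(p, pred, allow_slip) for p in acc)
    correct = sum(_matches(p, acc, allow_slip) for p in pred)
    sensitivity = found / len(acc) if acc else 0.0
    ppv = correct / len(pred) if pred else 0.0
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1


if __name__ == "__main__":
    accepted = [(1, 10), (2, 9), (3, 8)]
    predicted = [(1, 10), (2, 8), (4, 20)]
    print(score(predicted, accepted, allow_slip=True))
    print(score(predicted, accepted, allow_slip=False))
```

Reporting sensitivity and PPV separately, rather than only a combined score, shows whether a method tends to under-predict or over-predict pairs.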
Affiliation(s)
- David H Mathews
  - Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States

29
Akiyama M, Sato K, Sakakibara Y. A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model. J Bioinform Comput Biol 2019; 16:1840025. [PMID: 30616476 DOI: 10.1142/s0219720018400255] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/01/2023]
Abstract
A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model that finds a thermodynamically most stable secondary structure with minimum free energy (MFE). For further improvement, an alternative approach that is based on machine learning techniques has been developed. The machine learning-based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning-based fine-grained model achieved extremely high performance in prediction accuracy, a possibility of the risk of overfitting for such a model has been reported. In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning-based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with the [Formula: see text] regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting cannot be observed. The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
Affiliation(s)
- Manato Akiyama
  - Department of Biosciences and Informatics, Keio University, 3–14–1 Hiyoshi, Kohoku-ku, Yokohama 223–8522, Japan
- Kengo Sato
  - Department of Biosciences and Informatics, Keio University, 3–14–1 Hiyoshi, Kohoku-ku, Yokohama 223–8522, Japan
- Yasubumi Sakakibara
  - Department of Biosciences and Informatics, Keio University, 3–14–1 Hiyoshi, Kohoku-ku, Yokohama 223–8522, Japan

30
Zuber J, Cabral BJ, McFadyen I, Mauger DM, Mathews DH. Analysis of RNA nearest neighbor parameters reveals interdependencies and quantifies the uncertainty in RNA secondary structure prediction. RNA (NEW YORK, N.Y.) 2018; 24:1568-1582. [PMID: 30104207 PMCID: PMC6191722 DOI: 10.1261/rna.065102.117] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/28/2017] [Accepted: 08/07/2018] [Indexed: 05/08/2023]
Abstract
RNA secondary structure prediction is often used to develop hypotheses about structure-function relationships for newly discovered RNA sequences, to identify unknown functional RNAs, and to design sequences. Secondary structure prediction methods typically use a thermodynamic model that estimates the free energy change of possible structures based on a set of nearest neighbor parameters. These parameters were derived from optical melting experiments of small model oligonucleotides. This work aims to better understand the precision of structure prediction. Here, the experimental errors in optical melting experiments were propagated to errors in the derived nearest neighbor parameter values and then to errors in RNA secondary structure prediction. To perform this analysis, the optical melting experimental values were systematically perturbed within the estimates of experimental error and alternative sets of nearest neighbor parameters were then derived from these error-bounded values. Secondary structure predictions using either the perturbed or reference parameter sets were then compared. This work demonstrated that the precision of RNA secondary structure prediction is more robust than suggested by previous work based on perturbation of the nearest neighbor parameters. This robustness is due to correlations between parameters. Additionally, this work identified weaknesses in the parameter derivation that make accurate assessment of parameter uncertainty difficult. Considerations for experimental design are provided to mitigate these weaknesses.
Affiliation(s)
- Jeffrey Zuber
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- B Joseph Cabral
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- Iain McFadyen
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- David M Mauger
  - Computational Sciences, Moderna Therapeutics, Cambridge, Massachusetts 02141, USA
- David H Mathews
  - Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
  - Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA

31
Liu Z, Zhu D, Dai Q. Predicting Model and Algorithm in RNA Folding Structure Including Pseudoknots. INT J PATTERN RECOGN 2018. [DOI: 10.1142/s0218001418510059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Predicting RNA structure with pseudoknots is a nondeterministic polynomial-time hard (NP-hard) problem. Using minimum free energy models and computational methods, we investigate RNA pseudoknotted structures. Our paper presents an efficient algorithm for predicting RNA structure with pseudoknots that takes O([Formula: see text]) time and O([Formula: see text]) space. Experimental tests on Rfam10.1 and PseudoBase indicate that the algorithm is more effective and precise: its prediction accuracy, time complexity, and space complexity improve on existing algorithms such as the Maximum Weight Matching (MWM) algorithm, the PKNOTS algorithm, and the Iterated Loop Matching (ILM) algorithm, and it can predict arbitrary pseudoknots. There also exists a [Formula: see text] ([Formula: see text]) polynomial-time approximation scheme for finding the maximum number of stackings, and we give a proof of the approximation scheme for RNA pseudoknotted structures. We have improved several types of pseudoknots considered in RNA folding, and we analyze the possible transitions between types of pseudoknots.
Affiliation(s)
- Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, P. R. China
| | - Daming Zhu
- School of Computer Science and Technology, Shandong University, Jinan 250101, P. R. China
| | - Qionghai Dai
- Department of Automation, Tsinghua University, Beijing 100084, P. R. China
| |
|
32
|
Fukunaga T, Hamada M. RIblast: an ultrafast RNA-RNA interaction prediction system based on a seed-and-extension approach. Bioinformatics 2018; 33:2666-2674. [PMID: 28459942 PMCID: PMC5860064 DOI: 10.1093/bioinformatics/btx287] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 10/26/2016] [Accepted: 04/27/2017] [Indexed: 12/28/2022]
Abstract
Motivation: LncRNAs play important roles in various biological processes. Although more than 58 000 human lncRNA genes have been discovered, most known lncRNAs are still poorly characterized. One approach to understanding the functions of lncRNAs is the detection of the interacting RNA target of each lncRNA. Because experimental detections of comprehensive lncRNA–RNA interactions are difficult, computational prediction of lncRNA–RNA interactions is an indispensable technique. However, the high computational costs of existing RNA–RNA interaction prediction tools prevent their application to large-scale lncRNA datasets. Results: Here, we present ‘RIblast’, an ultrafast RNA–RNA interaction prediction method based on the seed-and-extension approach. RIblast discovers seed regions using suffix arrays and subsequently extends seed regions based on an RNA secondary structure energy model. Computational experiments indicate that RIblast achieves a level of prediction accuracy similar to those of existing programs, but at speeds over 64 times faster than existing programs. Availability and implementation: The source code of RIblast is freely available at https://github.com/fukunagatsu/RIblast. Supplementary information: Supplementary data are available at Bioinformatics online.
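As an illustration of the seed-search step described in the abstract above, the following Python sketch looks up exact complementary seeds with a suffix array. It is a simplified sketch under stated assumptions, not RIblast's implementation: all function names are hypothetical, only Watson-Crick complements are considered, and seeds are matched exactly rather than scored with an energy model.

```python
from bisect import bisect_left, bisect_right

# Assumption: Watson-Crick complements only; G-U wobble pairs are ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Binary-search the suffix array for every start position of pattern."""
    k = len(pattern)
    # Truncating sorted suffixes to length k keeps them in sorted order.
    prefixes = [text[i:i + k] for i in suffix_array]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    return sorted(suffix_array[lo:hi])


def complementary_seed_sites(target, query_seed):
    """Positions in target that can pair perfectly with query_seed (hypothetical helper)."""
    suffix_array = build_suffix_array(target)
    return find_occurrences(target, suffix_array, reverse_complement(query_seed))


if __name__ == "__main__":
    # The query seed AAGG pairs with CCUU, which starts at positions 1 and 6.
    print(complementary_seed_sites("ACCUUGCCUU", "AAGG"))
```

In a seed-and-extension scheme, each reported position would then be handed to a separate extension step; that step is not modeled here.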
Affiliation(s)
- Tsukasa Fukunaga
- Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan.,Japan Society for the Promotion of Science, Tokyo 102-0083, Japan
| | - Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan.,Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo 169-8555, Japan
| |
|
33
|
Zhu Y, Xie Z, Li Y, Zhu M, Chen YPP. Research on folding diversity in statistical learning methods for RNA secondary structure prediction. Int J Biol Sci 2018; 14:872-882. [PMID: 29989089 PMCID: PMC6036747 DOI: 10.7150/ijbs.24595] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2017] [Accepted: 02/21/2018] [Indexed: 12/24/2022]
Abstract
How to improve the prediction accuracy of RNA secondary structure is currently a hot topic. The existing prediction methods for a single sequence do not fully consider the folding diversity which may occur among RNAs with different functions or sources. This paper explores the relationship between folding diversity and prediction accuracy, and puts forward a new method to improve the prediction accuracy of RNA secondary structure. Our research investigates the following: 1. The folding feature based on stochastic context-free grammar is proposed. By using dimension reduction and clustering techniques, some public data sets are analyzed. The results show that there is significant folding diversity among different RNA families. 2. To assign folding rules to RNAs without structural information, a classification method based on production probability is proposed. The experimental results show that the classification method proposed in this paper can effectively classify the RNAs of unknown structure. 3. Based on the existing prediction methods of statistical learning models, an RNA secondary structure prediction framework is proposed, namely "Cluster - Training - Parameter Selection - Prediction". The results show that, with information on folding diversity, prediction accuracy can be significantly improved.
Affiliation(s)
- Yu Zhu
- College of Computer Science, Sichuan University, China
| | - ZhaoYang Xie
- College of Computer Science, Sichuan University, China
| | - YiZhou Li
- College of Chemistry, Sichuan University, China
| | - Min Zhu
- Vice Dean of College of Computer Science, Sichuan University
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Australia
| |
|
34
|
Groher F, Bofill-Bosch C, Schneider C, Braun J, Jager S, Geißler K, Hamacher K, Suess B. Riboswitching with ciprofloxacin-development and characterization of a novel RNA regulator. Nucleic Acids Res 2018; 46:2121-2132. [PMID: 29346617 PMCID: PMC5829644 DOI: 10.1093/nar/gkx1319] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 11/03/2017] [Revised: 12/22/2017] [Accepted: 12/28/2017] [Indexed: 11/24/2022]
Abstract
RNA molecules play important and diverse regulatory roles in the cell. Inspired by this natural versatility, RNA devices are increasingly important for many synthetic biology applications, e.g. optimizing engineered metabolic pathways, gene therapeutics or building up complex logical units. A major advantage of RNA is the possibility of de novo design of RNA-based sensing domains via an in vitro selection process (SELEX). Here, we describe development of a novel ciprofloxacin-responsive riboswitch by in vitro selection and next-generation sequencing-guided cellular screening. The riboswitch recognizes the small molecule drug ciprofloxacin with a KD in the low nanomolar range and adopts a pseudoknot fold stabilized by ligand binding. It efficiently interferes with gene expression both in lower and higher eukaryotes. By controlling an auxotrophy marker and a resistance gene, respectively, we demonstrate efficient, scalable and programmable control of cellular survival in yeast. The applied strategy for the development of the ciprofloxacin riboswitch is easily transferrable to any small molecule target of choice and will thus broaden the spectrum of RNA regulators considerably.
Affiliation(s)
- Florian Groher
- Synthetic Genetic Circuits, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
| | | | | | - Johannes Braun
- Synthetic Genetic Circuits, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
| | - Sven Jager
- Computational Biology and Simulation, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
| | - Katharina Geißler
- Synthetic Genetic Circuits, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
| | - Kay Hamacher
- Computational Biology and Simulation, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
- Dept. of Physics, Dept. of Computer Science, TU Darmstadt, Darmstadt, Germany
| | - Beatrix Suess
- Synthetic Genetic Circuits, Dept. of Biology, TU Darmstadt, Darmstadt, Germany
| |
|
35
|
Zuber J, Sun H, Zhang X, McFadyen I, Mathews DH. A sensitivity analysis of RNA folding nearest neighbor parameters identifies a subset of free energy parameters with the greatest impact on RNA secondary structure prediction. Nucleic Acids Res 2017; 45:6168-6176. [PMID: 28334976 PMCID: PMC5449625 DOI: 10.1093/nar/gkx170] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 10/16/2016] [Accepted: 03/10/2017] [Indexed: 01/02/2023]
Abstract
Nearest neighbor parameters for estimating the folding energy changes of RNA secondary structures are used in structure prediction and analysis. Despite their widespread application, a comprehensive analysis of the impact of each parameter on the precision of calculations had not been conducted. To identify the parameters with greatest impact, a sensitivity analysis was performed on the 291 parameters that compose the 2004 version of the free energy nearest neighbor rules. Perturbed parameter sets were generated by perturbing each parameter independently. Then the effect of each individual parameter change on predicted base-pair probabilities and secondary structures as compared to the standard parameter set was observed for a set of sequences including structured ncRNA, mRNA and randomized sequences. The results identify for the first time the parameters with the greatest impact on secondary structure prediction, and the subset which should be prioritized for further study in order to improve the precision of structure prediction. Bulge loop initiation, multibranch loop initiation, AU/GU internal loop closure and AU/GU helix end parameters were particularly important. An analysis of parameter usage during folding free energy calculations of stochastic samples of secondary structures revealed a correlation between parameter usage and impact on structure prediction precision.
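The one-at-a-time perturbation scheme described in this abstract can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the parameter names, their values, and the linear `toy_energy` model are hypothetical stand-ins for the 291 nearest neighbor parameters and for a real structure prediction, and impact is measured on a single scalar output rather than on base-pair probabilities.

```python
def one_at_a_time_sensitivity(model, params, delta):
    """Perturb each parameter independently and record the change in model output."""
    baseline = model(params)
    impact = {}
    for name, value in params.items():
        perturbed = dict(params)          # copy so only one parameter changes
        perturbed[name] = value + delta
        impact[name] = abs(model(perturbed) - baseline)
    return impact


def toy_energy(p):
    """Hypothetical model: a structure that uses two stacks and one bulge."""
    return 2 * p["stack"] + 1 * p["bulge_init"]


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to keep the arithmetic simple.
    params = {"stack": -2.0, "bulge_init": 3.0}
    impact = one_at_a_time_sensitivity(toy_energy, params, delta=0.5)
    # Rank parameters from largest to smallest impact.
    for name, change in sorted(impact.items(), key=lambda kv: -kv[1]):
        print(name, change)
```

In this toy model the parameter used twice has twice the impact, which loosely mirrors the reported correlation between parameter usage and impact.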
Affiliation(s)
- Jeffrey Zuber
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Hongying Sun
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Xiaoju Zhang
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - Iain McFadyen
- Computational Sciences, Moderna Therapeutics, Cambridge, MA 02141, USA
| | - David H Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.,Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
| |
|
36
|
Inferring Parameters for an Elementary Step Model of DNA Structure Kinetics with Locally Context-Dependent Arrhenius Rates. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/978-3-319-66799-7_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
37
|
Jager S, Schiller B, Babel P, Blumenroth M, Strufe T, Hamacher K. StreAM-[Formula: see text]: algorithms for analyzing coarse grained RNA dynamics based on Markov models of connectivity-graphs. Algorithms Mol Biol 2017; 12:15. [PMID: 28572834 PMCID: PMC5450175 DOI: 10.1186/s13015-017-0105-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/31/2016] [Accepted: 05/16/2017] [Indexed: 12/05/2022]
Abstract
Background: In this work, we present a new coarse grained representation of RNA dynamics. It is based on adjacency matrices and their interaction patterns obtained from molecular dynamics simulations. RNA molecules are well-suited for this representation due to their composition which is mainly modular and assessable by the secondary structure alone. These interactions can be represented as adjacency matrices of k nucleotides. Based on those, we define transitions between states as changes in the adjacency matrices which form Markovian dynamics. The intense computational demand for deriving the transition probability matrices prompted us to develop StreAM-$$T_g$$, a stream-based algorithm for generating such Markov models of k-vertex adjacency matrices representing the RNA. Results: We benchmark StreAM-$$T_g$$ (a) for random and RNA unit sphere dynamic graphs and (b) for the robustness of our method against different parameters. Moreover, we address a riboswitch design problem by applying StreAM-$$T_g$$ on six long term molecular dynamics simulations of a synthetic tetracycline dependent riboswitch (500 ns) in combination with five different antibiotics. Conclusions: The proposed algorithm performs well on large simulated as well as real world dynamic graphs. Additionally, StreAM-$$T_g$$ provides insights into nucleotide based RNA dynamics in comparison to conventional metrics like the root-mean square fluctuation. In the light of experimental data our results show important design opportunities for the riboswitch.
|
38
|
Hill AC, Schroeder SJ. Thermodynamic stabilities of three-way junction nanomotifs in prohead RNA. RNA (NEW YORK, N.Y.) 2017; 23:521-529. [PMID: 28069889 PMCID: PMC5340915 DOI: 10.1261/rna.059220.116] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/18/2016] [Accepted: 12/24/2016] [Indexed: 06/06/2023]
Abstract
The thermodynamic stabilities of four natural prohead or packaging RNA (pRNA) three-way junction (3WJ) nanomotifs and seven phi29 pRNA 3WJ deletion mutant nanomotifs were investigated using UV optical melting on a three-component RNA system. Our data reveal that some pRNA 3WJs are more stable than the phi29 pRNA 3WJ. The stability of the 3WJ contributes to the unique self-assembly properties of pRNA. Thus, ultrastable pRNA 3WJ motifs suggest new scaffolds for pRNA-based nanotechnology. We present data demonstrating that pRNA 3WJs differentially respond to the presence of metal ions. A comparison of our data with free energies predicted by currently available RNA secondary structure prediction programs shows that these programs do not accurately predict multibranch loop stabilities. These results will expand the existing parameters used for RNA secondary structure prediction from sequence in order to better inform RNA structure-function hypotheses and guide the rational design of functional RNA supramolecular assemblies.
Affiliation(s)
| | - Susan J Schroeder
- Department of Microbiology and Plant Biology
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019, USA
| |
|
39
|
Rodríguez-Mejía JL, Roldán-Salgado A, Osuna J, Merino E, Gaytán P. A Codon Deletion at the Beginning of Green Fluorescent Protein Genes Enhances Protein Expression. J Mol Microbiol Biotechnol 2016; 27:1-10. [PMID: 27820932 DOI: 10.1159/000448786] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
Abstract
Recombinant protein expression is one of the key issues in protein engineering and biotechnology. Among the different models for assessing protein production and structure-function studies, green fluorescent protein (GFP) is one of the preferred models because of its importance as a reporter in cellular and molecular studies. In this research we analyze the effect of codon deletions near the amino terminus of different GFP proteins on fluorescence. Our study includes Gly4 deletions in the enhanced GFP (EGFP), the red-shifted GFP and the red-shifted EGFP. The Gly4 deletion mutants and their corresponding wild-type counterparts were transcribed under the control of the T7 or Trc promoters and their expression patterns were analyzed. Different fluorescent outcomes were observed depending on the type of fluorescent gene versions. In silico analysis of the RNA secondary structures near the ribosome binding site revealed a direct relationship between their minimum free energy and GFP production. Integrative analysis of these results, including SDS-PAGE analysis, led us to conclude that the fluorescence improvement of cells expressing different versions of GFPs with Gly4 deleted is due to an enhancement of the accessibility of the ribosome binding site by reducing the stability of the RNA secondary structures at their mRNA leader regions.
|
40
|
Lorenz R, Wolfinger MT, Tanzer A, Hofacker IL. Predicting RNA secondary structures from sequence and probing data. Methods 2016; 103:86-98. [PMID: 27064083 DOI: 10.1016/j.ymeth.2016.04.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 12/07/2015] [Revised: 03/29/2016] [Accepted: 04/04/2016] [Indexed: 01/08/2023]
Abstract
RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.
Affiliation(s)
- Ronny Lorenz
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
| | - Michael T Wolfinger
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; Medical University of Vienna, Center for Anatomy and Cell Biology, Währingerstraße 13, 1090 Vienna, Austria.
| | - Andrea Tanzer
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria.
| | - Ivo L Hofacker
- University of Vienna, Faculty of Chemistry, Department of Theoretical Chemistry, Währingerstrasse 17, 1090 Vienna, Austria; University of Vienna, Faculty of Computer Science, Research Group Bioinformatics and Computational Biology, Währingerstr. 29, 1090 Vienna, Austria.
| |
|
41
|
Jager S, Schiller B, Strufe T, Hamacher K. StreAM-$$T_g$$: Algorithms for Analyzing Coarse Grained RNA Dynamics Based on Markov Models of Connectivity-Graphs. LECTURE NOTES IN COMPUTER SCIENCE 2016. [DOI: 10.1007/978-3-319-43681-4_16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
|
42
|
Abstract
RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probable base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction.
|
43
|
Kubota M, Tran C, Spitale RC. Progress and challenges for chemical probing of RNA structure inside living cells. Nat Chem Biol 2015; 11:933-41. [PMID: 26575240 PMCID: PMC5068366 DOI: 10.1038/nchembio.1958] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 09/09/2015] [Accepted: 10/14/2015] [Indexed: 01/18/2023]
Abstract
Proper gene expression is essential for the survival of every cell. Once thought to be a passive transporter of genetic information, RNA has recently emerged as a key player in nearly every pathway in the cell. A full description of its structure is critical to understanding RNA function. Decades of research have focused on utilizing chemical tools to interrogate the structures of RNAs, with recent focus shifting to performing experiments inside living cells. This Review will detail the design and utility of chemical reagents used in RNA structure probing. We also outline how these reagents have been used to gain a deeper understanding of RNA structure in vivo. We review the recent merger of chemical probing with deep sequencing. Finally, we outline some of the hurdles that remain in fully characterizing the structure of RNA inside living cells, and how chemical biology can uniquely tackle such challenges.
Affiliation(s)
- Miles Kubota
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California, USA
| | - Catherine Tran
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California, USA
| | - Robert C Spitale
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California, USA
| |
|
44
|
Abstract
Despite the success of RNA secondary structure prediction for simple, short RNAs, the problem of predicting RNAs with long-range tertiary folds remains. Furthermore, RNA 3D structure prediction is hampered by the lack of knowledge about the tertiary contacts and their thermodynamic parameters. Low-resolution structural modeling enables us to estimate the conformational entropies for a number of tertiary folds through rigorous statistical mechanical calculations. The models lead to 3D tertiary folds at a coarse-grained level. The coarse-grained structures serve as the initial structures for all-atom molecular dynamics refinement to build the final all-atom 3D structures. In this paper, we present an overview of RNA computational models for secondary and tertiary structure prediction and then focus on a recently developed RNA statistical mechanical model, the Vfold model. The main emphasis is placed on the physics behind the models, including the treatment of the non-canonical interactions in secondary and tertiary structure modeling, and the correlations to RNA functions.
Affiliation(s)
- Xiaojun Xu
- Department of Physics, University of Missouri, Columbia, MO 65211 USA
- Department of Biochemistry, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
| | - Shi-Jie Chen
- Department of Physics, University of Missouri, Columbia, MO 65211 USA
- Department of Biochemistry, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
| |
|
45
|
Saule C, Giegerich R. Pareto optimization in algebraic dynamic programming. Algorithms Mol Biol 2015; 10:22. [PMID: 26150892 PMCID: PMC4491898 DOI: 10.1186/s13015-015-0051-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 08/28/2014] [Accepted: 05/07/2015] [Indexed: 11/10/2022]
Abstract
Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator [Formula: see text] on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme [Formula: see text] correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way. We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined.
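To make the Pareto front in this abstract concrete, here is a minimal Python sketch for two objectives. It is an illustration under stated assumptions, not the paper's Pareto product operator: the function name is hypothetical, both scores are maximized (an energy to be minimized would be negated first), and dominance is taken in the usual sense of at least as good in both objectives and strictly better in one.

```python
def pareto_front(points):
    """Return the non-dominated (a, b) pairs, with both objectives maximized.

    Sorting by the first objective (descending) lets a single sweep decide
    dominance: a point survives only if its second objective beats every
    point that is at least as good in the first objective.
    """
    front = []
    best_b = float("-inf")
    # set() removes exact duplicates; ties in a are ordered by b descending.
    for a, b in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if b > best_b:
            front.append((a, b))
            best_b = b
    return front


if __name__ == "__main__":
    # Hypothetical candidate scores under two scoring schemes A and B.
    candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
    print(pareto_front(candidates))  # prints [(3, 3), (2, 4), (1, 5)]
```

The sort dominates the cost, so this sweep runs in O(m log m) for m candidates; inside a dynamic programming recurrence it would be applied to the candidate set of each subproblem.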
|
46
|
Chitsaz H, Aminisharifabad M. Exact Learning of RNA Energy Parameters From Structure. J Comput Biol 2015; 22:463-73. [DOI: 10.1089/cmb.2014.0164] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Affiliation(s)
- Hamidreza Chitsaz
- Department of Computer Science, Colorado State University, Fort Collins, Colorado
| | | |
|
47
|
Yonemoto H, Asai K, Hamada M. A semi-supervised learning approach for RNA secondary structure prediction. Comput Biol Chem 2015; 57:72-9. [PMID: 25748534 DOI: 10.1016/j.compbiolchem.2015.02.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/02/2015] [Accepted: 02/03/2015] [Indexed: 12/25/2022]
Abstract
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.
Affiliation(s)
- Haruka Yonemoto
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan
| | - Kiyoshi Asai
- Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8562, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7, Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7, Aomi, Koto-ku, Tokyo 135-0064, Japan.
| |
|
48
|
Abstract
It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.
|
49
|
Venkatachalam B, Gusfield D, Frid Y. Faster algorithms for RNA-folding using the Four-Russians method. Algorithms Mol Biol 2014; 9:5. [PMID: 24602450 PMCID: PMC3996002 DOI: 10.1186/1748-7188-9-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/04/2013] [Accepted: 02/18/2014] [Indexed: 01/16/2023]
Abstract
BACKGROUND The secondary structure that maximizes the number of non-crossing matchings between complementary bases of an RNA sequence of length n can be computed in O(n^3) time using Nussinov's dynamic programming algorithm. The Four-Russians method is a technique that reduces the running time for certain dynamic programming algorithms by a multiplicative factor after a preprocessing step where solutions to all smaller subproblems of a fixed size are exhaustively enumerated and solved. Frid and Gusfield designed an O(n^3/log n) algorithm for RNA folding using the Four-Russians technique. In their algorithm the preprocessing is interleaved with the algorithm computation. THEORETICAL RESULTS We simplify the algorithm and the analysis by doing the preprocessing once prior to the algorithm computation. We call this the two-vector method. We also show variants where instead of exhaustive preprocessing, we only solve the subproblems encountered in the main algorithm once and memoize the results. We give a simple proof of correctness and explore the practical advantages over the earlier method. The Nussinov algorithm admits an O(n^2) time parallel algorithm. We show a parallel algorithm using the two-vector idea that improves the time bound to O(n^2/log n). PRACTICAL RESULTS We have implemented the parallel algorithm on graphics processing units using the CUDA platform. We discuss the organization of the data structures to exploit coalesced memory access for fast running times. The ideas to organize the data structures also help in improving the running time of the serial algorithms. For sequences of length up to 6000 bases the parallel algorithm takes only about 2.5 seconds and the two-vector serial method takes about 57 seconds on a desktop and 15 seconds on a server. Among the serial algorithms, the two-vector and memoized versions are faster than the Frid-Gusfield algorithm by a factor of 3, and are faster than Nussinov by up to a factor of 20.
The source-code for the algorithms is available at http://github.com/ijalabv/FourRussiansRNAFolding.
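For reference, the cubic-time baseline mentioned in this abstract can be sketched in Python as below. This is a plain textbook-style Nussinov recurrence, not the Four-Russians or two-vector method of the paper. The function name is hypothetical, only A-U and C-G pairs are allowed, and no minimum hairpin loop length is enforced; these are simplifying assumptions.

```python
# Assumption: only Watson-Crick pairs count as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_max_pairs(seq):
    """Maximum number of non-crossing base pairs in seq, in O(n^3) time."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case 1: base j is unpaired
            for k in range(i, j):                # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC"))  # prints 3
```

The inner loop over k is the part that Four-Russians style methods speed up, by precomputing answers for small blocks of the table instead of scanning every split point.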
Affiliation(s)
- Balaji Venkatachalam
- Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA, USA
| | - Dan Gusfield
- Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA, USA
| | - Yelena Frid
- Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA, USA
| |
|
50
|
Andronescu M, Condon A, Turner DH, Mathews DH. The determination of RNA folding nearest neighbor parameters. Methods Mol Biol 2014; 1097:45-70. [PMID: 24639154 DOI: 10.1007/978-1-62703-709-9_3] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
Abstract
The stability of RNA secondary structure can be predicted using a set of nearest neighbor parameters. These parameters are widely used by algorithms that predict secondary structure. This contribution introduces the UV optical melting experiments that are used to determine the folding stability of short RNA strands. It explains how the nearest neighbor parameters are chosen and how the values are fit to the data. A sample nearest neighbor calculation is provided. The contribution concludes with new methods that use the database of sequences with known structures to determine parameter values.
Affiliation(s)
- Mirela Andronescu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | | | | |
|