1
Malik A, Zhang L, Gautam M, Dai N, Li S, Zhang H, Mathews DH, Huang L. LinearAlifold: Linear-time consensus structure prediction for RNA alignments. J Mol Biol 2024; 436:168694. [PMID: 38971557 PMCID: PMC11377157 DOI: 10.1016/j.jmb.2024.168694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 06/28/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024]
Abstract
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences, due to a cubic scaling with the sequence length, taking over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000 nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, based on our work LinearFold that folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 h on the above 400 genomes, or ∼36× speedup) and achieves higher accuracies when compared to a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is at: https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
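As a quick consistency check on the figures quoted in this abstract, the ∼36× speedup and the 0.7 h runtime together imply an RNAalifold runtime of roughly 25 h, which agrees with "over a day". The second line illustrates what cubic versus linear scaling means when the sequence length doubles. The symbols n, t_RNAalifold and t_LinearAlifold are introduced here for illustration only and do not come from the paper.

```latex
\[
t_{\text{RNAalifold}} \approx 36 \times t_{\text{LinearAlifold}} = 36 \times 0.7\,\text{h} = 25.2\,\text{h}
\]
\[
\frac{(2n)^3}{n^3} = 8 \quad \text{(cubic scaling)}, \qquad \frac{2n}{n} = 2 \quad \text{(linear scaling)}
\]
```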
Affiliation(s)
- Apoorv Malik
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Liang Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Milan Gautam
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Ning Dai
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- Sizhen Li
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- He Zhang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA
- David H Mathews
  - Dept. of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA; Dept. of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
- Liang Huang
  - School of EECS, Oregon State University, Corvallis, OR 97330, USA; Dept. of Biochemistry & Biophysics, Oregon State University, Corvallis, OR 97330, USA
2
Bao N, Wang Z, Fu J, Dong H, Jin Y. RNA structure in alternative splicing regulation: from mechanism to therapy. Acta Biochim Biophys Sin (Shanghai) 2024. [PMID: 39034824 DOI: 10.3724/abbs.2024119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024] Open
Abstract
Alternative splicing is a highly intricate process that plays a crucial role in post-transcriptional regulation and significantly expands the functional proteome of a limited number of coding genes in eukaryotes. Its regulation is multifactorial, with RNA structure exerting a significant impact. Aberrant RNA conformations lead to dysregulation of splicing patterns, which directly affects the manifestation of disease symptoms. In this review, the molecular mechanisms of RNA secondary structure-mediated splicing regulation are summarized, with a focus on the complex interplay between aberrant RNA conformations and disease phenotypes resulting from splicing defects. This study also explores additional factors that reshape structural conformations, enriching our understanding of the mechanistic network underlying structure-mediated splicing regulation. In addition, emphasis is placed on the clinical role of correcting aberrant splicing in human diseases. The principal mechanisms of action behind this phenomenon are described, followed by a discussion of prospective development strategies and pertinent challenges.
3
von Löhneysen S, Spicher T, Varenyk Y, Yao HT, Lorenz R, Hofacker I, Stadler PF. Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction. J Comput Biol 2024; 31:549-563. [PMID: 38935442 DOI: 10.1089/cmb.2024.0519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
Affiliation(s)
- Sarah von Löhneysen
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Thomas Spicher
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
  - UniVie Doctoral School Computer Science (DoCS), University of Vienna, Vienna, Austria
- Yuliia Varenyk
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
  - Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical, University of Vienna, Vienna, Austria
- Hua-Ting Yao
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ronny Lorenz
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ivo Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, New Mexico, USA
4
Wang K, Yin Z, Sang C, Xia W, Wang Y, Sun T, Xu X. Geometric deep learning for the prediction of magnesium-binding sites in RNA structures. Int J Biol Macromol 2024; 262:130150. [PMID: 38365157 DOI: 10.1016/j.ijbiomac.2024.130150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 02/18/2024]
Abstract
Magnesium ions (Mg²⁺) are essential for the folding, functional expression, and structural stability of RNA molecules. However, predicting Mg²⁺-binding sites in RNA molecules based solely on RNA structures is still challenging. The molecular surface, characterized by a continuous shape with geometric and chemical properties, is important for RNA modelling and carries essential information for understanding the interactions between RNAs and Mg²⁺ ions. Here, we propose an approach named RNA-magnesium ion surface interaction fingerprinting (RMSIF), a geometric deep learning-based conceptual framework to predict magnesium ion binding sites in RNA structures. To evaluate the performance of RMSIF, we systematically enumerated decoy Mg²⁺ ions across a full-space grid within the range of 2 to 10 Å from the RNA molecule and made predictions accordingly. Visualization techniques were used to validate the prediction results and calculate success rates. Comparative assessments against state-of-the-art methods like MetalionRNA, MgNet, and Metal3DRNA revealed that RMSIF achieved superior success rates and accuracy in predicting Mg²⁺-binding sites. Additionally, in terms of the spatial distribution of Mg²⁺ ions within the RNA structures, a majority were situated in the deep grooves, while a minority occupied the shallow grooves. Collectively, the conceptual framework developed in this study holds promise for advancing insights into drug design, RNA co-transcriptional folding, and structure prediction.
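The grid enumeration of decoy ion positions described in this abstract can be pictured with a minimal Python sketch. It is not the RMSIF implementation: the function name `candidate_sites`, the grid spacing, and the reading of "2 to 10 Å from the RNA molecule" as the distance to the nearest atom are all assumptions made here for illustration.

```python
import math
from itertools import product

def candidate_sites(atoms, spacing=1.0, d_min=2.0, d_max=10.0):
    """Return grid points whose distance to the nearest atom lies in [d_min, d_max].

    `atoms` is a list of (x, y, z) coordinates in angstroms. The spacing and the
    nearest-atom criterion are illustrative assumptions, not values from the paper.
    """
    # Bounding box of the atoms, padded by d_max on every side.
    lows = [min(a[k] for a in atoms) - d_max for k in range(3)]
    highs = [max(a[k] for a in atoms) + d_max for k in range(3)]
    axes = []
    for lo, hi in zip(lows, highs):
        steps = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(steps)])
    sites = []
    for point in product(*axes):
        nearest = min(math.dist(point, atom) for atom in atoms)
        if d_min <= nearest <= d_max:
            sites.append(point)
    return sites

# Toy input: a single atom at the origin (hypothetical, for illustration only).
sites = candidate_sites([(0.0, 0.0, 0.0)], spacing=2.0)
```

The brute-force nearest-atom search is quadratic in practice; for a real structure a spatial index (for example a k-d tree) would be the usual replacement.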
Affiliation(s)
- Kang Wang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Zuode Yin
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Chunjiang Sang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Wentao Xia
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Yan Wang
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Tingting Sun
  - Department of Physics, Zhejiang University of Science and Technology, Hangzhou 310008, China
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China
5
Lazzeri G, Micheletti C, Pasquali S, Faccioli P. RNA folding pathways from all-atom simulations with a variationally improved history-dependent bias. Biophys J 2023; 122:3089-3098. [PMID: 37355771 PMCID: PMC10432211 DOI: 10.1016/j.bpj.2023.06.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/03/2023] [Accepted: 06/15/2023] [Indexed: 06/26/2023] Open
Abstract
Atomically detailed simulations of RNA folding have proven very challenging in view of the difficulties of developing realistic force fields and the intrinsic computational complexity of sampling rare conformational transitions. As a step forward in tackling these issues, we extend to RNA an enhanced path-sampling method previously successfully applied to proteins. In this scheme, the information about the RNA's native structure is harnessed by a soft history-dependent biasing force promoting the generation of productive folding trajectories in an all-atom force field with explicit solvent. A rigorous variational principle is then applied to minimize the effect of the bias. Here, we report on an application of this method to RNA molecules from 20 to 47 nucleotides long and of increasing topological complexity. By comparison with analogous simulations performed on small proteins with similar size and architecture, we show that the RNA folding landscape is significantly more frustrated, even for relatively small chains with a simple topology. The predicted RNA folding mechanisms are found to be consistent with the available experiments and some of the existing coarse-grained models. Due to its computational performance, this scheme provides a promising platform to efficiently gather atomistic RNA folding trajectories, thus retaining the information about the chemical composition of the sequence.
Affiliation(s)
- Gianmarco Lazzeri
  - Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany; Physics Department of Trento University, Povo (Trento), Italy
- Samuela Pasquali
  - Laboratoire Cibles Thérapeutiques et Conception de Médicaments, Université Paris Cité, Paris, France; Laboratoire Biologie Fonctionnelle et Adaptative, Université Paris Cité, Paris, France
- Pietro Faccioli
  - Physics Department of Trento University, Povo (Trento), Italy; INFN-TIFPA, Povo (Trento), Italy
6
Zhou L, Feng T, Xu S, Gao F, Lam TT, Wang Q, Wu T, Huang H, Zhan L, Li L, Guan Y, Dai Z, Yu G. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Brief Bioinform 2022; 23:6603927. [PMID: 35671504 DOI: 10.1093/bib/bbac222] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 01/28/2022] [Revised: 05/07/2022] [Accepted: 05/11/2022] [Indexed: 12/25/2022] Open
Abstract
The identification of the conserved and variable regions in the multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignment is no longer sufficient. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignment and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also design a new visualization method for genome alignments in multiple alignment format to explore the pattern of within- and between-species variation. Combining these visual representations with prior knowledge, ggmsa assists researchers in exploring MSAs and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).
Affiliation(s)
- Lang Zhou
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tingze Feng
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fangluan Gao
  - Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
- Tommy T Lam
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, China
- Qianwen Wang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Tianzhi Wu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huina Huang
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Zhuhai International Travel Healthcare Center, Zhuhai, Guangdong, China
- Li Zhan
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Li
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Yi Guan
  - State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Joint Institute of Virology (Shantou University - The University of Hong Kong), Shantou University, Shantou, China
- Zehan Dai
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
  - Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
7
Martin NS, Ahnert SE. Fast free-energy-based neutral set size estimates for the RNA genotype-phenotype map. J R Soc Interface 2022; 19:20220072. [PMID: 35702868 PMCID: PMC9198509 DOI: 10.1098/rsif.2022.0072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/24/2022] [Accepted: 05/23/2022] [Indexed: 12/30/2022] Open
Abstract
The genotype-phenotype (GP) map of RNA secondary structure links each RNA sequence to its corresponding secondary structure. Previous research has shown that the large-scale structural properties of GP maps, such as the size of neutral sets in genotype space, can influence evolutionary outcomes. In order to use neutral set sizes, efficient and accurate computational methods are needed to compute them. Here, we propose a new method, which is based on free energy estimates and is much faster than existing sample-based methods. Moreover, this approach can give insight into the reasons behind neutral set size variations, for example, why structures with fewer stacks tend to have larger neutral set sizes. In addition, we generalize neutral set size calculations from the previously studied many-to-one framework, where each sequence folds into a single energetically preferred structure, to a fuller many-to-many framework, where several low-energy structures are included. We find that structures with large neutral sets in one framework also tend to have large neutral sets in the other framework for a range of parameters, and thus the choice of GP map does not fundamentally affect which structures have the largest neutral set sizes.
Affiliation(s)
- Nora S. Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK
  - Sainsbury Laboratory, University of Cambridge, Bateman Street, Cambridge CB2 1LR, UK
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Parks Road, Oxford OX1 3PU, UK
- Sebastian E. Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
  - The Alan Turing Institute, British Library, Euston Road, London NW1 2DB, UK
8
Andrikos C, Makris E, Kolaitis A, Rassias G, Pavlatos C, Tsanakas P. Knotify: An Efficient Parallel Platform for RNA Pseudoknot Prediction Using Syntactic Pattern Recognition. Methods Protoc 2022; 5:mps5010014. [PMID: 35200530 PMCID: PMC8876629 DOI: 10.3390/mps5010014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2022] [Revised: 01/27/2022] [Accepted: 01/30/2022] [Indexed: 11/16/2022] Open
Abstract
Obtaining valuable clues for noncoding RNA (ribonucleic acid) subsequences remains a significant challenge, given that most of the human genome is transcribed into noncoding RNA parts related to unknown biological operations. Capturing these clues relies on accurate “base pairing” prediction, also known as “RNA secondary structure prediction”. As COVID-19 is considered a severe global threat, the single-stranded SARS-CoV-2 virus reveals the importance of establishing an efficient RNA analysis toolkit. This work aimed to contribute to that by introducing a novel system committed to predicting RNA secondary structure patterns (i.e., RNA’s pseudoknots) that leverages syntactic pattern-recognition strategies. Having focused on the pseudoknot predictions, we formalized the secondary structure prediction of the RNA as primarily a parsing problem and secondarily an optimization problem. The proposed methodology addresses the problem of predicting pseudoknots of the first order (H-type). We introduce a context-free grammar (CFG) that affords enough expressive power to recognize potential pseudoknot patterns. In addition, an alternative methodology for detecting possible pseudoknots is implemented using a brute-force algorithm. Any input sequence may yield multiple potential folding patterns, requiring a strict methodology to determine the single biologically realistic one. We employed a novel heuristic over the widely accepted notion of free-energy minimization to tackle such ambiguity in a performant way by utilizing each pattern’s context to unveil the most prominent pseudoknot pattern. The overall process features polynomial-time complexity, while its parallel implementation enhances the end performance in proportion to the deployed hardware. The proposed methodology succeeds in predicting the core stems of any RNA pseudoknot of the test dataset, achieving a 76.4% recall ratio. The methodology achieved an F1-score of 0.774 and an MCC of 0.543 in discovering all the stems of an RNA sequence, outperforming competing platforms on this particular task. Measurements were taken using a dataset of 262 RNA sequences, establishing a performance speedup of 1.31, 3.45, and 7.76 relative to three well-known platforms. The implementation source code is publicly available in the knotify GitHub repository.
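For readers unfamiliar with the metrics quoted in this abstract, the following Python sketch shows the standard definitions of recall, precision, F1-score and Matthews correlation coefficient (MCC) in terms of confusion-matrix counts. The function name and the example counts are made up for illustration and are not taken from the Knotify paper, which may count true and false positives at the stem or base-pair level in its own way.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Return recall, precision, F1 and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}

# Made-up counts for illustration only (not results from the paper).
example = classification_metrics(tp=80, fp=20, fn=20, tn=80)
```

With these illustrative counts, recall, precision and F1 all equal 0.8 and the MCC equals 0.6, which shows how F1 and MCC can differ on the same predictions.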
Affiliation(s)
- Christos Andrikos
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Angelos Kolaitis
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Georgios Rassias
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
  - Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
  - Correspondence: Tel.: +30-210-7722541
- Panayiotis Tsanakas
  - School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
9
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 11/05/2021] [Indexed: 12/26/2022] Open
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- He Zhang
  - Baidu Research, Sunnyvale, CA 94089
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Liang Zhang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
  - Baidu Research, Sunnyvale, CA 94089
- Kaibo Liu
  - Baidu Research, Sunnyvale, CA 94089
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
- Liang Huang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
  - Baidu Research, Sunnyvale, CA 94089
10
Selective packaging of HIV-1 RNA genome is guided by the stability of 5' untranslated region polyA stem. Proc Natl Acad Sci U S A 2021; 118:2114494118. [PMID: 34873042 DOI: 10.1073/pnas.2114494118] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Accepted: 10/27/2021] [Indexed: 01/08/2023] Open
Abstract
To generate infectious virus, HIV-1 must package two copies of its full-length RNA into particles. HIV-1 transcription initiates from multiple, neighboring sites, generating RNA species that only differ by a few nucleotides at the 5' end, including those with one (1G) or three (3G) 5' guanosines. Strikingly, 1G RNA is preferentially packaged into virions over 3G RNA. We investigated how HIV-1 distinguishes between these nearly identical RNAs using in-gel chemical probing combined with recently developed computational tools for determining RNA conformational ensembles, as well as cell-based assays to quantify the efficiency of RNA packaging into viral particles. We found that 1G and 3G RNAs fold into distinct structural ensembles. The 1G RNA, but not the 3G RNA, primarily adopts conformations with an intact polyA stem, exposed dimerization initiation site, and multiple, unpaired guanosines known to mediate Gag binding. Furthermore, we identified mutants that exhibited altered genome selectivity and packaged 3G RNA efficiently. In these mutants, both 1G and 3G RNAs fold into similar conformational ensembles, such that they can no longer be distinguished. Our findings demonstrate that polyA stem stability guides RNA-packaging selectivity. These studies also uncover the mechanism by which HIV-1 selects its genome for packaging: 1G RNA is preferentially packaged because it exposes structural elements that promote RNA dimerization and Gag binding.
11
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics.
Significance Statement
Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
| | - He Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Kaibo Liu
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | | | - David H. Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
| | - Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| |
|
12
|
Mazzanti L, Alferkh L, Frezza E, Pasquali S. Biasing RNA Coarse-Grained Folding Simulations with Small-Angle X-ray Scattering Data. J Chem Theory Comput 2021; 17:6509-6521. [PMID: 34506136 DOI: 10.1021/acs.jctc.1c00441] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
RNA molecules can easily adopt alternative structures in response to different environmental conditions. As a result, a molecule's energy landscape is rough and can exhibit a multitude of deep basins. In the absence of a high-resolution structure, small-angle X-ray scattering data (SAXS) can narrow down the conformational space available to the molecule and be used in conjunction with physical modeling to obtain high-resolution putative structures to be further tested by experiments. Because of the low resolution of these data, it is natural to implement the integration of SAXS data into simulations using a coarse-grained representation of the molecule, allowing for much wider searches and faster evaluation of SAXS theoretical intensity curves than with atomistic models. We present here the theoretical framework and the implementation of a simulation approach based on our coarse-grained model HiRE-RNA combined with SAXS evaluations "on-the-fly" leading the simulation toward conformations agreeing with the scattering data, starting from partially folded structures as the ones that can easily be obtained from secondary structure prediction-based tools. We show on three benchmark systems how our approach can successfully achieve high-resolution structures with remarkable similarity with the native structure recovering not only the overall shape, as imposed by SAXS data, but also the details of initially missing base pairs.
Affiliation(s)
- Liuba Mazzanti
- Laboratoire CiTCoM, CNRS UMR 8038, Université de Paris, 4 Avenue de l'observatoire, 75006 Paris, France
| | - Lina Alferkh
- Laboratoire CiTCoM, CNRS UMR 8038, Université de Paris, 4 Avenue de l'observatoire, 75006 Paris, France
| | - Elisa Frezza
- Laboratoire CiTCoM, CNRS UMR 8038, Université de Paris, 4 Avenue de l'observatoire, 75006 Paris, France
| | - Samuela Pasquali
- Laboratoire CiTCoM, CNRS UMR 8038, Université de Paris, 4 Avenue de l'observatoire, 75006 Paris, France
| |
|
13
|
Rivas E. Evolutionary conservation of RNA sequence and structure. WILEY INTERDISCIPLINARY REVIEWS-RNA 2021; 12:e1649. [PMID: 33754485 PMCID: PMC8250186 DOI: 10.1002/wrna.1649] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
An RNA structure prediction from a single‐sequence RNA folding program is not evidence for an RNA whose structure is important for function. Random sequences have plausible and complex predicted structures not easily distinguishable from those of structural RNAs. How to tell when an RNA has a conserved structure is a question that requires looking at the evolutionary signature left by the conserved RNA. This question is important not just for long noncoding RNAs which usually lack an identified function, but also for RNA binding protein motifs which can be single stranded RNAs or structures. Here we review recent advances using sequence and structural analysis to determine when RNA structure is conserved or not. Although covariation measures assess structural RNA conservation, one must distinguish covariation due to RNA structure from covariation due to independent phylogenetic substitutions. We review a statistical test to measure false positives expected under the null hypothesis of phylogenetic covariation alone (specificity). We also review a complementary test that measures power, that is, expected covariation derived from sequence variation alone (sensitivity). Power in the absence of covariation signals the absence of a conserved RNA structure. We analyze artifacts that falsely identify conserved RNA structure such as the misuse of programs that do not assess significance, the use of inappropriate statistics confounded by signals other than covariation, or misalignments that induce spurious covariation. Among artifacts that obscure the signal of a conserved RNA structure, we discuss the inclusion of pseudogenes in alignments which increase power but destroy covariation. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
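As a rough illustration of what a "covariation measure" computes, the sketch below scores two alignment columns by their mutual information. This is a generic textbook score chosen for illustration, not the statistic or the significance test reviewed in the abstract; the function name and the toy alignment are hypothetical. Note that a raw score like this cannot separate structural covariation from covariation caused by shared phylogeny, which is exactly the caveat the abstract raises.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `alignment` is a list of equal-length aligned sequences. Rows with a gap
    ('-') in either column are skipped. No significance test and no
    phylogenetic correction is applied.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Toy alignment, made up for illustration: columns 0 and 4 covary
# (G-C, A-U, C-G, U-A) while column 2 is invariant.
toy = ["GAACC", "AAACU", "CAACG", "UAACA"]
mi_covarying = column_mutual_information(toy, 0, 4)
mi_invariant = column_mutual_information(toy, 0, 2)
```

In the toy alignment the covarying pair of columns scores higher than the pair that includes an invariant column; whether such a score is meaningful on real data depends on the null model, which this sketch omits.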
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
| |
|
14
|
Variation Profile of the Orthotospovirus Genome. Pathogens 2020; 9:pathogens9070521. [PMID: 32610472 PMCID: PMC7400459 DOI: 10.3390/pathogens9070521] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 06/26/2020] [Accepted: 06/26/2020] [Indexed: 12/13/2022] Open
Abstract
Orthotospoviruses are plant-infecting members of the family Tospoviridae (order Bunyavirales), have a broad host range and are vectored by polyphagous thrips in a circulative-propagative manner. Because diverse hosts and vectors impose heterogeneous selection constraints on viral genomes, the evolutionary arms races between hosts and their pathogens might be manifested as selection for rapid changes in key genes. These observations suggest that orthotospoviruses contain key genetic components that rapidly mutate to mediate host adaptation and vector transmission. Using complete genome sequences, we profiled genomic variation in orthotospoviruses. Results show that the three genomic segments contain hypervariable areas at homologous locations across species. Remarkably, the highest nucleotide variation mapped to the intergenic region of RNA segments S and M, which fold into a hairpin. Secondary structure analyses showed that the hairpin is a dynamic structure with multiple functional shapes formed by stems and loops, contains sites under positive selection and covariable sites. Accumulation and tolerance of mutations in the intergenic region is a general feature of orthotospoviruses and might mediate adaptation to host plants and insect vectors.
|
15
|
Ganser LR, Kelly ML, Herschlag D, Al-Hashimi HM. The roles of structural dynamics in the cellular functions of RNAs. Nat Rev Mol Cell Biol 2019; 20:474-489. [PMID: 31182864 DOI: 10.1038/s41580-019-0136-0] [Citation(s) in RCA: 295] [Impact Index Per Article: 59.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
RNAs fold into 3D structures that range from simple helical elements to complex tertiary structures and quaternary ribonucleoprotein assemblies. The functions of many regulatory RNAs depend on how their 3D structure changes in response to a diverse array of cellular conditions. In this Review, we examine how the structural characterization of RNA as dynamic ensembles of conformations, which form with different probabilities and at different timescales, is improving our understanding of RNA function in cells. We discuss the mechanisms of gene regulation by microRNAs, riboswitches, ribozymes, post-transcriptional RNA modifications and RNA-binding proteins, and how the cellular environment and processes such as liquid-liquid phase separation may affect RNA folding and activity. The emerging RNA-ensemble-function paradigm is changing our perspective and understanding of RNA regulation, from in vitro to in vivo and from descriptive to predictive.
Affiliation(s)
- Laura R Ganser
- Department of Biochemistry, Duke University School of Medicine, Durham, NC, USA
| | - Megan L Kelly
- Department of Biochemistry, Duke University School of Medicine, Durham, NC, USA
| | - Daniel Herschlag
- Department of Biochemistry, Stanford ChEM-H Chemistry, Engineering, and Medicine for Human Health, Stanford University, Stanford, CA, USA; Department of Chemical Engineering, Stanford ChEM-H Chemistry, Engineering, and Medicine for Human Health, Stanford University, Stanford, CA, USA; Department of Chemistry, Stanford ChEM-H Chemistry, Engineering, and Medicine for Human Health, Stanford University, Stanford, CA, USA
| | - Hashim M Al-Hashimi
- Department of Biochemistry, Duke University School of Medicine, Durham, NC, USA; Department of Chemistry, Duke University, Durham, NC, USA
| |
|
16
|
Abstract
RNA molecules fold into complex three-dimensional structures that sample alternate conformations ranging from minor differences in tertiary structure dynamics to major differences in secondary structure. This allows them to form entirely different substructures with each population potentially giving rise to a distinct biological outcome. The substructures can be partitioned along an existing energy landscape given a particular static cellular cue or can be shifted in response to dynamic cues such as ligand binding. We review a few key examples of RNA molecules that sample alternate conformations and how these are capitalized on for control of critical regulatory functions.
Affiliation(s)
- Marie Teng-Pei Wu
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138
| | - Victoria D'Souza
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138
| |
|
17
|
Wang F, Sun LZ, Sun T, Chang S, Xu X. Helix-Based RNA Landscape Partition and Alternative Secondary Structure Determination. ACS OMEGA 2019; 4:15407-15413. [PMID: 31572840 PMCID: PMC6761681 DOI: 10.1021/acsomega.9b01430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
RNA is a versatile macromolecule with the ability to fold into and interconvert between multiple functional conformations. The elucidation of the RNA folding landscape, especially the knowledge of alternative structures, is critical to uncover the physical mechanism of RNA functions. Here, we introduce a helix-based strategy for RNA folding landscape partition and alternative secondary structure determination. The benchmark test of 27 RNAs involving alternative stable structures shows that the model has the ability to divide the whole landscape into distinct partitions at the secondary structure level and predict the representative structures for each partition. Furthermore, the predicted structures and equilibrium populations of metastable conformations for the 2'dG-sensing riboswitch reveal the allosteric conformational switch on transcript length, which is consistent with the experimental study, indicating the importance of metastable states for RNA-based gene regulation. The model delivers a starting point for the landscape-based strategy toward the RNA folding mechanism and functions.
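The "equilibrium populations" mentioned above follow from Boltzmann weighting of the free energies assigned to each conformation or landscape partition. A minimal sketch, assuming free energies in kcal/mol at 37 °C; the function name and the three example energies are hypothetical and are not taken from the paper:

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)


def boltzmann_populations(free_energies, temperature_k=310.15):
    """Equilibrium populations for states with the given free energies (kcal/mol)."""
    rt = GAS_CONSTANT_KCAL * temperature_k
    # Shift by the lowest energy so the exponentials stay well scaled.
    lowest = min(free_energies)
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies]
    partition_sum = sum(weights)
    return [w / partition_sum for w in weights]


# Hypothetical free energies for three landscape partitions.
populations = boltzmann_populations([-12.0, -11.0, -8.5])
```

Because RT is only about 0.6 kcal/mol at this temperature, a difference of one or two kcal/mol already shifts most of the population to the lower-energy state, which is why metastable states with similar energies matter for switching behaviour.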
Affiliation(s)
- Fengfei Wang
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
| | - Li-Zhen Sun
- Department of Applied Physics, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, China
| | - Tingting Sun
- Department of Physics, Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310023, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Mathematics and Physics, Jiangsu University of Technology, Changzhou, Jiangsu 213001, China
| |
|
18
|
Manzourolajdad A, Spouge JL. Structural prediction of RNA switches using conditional base-pair probabilities. PLoS One 2019; 14:e0217625. [PMID: 31188853 PMCID: PMC6561571 DOI: 10.1371/journal.pone.0217625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Accepted: 05/15/2019] [Indexed: 11/23/2022] Open
Abstract
An RNA switch triggers biological functions by toggling between two conformations. RNA switches include bacterial riboswitches, where ligand binding can stabilize a bound structure. For RNAs with only one stable structure, structural prediction usually just requires a straightforward free energy minimization, but for an RNA switch, the prediction of a less stable alternative structure is often computationally costly and even problematic. The current sampling-clustering method predicts stable and alternative structures by partitioning structures sampled from the energy landscape into two clusters, but it is very time-consuming. Instead, we predict the alternative structure of an RNA switch from conditional probability calculations within the energy landscape. First, our method excludes base pairs related to the most stable structure in the energy landscape. Then, it detects stable stems (“seeds”) in the remaining landscape. Finally, it folds an alternative structure prediction around a seed. While having comparable riboswitch classification performance, the conditional-probability computations had fewer adjustable parameters, offered greater predictive flexibility, and were more than one thousand times faster than the sampling step alone in sampling-clustering predictions, the competing standard. Overall, the described approach helps traverse thermodynamically improbable energy landscapes to find biologically significant substructures and structures rapidly and effectively.
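The first step described above, conditioning the landscape on the absence of the most stable structure's base pairs, can be illustrated on a small explicit ensemble. The sketch below is an assumption-laden stand-in: the paper computes conditional probabilities within the energy landscape, whereas this example simply reweights a hand-made list of structures. The function name, structures and weights are all hypothetical.

```python
def pair_probabilities(structures, weights, excluded_pairs=frozenset()):
    """Base-pair probabilities conditioned on containing none of `excluded_pairs`.

    `structures` is a list of sets of (i, j) pairs and `weights` the matching
    Boltzmann weights. Returns a dict mapping pair -> conditional probability,
    or an empty dict if no structure satisfies the condition.
    """
    kept = [(s, w) for s, w in zip(structures, weights) if not (s & excluded_pairs)]
    total = sum(w for _, w in kept)
    if total == 0:
        return {}
    probs = {}
    for s, w in kept:
        for pair in s:
            probs[pair] = probs.get(pair, 0.0) + w / total
    return probs


# Hypothetical four-structure ensemble; the first structure dominates.
structures = [
    {(1, 10), (2, 9)},
    {(1, 10), (3, 8)},
    {(4, 12), (5, 11)},
    {(4, 12)},
]
weights = [6.0, 2.0, 1.0, 1.0]

unconditional = pair_probabilities(structures, weights)
# Condition on the absence of every pair in the most stable structure.
conditional = pair_probabilities(structures, weights, frozenset(structures[0]))
```

Pairs that are rare in the full ensemble can become dominant once the most stable structure is excluded, which is how candidate "seed" stems for an alternative fold would stand out in this toy setting.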
Affiliation(s)
- Amirhossein Manzourolajdad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| | - John L. Spouge
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| |
|
19
|
Schroeder SJ. Challenges and approaches to predicting RNA with multiple functional structures. RNA (NEW YORK, N.Y.) 2018; 24:1615-1624. [PMID: 30143552 PMCID: PMC6239171 DOI: 10.1261/rna.067827.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The revolution in sequencing technology demands new tools to interpret the genetic code. As in vivo transcriptome-wide chemical probing techniques advance, new challenges emerge in the RNA folding problem. The emphasis on one sequence folding into a single minimum free energy structure is fading as a new focus develops on generating RNA structural ensembles and identifying functional structural features in ensembles. This review describes an efficient combinatorially complete method and three free energy minimization approaches to predicting RNA structures with more than one functional fold, as well as two methods for analysis of a thermodynamics-based Boltzmann ensemble of structures. The review then highlights two examples of viral RNA 3'-UTR regions that fold into more than one conformation and have been characterized by single molecule fluorescence energy resonance transfer or NMR spectroscopy. These examples highlight the different approaches and challenges in predicting structure and function from sequence for RNA with multiple biological roles and folds. More well-defined examples and new metrics for measuring differences in RNA structures will guide future improvements in prediction of RNA structure and function from sequence.
Affiliation(s)
- Susan J Schroeder
- Department of Chemistry and Biochemistry, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma 73019, USA
| |
|
20
|
Kutchko KM, Madden EA, Morrison C, Plante KS, Sanders W, Vincent HA, Cruz Cisneros MC, Long KM, Moorman NJ, Heise MT, Laederach A. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res 2018; 46:3657-3670. [PMID: 29361131 PMCID: PMC6283419 DOI: 10.1093/nar/gky012] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Revised: 12/10/2017] [Accepted: 01/05/2018] [Indexed: 12/03/2022] Open
Abstract
Alphaviruses are mosquito-borne pathogens that cause human diseases ranging from debilitating arthritis to lethal encephalitis. Studies with Sindbis virus (SINV), which causes fever, rash, and arthralgia in humans, and Venezuelan equine encephalitis virus (VEEV), which causes encephalitis, have identified RNA structural elements that play key roles in replication and pathogenesis. However, a complete genomic structural profile has not been established for these viruses. We used the structural probing technique SHAPE-MaP to identify structured elements within the SINV and VEEV genomes. Our SHAPE-directed structural models recapitulate known RNA structures, while also identifying novel structural elements, including a new functional element in the nsP1 region of SINV whose disruption causes a defect in infectivity. Although RNA structural elements are important for multiple aspects of alphavirus biology, we found the majority of RNA structures were not conserved between SINV and VEEV. Our data suggest that alphavirus RNA genomes are highly divergent structurally despite similar genomic architecture and sequence conservation; still, RNA structural elements are critical to the viral life cycle. These findings reframe traditional assumptions about RNA structure and evolution: rather than structures being conserved, alphaviruses frequently evolve new structures that may shape interactions with host immune systems or co-evolve with viral proteins.
Affiliation(s)
- Katrina M Kutchko
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
| | - Emily A Madden
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
| | | | | | - Wes Sanders
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| | | | | | | | - Nathaniel J Moorman
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| | - Mark T Heise
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Department of Genetics, UNC-Chapel Hill, USA
| | - Alain Laederach
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
| |
|
21
|
Antunes D, Jorge NAN, Caffarena ER, Passetti F. Using RNA Sequence and Structure for the Prediction of Riboswitch Aptamer: A Comprehensive Review of Available Software and Tools. Front Genet 2018; 8:231. [PMID: 29403526 PMCID: PMC5780412 DOI: 10.3389/fgene.2017.00231] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2017] [Accepted: 12/21/2017] [Indexed: 12/14/2022] Open
Abstract
RNA molecules are essential players in many fundamental biological processes. Prokaryotes and eukaryotes have distinct RNA classes with specific structural features and functional roles. Computational prediction of protein structures is a research field in which high confidence three-dimensional protein models can be proposed based on the sequence alignment between target and templates. However, to date, only a few approaches have been developed for the computational prediction of RNA structures. Similar to proteins, RNA structures may be altered due to the interaction with various ligands, including proteins, other RNAs, and metabolites. A riboswitch is a molecular mechanism, found in the three kingdoms of life, in which the RNA structure is modified by the binding of a metabolite. It can regulate multiple gene expression mechanisms, such as transcription, translation initiation, and mRNA splicing and processing. Due to their nature, these entities also act on the regulation of gene expression and detection of small metabolites and have the potential to help in the discovery of new classes of antimicrobial agents. In this review, we describe software and web servers currently available for riboswitch aptamer identification and secondary and tertiary structure prediction, including applications.
Affiliation(s)
- Deborah Antunes
- Scientific Computing Program (PROCC), Computational Biophysics and Molecular Modeling Group, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Natasha A N Jorge
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil; Laboratory of Gene Expression Regulation, Carlos Chagas Institute, Fundação Oswaldo Cruz, Curitiba, Brazil
| | - Ernesto R Caffarena
- Scientific Computing Program (PROCC), Computational Biophysics and Molecular Modeling Group, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Fabio Passetti
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil; Laboratory of Gene Expression Regulation, Carlos Chagas Institute, Fundação Oswaldo Cruz, Curitiba, Brazil
| |
|
22
|
Tan Z, Fu Y, Sharma G, Mathews DH. TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res 2017; 45:11570-11581. [PMID: 29036420 PMCID: PMC5714223 DOI: 10.1093/nar/gkx815] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2017] [Accepted: 09/12/2017] [Indexed: 12/26/2022] Open
Abstract
This paper presents TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II augments the structure prediction capabilities of TurboFold by additionally providing multiple sequence alignments. Probabilities for alignment of nucleotide positions between all pairs of input sequences are iteratively estimated in TurboFold II by incorporating information from both the sequence identity and secondary structures. A multiple sequence alignment is obtained from these probabilities by using a probabilistic consistency transformation and a hierarchically computed guide tree. To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools, including methods that focus on alignment alone and methods that provide both alignment and structure prediction. TurboFold II has comparable alignment accuracy with MAFFT and higher accuracy than other tools. TurboFold II also has comparable structure prediction accuracy as the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package, which is freely available for download at http://rna.urmc.rochester.edu under a GPL license.
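A probabilistic consistency transformation re-estimates the alignment probability between two sequences by also routing it through every other sequence in the set. The sketch below follows the commonly used ProbCons-style formulation, P'(x_i ~ y_j) = (1/|S|) Σ_z Σ_k P(x_i ~ z_k) P(z_k ~ y_j), where the terms z = x and z = y each contribute P(x_i ~ y_j). Whether TurboFold II uses exactly this form is an assumption here; the function names and the toy posterior matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def consistency_transform(posteriors, sequences, x, y):
    """Re-estimate P(x_i ~ y_j) using every third sequence z as an intermediary.

    `posteriors[(a, b)]` is a len(a) x len(b) matrix of pairwise alignment
    posterior probabilities.
    """
    # The z == x and z == y terms each reduce to the direct posterior.
    total = [[2.0 * p for p in row] for row in posteriors[(x, y)]]
    for z in sequences:
        if z in (x, y):
            continue
        through_z = matmul(posteriors[(x, z)], posteriors[(z, y)])
        total = [[t + u for t, u in zip(t_row, u_row)]
                 for t_row, u_row in zip(total, through_z)]
    n = len(sequences)
    return [[t / n for t in row] for row in total]


# Toy posteriors for three length-2 sequences: x and y align ambiguously,
# but both align confidently to z.
posteriors = {
    ("x", "y"): [[0.5, 0.1], [0.1, 0.5]],
    ("x", "z"): [[0.9, 0.0], [0.0, 0.9]],
    ("z", "y"): [[0.9, 0.0], [0.0, 0.9]],
}
refined = consistency_transform(posteriors, ["x", "y", "z"], "x", "y")
```

In this toy case the confident alignments through z raise the diagonal entries of the x-y matrix and lower the off-diagonal ones, which is the intended effect of the transformation before a guide tree is used to build the multiple alignment.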
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | - Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
| | - Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
| | - David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA; Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
| |
|
23
|
Schlick T, Pyle AM. Opportunities and Challenges in RNA Structural Modeling and Design. Biophys J 2017; 113:225-234. [PMID: 28162235 PMCID: PMC5529161 DOI: 10.1016/j.bpj.2016.12.037] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2016] [Revised: 12/08/2016] [Accepted: 12/19/2016] [Indexed: 01/27/2023] Open
Abstract
We describe opportunities and challenges in RNA structural modeling and design, as recently discussed during the second Telluride Science Research Center workshop organized in June 2016. Topics include fundamental processes of RNA, such as structural assemblies (hierarchical folding, multiple conformational states and their clustering), RNA motifs, and chemical reactivity of RNA, as used for structural prediction and functional inference. We also highlight the software and database issues associated with RNA structures, such as the multiple approaches for motif annotation, the need for frequent database updating, and the importance of quality control of RNA structures. We discuss various modeling approaches for structure prediction, mechanistic analysis of RNA reactions, and RNA design, and the complementary roles that both atomistic and coarse-grained approaches play in such simulations. Collectively, as scientists from varied disciplines become familiar and drawn into these unique challenges, new approaches and collaborative efforts will undoubtedly be catalyzed.
Affiliation(s)
- Tamar Schlick
- Department of Chemistry, New York University, New York, New York; Courant Institute of Mathematical Sciences, New York University, New York, New York.
| | - Anna Marie Pyle
- Department of Molecular and Cellular and Developmental Biology and Department of Chemistry, Yale University; Howard Hughes Medical Institute, New Haven, Connecticut.
| |
|
24
|
Woods CT, Lackey L, Williams B, Dokholyan NV, Gotz D, Laederach A. Comparative Visualization of the RNA Suboptimal Conformational Ensemble In Vivo. Biophys J 2017. [PMID: 28625696 PMCID: PMC5529173 DOI: 10.1016/j.bpj.2017.05.031] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
When a ribonucleic acid (RNA) molecule folds, it often does not adopt a single, well-defined conformation. The folding energy landscape of an RNA is highly dependent on its nucleotide sequence and molecular environment. Cellular molecules sometimes alter the energy landscape, thereby changing the ensemble of likely low-energy conformations. The effects of these energy landscape changes on the conformational ensemble are particularly challenging to visualize for large RNAs. We have created a robust approach for visualizing the conformational ensemble of RNAs that is well suited for in vitro versus in vivo comparisons. Our method creates a stable map of conformational space for a given RNA sequence. We first identify single point mutations in the RNA that maximally sample suboptimal conformational space based on the ensemble’s partition function. Then, we cluster these diverse ensembles to identify the most diverse partition functions for Boltzmann stochastic sampling. By using, to our knowledge, a novel nestedness distance metric, we iteratively add mutant suboptimal ensembles to converge on a stable 2D map of conformational space. We then compute the selective 2′ hydroxyl acylation by primer extension (SHAPE)-directed ensemble for the RNA folding under different conditions, and we project these ensembles on the map to visualize. To validate our approach, we established a conformational map of the Vibrio vulnificus add adenine riboswitch that reveals five classes of structures. In the presence of adenine, projection of the SHAPE-directed sampling correctly identified the on-conformation; without the ligand, only off-conformations were visualized. We also collected the whole-transcript in vitro and in vivo SHAPE-MaP for human β-actin messenger RNA that revealed similar global folds in both conditions. 
Nonetheless, a comparison of in vitro and in vivo data revealed that specific regions exhibited significantly different SHAPE-MaP profiles indicative of structural rearrangements, including rearrangement consistent with binding of the zipcode protein in a region distal to the stop codon.
Affiliation(s)
- Chanin T Woods
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Lela Lackey
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Benfeard Williams
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Nikolay V Dokholyan
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- David Gotz
  - Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Alain Laederach
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
|
25
|
Manzourolajdad A, Gonzalez M, Spouge JL. Changes in the Plasticity of HIV-1 Nef RNA during the Evolution of the North American Epidemic. PLoS One 2016; 11:e0163688. [PMID: 27685447 PMCID: PMC5042412 DOI: 10.1371/journal.pone.0163688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/21/2016] [Accepted: 09/13/2016] [Indexed: 02/04/2023]
Abstract
Because of a high mutation rate, HIV exists as a viral swarm of many sequence variants evolving under various selective pressures from the human immune system. Although the Nef gene codes for the most immunogenic of HIV accessory proteins, which alone makes it of great interest to HIV research, it also encodes an RNA structure, whose contribution to HIV virulence has been largely unexplored. Nef RNA helps HIV escape RNA interference (RNAi) through nucleotide changes and alternative folding. This study examines Historic and Modern Datasets of patient HIV-1 Nef sequences during the evolution of the North American epidemic for local changes in RNA plasticity. By definition, RNA plasticity refers to an RNA molecule’s ability to take alternative folds (i.e., alternative conformations). Our most important finding is that an evolutionarily conserved region of the HIV-1 Nef gene, which we denote by R2, recently underwent a statistically significant increase in its RNA plasticity. Thus, our results indicate that Modern Nef R2 typically accommodates an alternative fold more readily than Historic Nef R2. Moreover, the increase in RNA plasticity resides mostly in synonymous nucleotide changes, which cannot be a response to selective pressures on the Nef protein. R2 may therefore be of interest in the development of antiviral RNAi therapies.
Affiliation(s)
- Amirhossein Manzourolajdad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Mileidy Gonzalez
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- John L. Spouge
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
26
|
Kutchko KM, Laederach A. Transcending the prediction paradigm: novel applications of SHAPE to RNA function and evolution. WILEY INTERDISCIPLINARY REVIEWS-RNA 2016; 8. [PMID: 27396578 PMCID: PMC5179297 DOI: 10.1002/wrna.1374] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 03/04/2016] [Revised: 04/29/2016] [Accepted: 05/23/2016] [Indexed: 12/31/2022]
Abstract
Selective 2′‐hydroxyl acylation analyzed by primer extension (SHAPE) provides information on RNA structure at single‐nucleotide resolution. It is most often used in conjunction with RNA secondary structure prediction algorithms as a probabilistic or thermodynamic restraint. With the recent advent of ultra‐high‐throughput approaches for collecting SHAPE data, the applications of this technology are extending beyond structure prediction. In this review, we discuss recent applications of SHAPE data in the transcriptomic context and how this new experimental paradigm is changing our understanding of these experiments and RNA folding in general. SHAPE experiments probe both the secondary and tertiary structure of an RNA, suggesting that model‐free approaches for within and comparative RNA structure analysis can provide significant structural insight without the need for a full structural model. New methods incorporating SHAPE at different nucleotide resolutions are required to parse these transcriptomic data sets to transcend secondary structure modeling with global structural metrics. These ‘multiscale’ approaches provide deeper insights into RNA global structure, evolution, and function in the cell. WIREs RNA 2017, 8:e1374. doi: 10.1002/wrna.1374
Affiliation(s)
- Katrina M Kutchko
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alain Laederach
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
27
|
RNA Duplex Map in Living Cells Reveals Higher-Order Transcriptome Structure. Cell 2016; 165:1267-1279. [PMID: 27180905 DOI: 10.1016/j.cell.2016.04.028] [Citation(s) in RCA: 447] [Impact Index Per Article: 49.7] [Received: 12/18/2015] [Revised: 03/03/2016] [Accepted: 04/05/2016] [Indexed: 02/06/2023]
Abstract
RNA has the intrinsic property to base pair, forming complex structures fundamental to its diverse functions. Here, we develop PARIS, a method based on reversible psoralen crosslinking for global mapping of RNA duplexes with near base-pair resolution in living cells. PARIS analysis in three human and mouse cell types reveals frequent long-range structures, higher-order architectures, and RNA-RNA interactions in trans across the transcriptome. PARIS determines base-pairing interactions on an individual-molecule level, revealing pervasive alternative conformations. We used PARIS-determined helices to guide phylogenetic analysis of RNA structures and discovered conserved long-range and alternative structures. XIST, a long noncoding RNA (lncRNA) essential for X chromosome inactivation, folds into evolutionarily conserved RNA structural domains that span many kilobases. XIST A-repeat forms complex inter-repeat duplexes that nucleate higher-order assembly of the key epigenetic silencing protein SPEN. PARIS is a generally applicable and versatile method that provides novel insights into the RNA structurome and interactome.
|
28
|
Cordero P, Das R. Rich RNA Structure Landscapes Revealed by Mutate-and-Map Analysis. PLoS Comput Biol 2015; 11:e1004473. [PMID: 26566145 PMCID: PMC4643908 DOI: 10.1371/journal.pcbi.1004473] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 05/02/2015] [Accepted: 07/20/2015] [Indexed: 11/19/2022]
Abstract
Landscapes exhibiting multiple secondary structures arise in natural RNA molecules that modulate gene expression, protein synthesis, and viral infection [corrected]. We report herein that high-throughput chemical experiments can isolate an RNA's multiple alternative secondary structures as they are stabilized by systematic mutagenesis (mutate-and-map, M2) and that a computational algorithm, REEFFIT, enables unbiased reconstruction of these states' structures and populations. In an in silico benchmark on non-coding RNAs with complex landscapes, M2-REEFFIT recovers 95% of RNA helices present with at least 25% population while maintaining a low false discovery rate (10%) and conservative error estimates. In experimental benchmarks, M2-REEFFIT recovers the structure landscapes of a 35-nt MedLoop hairpin, a 110-nt 16S rRNA four-way junction with an excited state, a 25-nt bistable hairpin, and a 112-nt three-state adenine riboswitch with its expression platform, molecules whose characterization previously required expert mutational analysis and specialized NMR or chemical mapping experiments. With this validation, M2-REEFFIT enabled tests of whether artificial RNA sequences might exhibit complex landscapes in the absence of explicit design. An artificial flavin mononucleotide riboswitch and a randomly generated RNA sequence are found to interconvert between three or more states, including structures for which there was no design, but that could be stabilized through mutations. These results highlight the likely pervasiveness of rich landscapes with multiple secondary structures in both natural and artificial RNAs and demonstrate an automated chemical/computational route for their empirical characterization.
Affiliation(s)
- Pablo Cordero
  - Biomedical Informatics Program, Stanford University, Stanford, California, United States of America
  - Biochemistry Department, Stanford University, Stanford, California, United States of America
- Rhiju Das
  - Biomedical Informatics Program, Stanford University, Stanford, California, United States of America
  - Biochemistry Department, Stanford University, Stanford, California, United States of America
  - Physics Department, Stanford University, Stanford, California, United States of America
|
29
|
Drory Retwitzer M, Kifer I, Sengupta S, Yakhini Z, Barash D. An Efficient Minimum Free Energy Structure-Based Search Method for Riboswitch Identification Based on Inverse RNA Folding. PLoS One 2015; 10:e0134262. [PMID: 26230932 PMCID: PMC4521916 DOI: 10.1371/journal.pone.0134262] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/16/2015] [Accepted: 07/07/2015] [Indexed: 11/22/2022]
Abstract
Riboswitches are RNA genetic control elements that were originally discovered in bacteria and provide a unique mechanism of gene regulation. They work without the participation of proteins and are believed to represent ancient regulatory systems in the evolutionary timescale. One of the biggest challenges in riboswitch research is to find additional eukaryotic riboswitches since more than 20 riboswitch classes have been found in prokaryotes but only one class has been found in eukaryotes. Moreover, this single known class of eukaryotic riboswitch, namely the TPP riboswitch class, has been found in bacteria, archaea, fungi and plants but not in animals. The few examples of eukaryotic riboswitches were identified using sequence-based bioinformatics search methods such as a combination of BLAST and pattern matching techniques that incorporate base-pairing considerations. None of these approaches perform energy minimization structure predictions. There is a clear motivation to develop new bioinformatics methods, aside of the ongoing advances in covariance models, that will sample the sequence search space more flexibly using structural guidance while retaining the computational efficiency of sequence-based methods. We present a new energy minimization approach that transforms structure-based search into a sequence-based search, thereby enabling the utilization of well established sequence-based search utilities such as BLAST and FASTA. The transformation to sequence space is obtained by using an extended inverse RNA folding problem solver with sequence and structure constraints, available within RNAfbinv. Examples in applying the new method are presented for the purine and preQ1 riboswitches. The method is described in detail along with its findings in prokaryotes. Potential uses in finding novel eukaryotic riboswitches and optimizing pre-designed synthetic riboswitches based on ligand simulations are discussed. The method components are freely available for use.
Affiliation(s)
- Ilona Kifer
  - Agilent Laboratories, Tel Aviv, Israel; Microsoft R&D Center, Herzliya, Israel
- Supratim Sengupta
  - Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, 741246, India
- Zohar Yakhini
  - Agilent Laboratories, Tel Aviv, Israel; Laboratory of Computational Biology, Computer Science Department, Israel Institute of Technology, Haifa, 32000, Israel
- Danny Barash
  - Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel
|
30
|
Kutchko KM, Sanders W, Ziehr B, Phillips G, Solem A, Halvorsen M, Weeks KM, Moorman N, Laederach A. Multiple conformations are a conserved and regulatory feature of the RB1 5' UTR. RNA (NEW YORK, N.Y.) 2015; 21:1274-85. [PMID: 25999316 PMCID: PMC4478346 DOI: 10.1261/rna.049221.114] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 12/07/2014] [Accepted: 03/27/2015] [Indexed: 05/22/2023]
Abstract
Folding to a well-defined conformation is essential for the function of structured ribonucleic acids (RNAs) like the ribosome and tRNA. Structured elements in the untranslated regions (UTRs) of specific messenger RNAs (mRNAs) are known to control expression. The importance of unstructured regions adopting multiple conformations, however, is still poorly understood. High-resolution SHAPE-directed Boltzmann suboptimal sampling of the Homo sapiens Retinoblastoma 1 (RB1) 5' UTR yields three distinct conformations compatible with the experimental data. Private single nucleotide variants (SNVs) identified in two patients with retinoblastoma each collapse the structural ensemble to a single but distinct well-defined conformation. The RB1 5' UTRs from Bos taurus (cow) and Trichechus manatus latirostris (manatee) are divergent in sequence from H. sapiens (human) yet maintain structural compatibility with high-probability base pairs. SHAPE chemical probing of the cow and manatee RB1 5' UTRs reveals that they also adopt multiple conformations. Luciferase reporter assays reveal that 5' UTR mutations alter RB1 expression. In a traditional model of disease, causative SNVs disrupt a key structural element in the RNA. For the subset of patients with heritable retinoblastoma-associated SNVs in the RB1 5' UTR, the absence of multiple structures is likely causative of the cancer. Our data therefore suggest that selective pressure will favor multiple conformations in eukaryotic UTRs to regulate expression.
Affiliation(s)
- Katrina M Kutchko
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Wes Sanders
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Ben Ziehr
  - Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, North Carolina 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Gabriela Phillips
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Amanda Solem
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Matthew Halvorsen
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, New York 10032, USA
- Kevin M Weeks
  - Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
- Nathaniel Moorman
  - Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, North Carolina 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Alain Laederach
  - Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
|
31
|
Solem AC, Halvorsen M, Ramos SBV, Laederach A. The potential of the riboSNitch in personalized medicine. WILEY INTERDISCIPLINARY REVIEWS-RNA 2015; 6:517-32. [PMID: 26115028 PMCID: PMC4543445 DOI: 10.1002/wrna.1291] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 12/29/2014] [Revised: 03/25/2015] [Accepted: 05/13/2015] [Indexed: 01/28/2023]
Abstract
RNA conformation plays a significant role in stability, ligand binding, transcription, and translation. Single nucleotide variants (SNVs) have the potential to disrupt specific structural elements because RNA folds in a sequence-specific manner. A riboSNitch is an element of RNA structure with a specific function that is disrupted by an SNV or a single nucleotide polymorphism (SNP; or polymorphism; SNVs occur with low frequency in the population, <1%). The riboSNitch is analogous to a riboswitch, where binding of a small molecule rather than mutation alters the structure of the RNA to control gene regulation. RiboSNitches are particularly relevant to interpreting the results of genome-wide association studies (GWAS). Often GWAS identify SNPs associated with a phenotype mapping to noncoding regions of the genome. Because a majority of the human genome is transcribed, significant subsets of GWAS SNPs are putative riboSNitches. The extent to which the transcriptome is tolerant of SNP-induced structure change is still poorly understood. Recent advances in ultra high-throughput structure probing begin to reveal the structural complexities of mutation-induced structure change. This review summarizes our current understanding of SNV and SNP-induced structure change in the human transcriptome and discusses the importance of riboSNitch discovery in interpreting GWAS results and massive sequencing projects.
Affiliation(s)
- Amanda C Solem
  - Department of Biology, University of North Carolina, Chapel Hill, NC, USA
- Matthew Halvorsen
  - Department of Biology, University of North Carolina, Chapel Hill, NC, USA; Institute for Genomic Medicine, Columbia University, New York, NY, USA
- Silvia B V Ramos
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA
- Alain Laederach
  - Department of Biology, University of North Carolina, Chapel Hill, NC, USA; Bioinformatics and Computational Biology Program, University of North Carolina, Chapel Hill, NC, USA
|
32
|
Manzourolajdad A, Arnold J. Secondary structural entropy in RNA switch (Riboswitch) identification. BMC Bioinformatics 2015; 16:133. [PMID: 25928324 PMCID: PMC4448311 DOI: 10.1186/s12859-015-0523-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/31/2014] [Accepted: 03/02/2015] [Indexed: 01/10/2023]
Abstract
BACKGROUND: RNA regulatory elements play a significant role in gene regulation. Riboswitches, a widespread group of regulatory RNAs, are vital components of many bacterial genomes. These regulatory elements generally function by forming a ligand-induced alternative fold that controls access to ribosome binding sites or other regulatory sites in RNA. Riboswitch-mediated mechanisms are ubiquitous across bacterial genomes. A typical class of riboswitch has its own unique structural and biological complexity, making de novo riboswitch identification a formidable task. Traditionally, riboswitches have been identified through comparative genomics based on sequence and structural homology. The limitations of structural-homology-based approaches, coupled with the assumption that there is a great diversity of undiscovered riboswitches, suggest the need for alternative methods for riboswitch identification, possibly based on features intrinsic to their structure. As of yet, no such reliable method has been proposed.
RESULTS: We used structural entropy of riboswitch sequences as a measure of their secondary structural dynamics. Entropy values of a diverse set of riboswitches were compared to those of their mutants, their dinucleotide shuffles, and their reverse complement sequences under different stochastic context-free grammar folding models. Significance of our results was evaluated by comparison to other approaches, such as the base-pairing entropy and energy landscapes dynamics. Classifiers based on structural entropy optimized via sequence and structural features were devised as riboswitch identifiers and tested on Bacillus subtilis, Escherichia coli, and Synechococcus elongatus as an exploration of structural entropy based approaches. The unusually long untranslated region of the cotH in Bacillus subtilis, as well as upstream regions of certain genes, such as the sucC genes, were associated with significant structural entropy values in genome-wide examinations.
CONCLUSIONS: Various tests show that there is in fact a relationship between higher structural entropy and the potential for the RNA sequence to have alternative structures, within the limitations of our methodology. This relationship, though modest, is consistent across various tests. Understanding the behavior of structural entropy as a fairly new feature for RNA conformational dynamics, however, may require extensive exploratory investigation both across RNA sequences and folding models.
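The structural entropy named in this abstract can be illustrated, in its simplest form, as the Shannon entropy of a probability distribution over candidate secondary structures. The sketch below is only an illustration under that assumption: the function name `structural_entropy` is hypothetical, the probabilities are supplied by hand, and the study itself derives its distributions from stochastic context-free grammar folding models, which are not reproduced here.

```python
import math


def structural_entropy(probabilities):
    """Shannon entropy (in bits) of a distribution over candidate structures.

    `probabilities` is an illustrative input: one probability per candidate
    secondary structure, summing to 1. How those probabilities are obtained
    (e.g. from a folding model) is outside the scope of this sketch.
    """
    total = sum(probabilities)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Four equally likely folds: maximal uncertainty for four states.
print(structural_entropy([0.25, 0.25, 0.25, 0.25]))  # prints 2.0

# One dominant fold gives a lower entropy than the uniform case.
print(structural_entropy([0.9, 0.1]) < structural_entropy([0.5, 0.5]))  # prints True
```

Under this reading, a sequence whose probability mass is spread over several folds scores higher than one dominated by a single fold, which is the intuition behind using entropy as a signal for alternative structures.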
Affiliation(s)
- Amirhossein Manzourolajdad
  - Institute of Bioinformatics, University of Georgia, Davison Life Sciences Bldg, Room B118B, 120 Green St, Athens, 30602, USA; National Center for Biotechnology Information (NCBI), NIH, Building 38A, RM 6S614K, 8600 Rockville Pike, Bethesda, 20894, USA
- Jonathan Arnold
  - Institute of Bioinformatics, University of Georgia, Davison Life Sciences Bldg, Room B118B, 120 Green St, Athens, 30602, USA; Department of Genetics, University of Georgia, Davison Life Sciences Bldg, 120 Green St, Athens, 30602, USA
|
33
|
de Boer FK, Hogeweg P. Mutation rates and evolution of multiple coding in RNA-based protocells. J Mol Evol 2014; 79:193-203. [PMID: 25280530 PMCID: PMC4247474 DOI: 10.1007/s00239-014-9648-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/23/2013] [Accepted: 09/18/2014] [Indexed: 11/28/2022]
Abstract
RNA has a myriad of biological roles in contemporary life. We use the RNA paradigm for genotype-phenotype mappings to study the evolution of multiple coding in dependence to mutation rates. We study three different one-to-many genotype-phenotype mappings which have the potential to encode the information for multiple functions on a single sequence. These three different maps are (i) cofolding, where two sequences can bind and “cofold,” (ii) suboptimal folding, where the alternative foldings within a certain range of the native state of sequences are considered, and (iii) adapter-based folding, in which protocells can evolve adapter-mediated alternative foldings. We study how protocells with a set of sequences can code for a set of predefined functional structures, while avoiding all other structures, which are considered to be misfoldings. Note that such misfolded structures are far more prevalent than functional ones. Our results highlight the flexibility of the RNA sequence to secondary structure mapping and the power of evolution to shape the genotype-phenotype mapping. We show that high fitness can be achieved even at high mutation rates. Mutation rates affect genome size, but differently depending on which folding method is used. We observe that cofolding limits the possibility to avoid misfolded structures and that adapters are always beneficial for fitness, but even more beneficial at low mutation rates. In all cases, the evolution procedure selects for molecules that can form additional structures. Our results indicate that inherent properties of RNA molecules and their interactions allow the evolution of complexity even at high mutation rates.
Affiliation(s)
- Folkert K de Boer
  - Theoretical Biology and Bioinformatics, Universiteit Utrecht, Utrecht, The Netherlands
|
34
|
|