1
Hermans P, Tsishyn M, Schwersensky M, Rooman M, Pucci F. Exploring Evolution to Uncover Insights Into Protein Mutational Stability. Mol Biol Evol 2025; 42:msae267. [PMID: 39786559 PMCID: PMC11721782 DOI: 10.1093/molbev/msae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 11/27/2024] [Accepted: 11/28/2024] [Indexed: 01/12/2025]
Abstract
Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct multiple sequence alignments and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
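The independent-site models mentioned above score a mutation using only the amino-acid statistics of a single alignment column. This is a minimal sketch of one such score. It assumes the common log-odds form with a uniform pseudocount and no sequence reweighting, and the paper's exact formulation may differ. `column_frequencies` and `independent_site_score` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(msa, pos, pseudocount=1.0):
    """Amino-acid frequencies in alignment column `pos` with a uniform pseudocount.

    Gaps ('-') and non-standard characters are ignored.
    """
    counts = Counter(seq[pos] for seq in msa if seq[pos] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def independent_site_score(msa, pos, wt, mut, pseudocount=1.0):
    """Log-odds score log(f_mut / f_wt) for a substitution at column `pos`.

    Negative values mean the mutant residue is rarer than the wild type among
    homologs, which is commonly read as a sign of destabilization.
    """
    freqs = column_frequencies(msa, pos, pseudocount)
    return math.log(freqs[mut] / freqs[wt])


# Toy alignment of four homologs; column 2 contains V, I, V, V.
msa = ["MKVL", "MKIL", "MRVL", "MKVI"]
print(round(independent_site_score(msa, 2, "V", "I"), 3))  # prints -0.693
```

Column 2 contains V three times and I once. With the pseudocount, f_V = 4/24 and f_I = 2/24, so the score is log(1/2). A residue never observed in the column, such as D, scores lower still.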
Affiliation(s)
- Pauline Hermans
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium

2
Degenhardt MFS, Degenhardt HF, Bhandari YR, Lee YT, Ding J, Yu P, Heinz WF, Stagno JR, Schwieters CD, Watts NR, Wingfield PT, Rein A, Zhang J, Wang YX. Determining structures of RNA conformers using AFM and deep neural networks. Nature 2024:10.1038/s41586-024-07559-x. [PMID: 39695231 DOI: 10.1038/s41586-024-07559-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 05/10/2024] [Indexed: 12/20/2024]
Abstract
Much of the human genome is transcribed into RNAs [1], many of which contain structural elements that are important for their function. Such RNA molecules, including those that are structured and well-folded [2], are conformationally heterogeneous and flexible. This flexibility is a prerequisite for function [3,4], but it limits the applicability of methods such as NMR, crystallography and cryo-electron microscopy for structure elucidation. Moreover, owing to the lack of a large RNA structure database, and no clear correlation between sequence and structure, approaches such as AlphaFold [5] for protein structure prediction do not apply to RNA. Therefore, determining the structures of heterogeneous RNAs remains an unmet challenge. Here we report the holistic RNA structure determination method using atomic force microscopy, unsupervised machine learning and deep neural networks (HORNET), a novel method for determining three-dimensional topological structures of RNA using atomic force microscopy images of individual molecules in solution. Owing to the high signal-to-noise ratio of atomic force microscopy, this method is ideal for capturing structures of large RNA molecules in distinct conformations. In addition to six benchmark cases, we demonstrate the utility of HORNET by determining multiple heterogeneous structures of RNase P RNA and the HIV-1 Rev response element (RRE) RNA. Thus, our method addresses one of the major challenges in determining heterogeneous structures of large and flexible RNA molecules, and contributes to the fundamental understanding of RNA structural biology.
Affiliation(s)
- Maximilia F S Degenhardt
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Hermann F Degenhardt
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Yuba R Bhandari
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Yun-Tzai Lee
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Jienyu Ding
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Ping Yu
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- William F Heinz
- Optical Microscopy and Analysis Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Jason R Stagno
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA
- Charles D Schwieters
- Computational Biomolecular Magnetic Resonance Core, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Norman R Watts
- Protein Expression Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Paul T Wingfield
- Protein Expression Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Alan Rein
- Retrovirus Assembly Section, HIV Dynamics and Replication Program, National Cancer Institute, Frederick, MD, USA
- Jinwei Zhang
- Structural Biology of Noncoding RNAs and Ribonucleoproteins Section, Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Yun-Xing Wang
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD, USA

3
Calvanese F, Lambert CN, Nghe P, Zamponi F, Weigt M. Towards parsimonious generative modeling of RNA families. Nucleic Acids Res 2024; 52:5465-5477. [PMID: 38661206 PMCID: PMC11162787 DOI: 10.1093/nar/gkae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 03/05/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
Generative probabilistic models are emerging as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families. It achieves performance comparable to more complex methods while using significantly fewer parameters. Our approach efficiently generates artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and it also predicts the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 10³⁹ functional nucleotide sequences. While huge compared to the fewer than 4,000 known natural sequences, this number represents only a tiny fraction of the vast pool of nearly 10⁸² possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
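The size of the sequence space quoted above follows from simple counting. Each of the 136 positions can hold one of four nucleotides, which gives 4¹³⁶ possible sequences. The quick check below takes the 10³⁹ functional-sequence estimate from the abstract as an input; it does not compute it.

```python
import math

length = 136        # RF00379 sequence length quoted in the abstract
alphabet_size = 4   # A, C, G, U

# log10 of the number of possible sequences, 4**136
log10_total = length * math.log10(alphabet_size)
# eaDCA estimate of functional sequences (~10^39), taken from the abstract
log10_functional = 39

print(f"log10(4^{length}) = {log10_total:.2f}")                              # 81.88
print(f"functional fraction ~ 10^{log10_functional - log10_total:.1f}")      # 10^-42.9
```

Since 4¹³⁶ ≈ 7.6 × 10⁸¹, "nearly 10⁸²" is accurate. The estimated functional sequences therefore make up roughly one in 10⁴³ of all possible sequences of that length.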
Affiliation(s)
- Francesco Calvanese
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Camille N Lambert
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Philippe Nghe
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France

4
Taubert O, von der Lehr F, Bazarova A, Faber C, Knechtges P, Weiel M, Debus C, Coquelin D, Basermann A, Streit A, Kesselheim S, Götz M, Schug A. RNA contact prediction by data efficient deep learning. Commun Biol 2023; 6:913. [PMID: 37674020 PMCID: PMC10482910 DOI: 10.1038/s42003-023-05244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023]
Abstract
On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.
Affiliation(s)
- Oskar Taubert
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Fabrice von der Lehr
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Alina Bazarova
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
- Christian Faber
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Philipp Knechtges
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Helmholtz AI, 81675, Munich, Germany
- Marie Weiel
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Charlotte Debus
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Daniel Coquelin
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Achim Basermann
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Achim Streit
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Stefan Kesselheim
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
- Markus Götz
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
- Alexander Schug
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, 45117, Essen, Germany

5
Degenhardt MFS, Degenhardt HF, Bhandari YR, Lee YT, Ding J, Heinz WF, Stagno JR, Schwieters CD, Zhang J, Wang YX. Determining structures of individual RNA conformers using atomic force microscopy images and deep neural networks. RESEARCH SQUARE 2023:rs.3.rs-2798658. [PMID: 37425706 PMCID: PMC10327248 DOI: 10.21203/rs.3.rs-2798658/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
A large fraction of the human genome is transcribed into RNAs, many of which contain various structural elements that are important for their functions. RNA molecules are conformationally heterogeneous and functionally dynamic [1], even when they are structured and well-folded [2]. This limits the applicability of methods such as NMR, crystallography and cryo-EM. Moreover, because of the lack of a large RNA structure database, and no clear correlation between sequence and structure, approaches like AlphaFold [3] for protein structure prediction do not apply to RNA. Therefore, determining the structures of heterogeneous RNAs is an unmet challenge. Here we report a novel method of determining RNA three-dimensional topological structures using deep neural networks and atomic force microscopy (AFM) images of individual RNA molecules in solution. Owing to the high signal-to-noise ratio of AFM, our method is ideal for capturing structures of individual, conformationally heterogeneous RNAs. We show that our method can determine the 3D topological structures of any large folded RNA conformer from ~200 to ~420 residues, the size range into which most functional RNA structures or structural elements fall. Thus, our method addresses one of the major challenges in frontier RNA structural biology and may impact our fundamental understanding of RNA structure.
Affiliation(s)
- Maximilia F S Degenhardt
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- Hermann F Degenhardt
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- Yuba R Bhandari
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- Yun-Tzai Lee
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- Jienyu Ding
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- William F Heinz
- Optical Microscopy and Analysis Laboratory, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Jason R Stagno
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA
- Charles D Schwieters
- Computational Biomolecular Magnetic Resonance Core, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health; Bethesda, USA
- Jinwei Zhang
- Structural Biology of Noncoding RNAs and Ribonucleoproteins Section, Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health; Bethesda, USA
- Yun-Xing Wang
- Protein-Nucleic Acid Interaction Section, Center for Structural Biology, National Cancer Institute; Frederick, USA

6
Singh J, Paliwal K, Litfin T, Singh J, Zhou Y. Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and evolutionary-derived mutational coupling. Bioinformatics 2022; 38:3900-3910. [PMID: 35751593 PMCID: PMC9364379 DOI: 10.1093/bioinformatics/btac421] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/24/2021] [Revised: 04/30/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Recently, AlphaFold2 achieved accuracy close to that of experimental structures for the majority of proteins in the 14th Critical Assessment of Structure Prediction (CASP14). This raises the hope that one day we may achieve the same feat for structured RNAs, whose structure prediction is as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps. RESULTS Here, we show that by integrating deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that of protein contact-map prediction. More importantly, highly accurate prediction of the top L long-range contacts (where L is the sequence length) can be assured for RNAs with a high effective number of homologous sequences (Neff > 50). The initial use of the predicted contact maps as distance-based restraints confirmed their usefulness in 3D structure prediction. AVAILABILITY AND IMPLEMENTATION SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
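The effective number of sequences (Neff) down-weights near-duplicate homologs so that redundant clusters are not over-counted. This is a minimal sketch that assumes a widely used definition: each sequence receives weight 1 divided by the number of sequences, itself included, that share at least 80% identity with it. The identity threshold, gap handling and any length normalization behind SPOT-RNA-2D's Neff > 50 criterion may differ. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def effective_sequences(msa, threshold=0.8):
    """Neff = sum over sequences of 1 / (number of sequences, itself included,
    with identity >= threshold). A cluster of near-duplicates counts about once."""
    neff = 0.0
    for s in msa:
        n_similar = sum(identity(s, t) >= threshold for t in msa)
        neff += 1.0 / n_similar
    return neff


# Toy RNA alignment: the first two sequences differ at one position only.
msa = ["ACGUACGUAC", "ACGUACGUAA", "UUGCAAGGCC", "GGGGCCCCAA"]
print(effective_sequences(msa))  # prints 3.0
```

The first two sequences are 90% identical and share one unit of weight. The other two are unique, so Neff = 0.5 + 0.5 + 1 + 1 = 3.0 even though the alignment holds four sequences.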
Affiliation(s)
- Thomas Litfin
- Institute for Glycomics, Griffith University, Parklands Dr., Southport, QLD 4222, Australia
- Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Yaoqi Zhou
- To whom correspondence should be addressed.

7
Si Y, Zhang Y, Yan C. A reproducibility analysis-based statistical framework for residue-residue evolutionary coupling detection. Brief Bioinform 2022; 23:6509046. [PMID: 35037015 DOI: 10.1093/bib/bbab576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2021] [Revised: 11/26/2021] [Accepted: 12/15/2021] [Indexed: 11/14/2022]
Abstract
Direct coupling analysis (DCA) has been widely used to infer evolutionarily coupled residue pairs from the multiple sequence alignment (MSA) of homologous sequences. However, selecting the residue pairs with significant evolutionary couplings from the DCA output is a non-trivial task. In this study, we developed a general statistical framework for detecting significant evolutionary couplings, referred to as irreproducible discovery rate (IDR)-DCA. It is based on reproducibility analysis of the coupling scores obtained by running DCA on manually created MSA replicates. IDR-DCA was applied to select residue pairs for contact prediction in monomeric proteins, protein-protein interactions and monomeric RNAs, using three different versions of DCA. We demonstrated that with IDR-DCA, the residue pairs selected using a universal threshold always yielded stable contact-prediction performance. Compared with carefully tuned coupling-score cutoffs, IDR-DCA always showed better performance. The robustness of IDR-DCA was also supported by MSA downsampling analysis. We further demonstrated the effectiveness of using constraints from residue pairs selected by IDR-DCA to assist RNA secondary structure prediction.
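DCA yields a q × q coupling matrix for every pair of positions. Many pipelines collapse each of these matrices into one coupling score before ranking pairs, using a Frobenius norm followed by an average product correction (APC). The sketch below shows only that common scoring step, not the IDR reproducibility statistics. It assumes the couplings are given as a NumPy array of shape (L, L, q, q), and `frobenius_apc` is a hypothetical name.

```python
import numpy as np


def frobenius_apc(J):
    """APC-corrected Frobenius-norm scores from DCA couplings J of shape (L, L, q, q).

    1. Shift each q x q block to the zero-sum gauge (row and column means 0).
    2. Take its Frobenius norm F_ij.
    3. Subtract the average product correction F_i. * F_.j / F_.. (diagonal excluded).
    """
    Jc = (J - J.mean(axis=2, keepdims=True) - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Jc ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    L = F.shape[0]
    row_mean = F.sum(axis=1) / (L - 1)       # mean over j != i
    col_mean = F.sum(axis=0) / (L - 1)       # mean over i != j
    total_mean = F.sum() / (L * (L - 1))     # mean over all off-diagonal pairs
    apc = F - np.outer(row_mean, col_mean) / total_mean
    np.fill_diagonal(apc, 0.0)
    return apc


# Toy couplings: weak random background plus one strong planted coupling (1, 4).
rng = np.random.default_rng(0)
L, q = 6, 5
J = rng.normal(scale=0.1, size=(L, L, q, q))
J = (J + J.transpose(1, 0, 3, 2)) / 2        # enforce J_ij(a, b) = J_ji(b, a)
J[1, 4] += 2.0 * np.eye(q)
J[4, 1] += 2.0 * np.eye(q)

scores = frobenius_apc(J)
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print(sorted((int(i), int(j))))  # prints [1, 4]
```

APC removes the background that a position shares with all its partners, for example from phylogeny or low sampling. Thresholding the resulting scores is exactly the selection step that IDR-DCA aims to make more robust.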
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yi Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China

8
Zerihun MB, Pucci F, Schug A. CoCoNet-boosting RNA contact prediction by convolutional neural networks. Nucleic Acids Res 2021; 49:12661-12672. [PMID: 34871451 PMCID: PMC8682773 DOI: 10.1093/nar/gkab1144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/23/2020] [Revised: 10/27/2021] [Accepted: 11/05/2021] [Indexed: 11/24/2022]
Abstract
Co-evolutionary models such as direct coupling analysis (DCA), in combination with machine learning (ML) techniques based on deep neural networks, are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and can massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA, as they rely on large structural datasets only available for proteins. Here, we demonstrate how the smaller datasets available for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA, as assessed by cross-validation on about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase of the 3D RNA structure prediction accuracy. Therefore, we suggest that the field develop, in addition to contact PPV, metrics that better estimate the expected impact on 3D structure modeling tools. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
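The positive predictive value (PPV) used above is the fraction of predicted contacts that are true contacts, usually computed over the k highest-scoring pairs. This is a minimal sketch that assumes predictions are given as scored residue pairs and that pairs closer than a minimum sequence separation are excluded. The separation cutoff of 4 and the name `top_k_ppv` are illustrative assumptions, not CoCoNet's documented evaluation protocol.

```python
def top_k_ppv(scores, true_contacts, k, min_separation=4):
    """Fraction of the k highest-scoring pairs (i < j, j - i >= min_separation)
    that are true contacts.

    scores: dict mapping (i, j) -> predicted contact score
    true_contacts: set of (i, j) pairs with i < j
    """
    candidates = [(s, pair) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    candidates.sort(reverse=True)            # highest score first
    top = [pair for _, pair in candidates[:k]]
    if not top:
        return 0.0
    return sum(pair in true_contacts for pair in top) / len(top)


scores = {(0, 5): 0.9, (1, 8): 0.8, (2, 3): 0.95, (0, 9): 0.4, (3, 9): 0.7}
true_contacts = {(0, 5), (3, 9)}
print(round(top_k_ppv(scores, true_contacts, k=3), 3))  # prints 0.667
```

The pair (2, 3) has the highest score but is excluded as too close in sequence. Of the remaining top three pairs, two are true contacts, which gives a PPV of 2/3. As the abstract notes, a higher PPV of this kind does not by itself guarantee better 3D models.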
Affiliation(s)
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Computational Biology and Bioinformatics, Université Libre de Bruxelles 1050, Brussels, Belgium
- Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, 45117 Essen, Germany

9
Sanbonmatsu K. Getting to the bottom of lncRNA mechanism: structure-function relationships. Mamm Genome 2021; 33:343-353. [PMID: 34642784 PMCID: PMC8509902 DOI: 10.1007/s00335-021-09924-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/02/2021] [Accepted: 09/28/2021] [Indexed: 12/14/2022]
Abstract
While long non-coding RNAs are known to play key roles in disease and development, relatively few structural studies have been performed for this important class of RNAs. Here, we review functional studies of long non-coding RNAs and expose the need for high-resolution 3-D structural studies, discussing the roles of long non-coding RNAs in the cell and how structure–function relationships might be used to elucidate further understanding. We then describe structural studies of other classes of RNAs using chemical probing, nuclear magnetic resonance, small-angle X-ray scattering, X-ray crystallography, and cryogenic electron microscopy (cryo-EM). Next, we review early structural studies of long non-coding RNAs to date and describe the way forward for the structural biology of long non-coding RNAs in terms of cryo-EM.

10
Townshend RJL, Eismann S, Watkins AM, Rangan R, Karelina M, Das R, Dror RO. Geometric deep learning of RNA structure. Science 2021; 373:1047-1051. [PMID: 34446608 PMCID: PMC9829186 DOI: 10.1126/science.abe5650] [Citation(s) in RCA: 190] [Impact Index Per Article: 47.5] [Received: 08/31/2020] [Accepted: 07/14/2021] [Indexed: 01/28/2023]
Abstract
RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.
Affiliation(s)
- Stephan Eismann
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Andrew M Watkins
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Ramya Rangan
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Biophysics Program, Stanford University, Stanford, CA, USA
- Masha Karelina
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Biophysics Program, Stanford University, Stanford, CA, USA
- Rhiju Das
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Department of Physics, Stanford University, Stanford, CA, USA
- Ron O Dror
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Structural Biology, Stanford University, Stanford, CA, USA
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA, USA
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA