1. Antczak M, Szachniuk M. Toward Increasing the Credibility of RNA Design. Methods Mol Biol 2025; 2847:137-151. [PMID: 39312141 DOI: 10.1007/978-1-0716-4079-1_9] [Indexed: 09/25/2024]
Abstract
In the problem of RNA design, also known as inverse folding, RNA sequences are predicted that achieve the desired secondary structure at the lowest possible free energy and under certain constraints. The designed sequences have applications in synthetic biology and RNA-based nanotechnologies. There are also known cases of the successful use of inverse folding to discover previously unknown noncoding RNAs. Several computational methods have been dedicated to the problem of RNA design. They differ by algorithm and additional parameters, e.g., those determining the goal function in the sequence optimization process. Users can obtain many promising RNA sequences quite easily. The more difficult issue is to critically evaluate them and select the most favorable and reliable sequence that forms the expected RNA structure. The latter problem is addressed in this paper. We propose an RNA design protocol extended to include sequence evaluation, for which a 3D structure is used. Experiments show that the accuracy of RNA design can be improved by adding a 3D structure prediction and analysis step.
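As a rough illustration of the "under certain constraints" part of inverse folding, the sketch below filters candidate sequences by a necessary condition only: every base pair required by a target dot-bracket structure must be able to form a Watson-Crick or G-U pair. The function names, the target structure, and the candidate sequences are hypothetical and are not taken from the paper. The check involves no folding, free-energy calculation, or 3D evaluation, so it does not reproduce the protocol the abstract describes.

```python
# Hypothetical helpers for illustration; not part of the authors' protocol.
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base-pair indices from a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError("unsupported symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every pair in the target structure can be a Watson-Crick or G-U pair."""
    if len(sequence) != len(structure):
        return False
    seq = sequence.upper()
    return all((seq[i], seq[j]) in PAIRABLE for i, j in parse_dot_bracket(structure))


# Made-up target hairpin and candidate designs.
target = "((((....))))"
candidates = ["GGGCAAAAGCCC", "GGGGAAAAGGGG"]
compatible = [s for s in candidates if is_compatible(s, target)]
print(compatible)  # ['GGGCAAAAGCCC']
```

A sequence that passes this filter can still fold into a different structure, which is why the evaluation step the abstract proposes goes further.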
Affiliation(s)
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Marta Szachniuk
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland

2. Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

3. Tarafder S, Bhattacharya D. lociPARSE: A Locality-aware Invariant Point Attention Model for Scoring RNA 3D Structures. J Chem Inf Model 2024; 64:8655-8664. [PMID: 39523843 PMCID: PMC11600500 DOI: 10.1021/acs.jcim.4c01621] [Received: 09/06/2024] [Revised: 10/17/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently available machine learning-based approaches. Here, we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root-mean-square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
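To make the contrast with superposition-based RMSD concrete, here is a simplified sketch of a reference-based lDDT calculation. It assumes one representative atom per nucleotide (real lDDT uses all atoms), the commonly used 15 Å inclusion radius, and tolerance thresholds of 0.5, 1, 2 and 4 Å. The function names and coordinates are made up for illustration. This is the quantity that a model such as lociPARSE is trained to estimate without a reference; it is not the lociPARSE network itself.

```python
import math


def lddt_per_residue(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-like scores from one representative atom per nucleotide.

    ref and model are equal-length lists of (x, y, z) tuples in the same order.
    Only intramolecular distances are compared, so no superposition is needed.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same length")
    scores = []
    for i in range(len(ref)):
        preserved, total = 0, 0
        for j in range(len(ref)):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only neighbours inside the inclusion radius count
            d_model = math.dist(model[i], model[j])
            for t in thresholds:
                total += 1
                if abs(d_ref - d_model) < t:
                    preserved += 1
        scores.append(preserved / total if total else 0.0)
    return scores


def global_lddt(ref, model):
    """Aggregate the per-residue scores into one global score by averaging."""
    per_residue = lddt_per_residue(ref, model)
    return sum(per_residue) / len(per_residue)


# Made-up coordinates: a rigidly rotated copy and a locally distorted copy.
ref = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
rotated = [(0.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 10.0, 0.0)]
distorted = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (13.0, 0.0, 0.0)]

print(lddt_per_residue(ref, rotated))    # [1.0, 1.0, 1.0]
print(lddt_per_residue(ref, distorted))  # [0.625, 0.625, 0.25]
```

The rotated copy scores 1.0 everywhere without any alignment step, while the distorted copy is penalised most at the displaced nucleotide, which is the per-residue locality the abstract refers to.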
Affiliation(s)
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States

4. Liu G, Mu KL, Ran F, Liu JM, Zhou LL, Peng LQ, Feng G, Liu YC, Wei FD, Zhu LL, Zhang XY, Zhang YP, Sun QW. The hemostatic activity and Mechanistic roles of glucosyloxybenzyl 2-isobutylmalate extract (BSCE) from Bletilla striata (Thunb.) Rchb.f. in Inhibiting pulmonary hemorrhage. Heliyon 2024; 10:e38203. [PMID: 39381249 PMCID: PMC11459001 DOI: 10.1016/j.heliyon.2024.e38203] [Received: 05/03/2024] [Revised: 09/19/2024] [Accepted: 09/19/2024] [Indexed: 10/10/2024]
Abstract
Background: Hemorrhagic events cause numerous deaths annually worldwide, highlighting the urgent need for effective hemostatic drugs. The glucosyloxybenzyl 2-isobutylmalates Control Extract (BSCE) from the orchid plant Bletilla striata (Thunb.) Rchb.f. has demonstrated significant hemostatic activity in both in vitro and in vivo studies. However, the effect and mechanism of BSCE on non-traumatic bleeding remain unclear.
Methods: Pulmonary hemorrhage was induced in 40 Sprague-Dawley rats by administering Zingiber officinale Roscoe for 14 days. These rats were then randomly divided into five groups: model (Mod), positive control (YNBY), and BSCE low, medium, and high-dose groups. An additional 8 rats served as the control group (Con). The BSCE groups received different doses of BSCE for 10 days, while the YNBY group received Yunnan Baiyao suspension. The effects on body weight, food and water intake, red blood cell count (RBC), hemoglobin concentration (HGB), lung tissue pathology, platelet count, coagulation parameters, and fibrinolytic system markers were evaluated. Network pharmacology and molecular docking analyses were also conducted to identify potential targets and pathways involved in BSCE's effects.
Results: BSCE treatment significantly improved body weight, food intake, and water consumption in rats with pulmonary hemorrhage. RBC and HGB levels increased significantly in the BSCE medium and high-dose groups compared to the Mod group (P < 0.05). Pathological examination revealed that BSCE reduced lung tissue hemorrhage and inflammation, with improvements in alveolar structure. BSCE also positively affected platelet count, thrombin time (TT), activated partial thromboplastin time (APTT), fibrinogen (FIB) levels, and fibrinolytic markers (D-dimer, PAI-1, and t-PA). Network pharmacology and molecular docking identified key targets such as MMPs, CASPs, and pathways including IL-17 and TNF signaling, suggesting BSCE's involvement in hemostasis and anti-inflammatory processes.
Conclusions: BSCE exhibits significant hemostatic and protective effects on Z. officinale-induced pulmonary hemorrhage in rats by improving hematological parameters, reducing lung tissue damage, and modulating the coagulation and fibrinolytic systems. The study provides evidence supporting the potential of BSCE as a therapeutic agent for hemorrhagic diseases, with its efficacy linked to multi-target and multi-pathway interactions.
Affiliation(s)
- Fei Ran
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Jin-mei Liu
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Ling-li Zhou
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Le-qiang Peng
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Guo Feng
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Yu-chen Liu
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Fu-dao Wei
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Ling-li Zhu
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Xin-yue Zhang
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Yong-ping Zhang
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Qing-wen Sun
  - Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China

5. Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2024. [PMID: 39313455 DOI: 10.1002/2211-5463.13902] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized the field of structural bioinformatics with their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines aimed at incorporating AlphaFold's predictions, often focusing on addressing the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics shaped by AlphaFold, and discuss how the field is dynamically responding to this revolution with new software, methods, and pipelines. While the excitement around AI-based tools led to their widespread application, it is essential to acknowledge that their practical success hinges on their integration into established protocols within structural bioinformatics, often neglected in the context of AI-driven advancements. Indeed, user-driven intervention is still as pivotal in the structure prediction process as in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Maddalena Pacelli
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Francesca Manganiello
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy
- Alessandro Paiardini
  - Department of Biochemical sciences "A. Rossi Fanelli", Sapienza Università di Roma, Italy

6. Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024; 436:168552. [PMID: 38552946 PMCID: PMC11377173 DOI: 10.1016/j.jmb.2024.168552] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed in the existing literature, where many studies report inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source-code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
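The idea of splitting by whole Components can be sketched as follows. The code assumes the Components have already been computed and are given as a mapping from a component id to its chain ids. How RNA3DB actually builds Components, and how it chooses its published split, is not reproduced here. The function name, the greedy strategy, and the chain ids are illustrative assumptions only.

```python
def split_components(components, train_fraction=0.7):
    """Assign whole components to train or test, aiming at train_fraction of chains.

    components maps a component id to the list of chain ids it contains.
    A component is never divided, so chains from one component cannot end up
    on both sides of the split.
    """
    total = sum(len(chains) for chains in components.values())
    target = train_fraction * total
    train, test = [], []
    train_size = 0
    # Place the largest components first, breaking ties by id for determinism.
    ordered = sorted(components.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for comp_id, chains in ordered:
        if train_size + len(chains) <= target:
            train.append(comp_id)
            train_size += len(chains)
        else:
            test.append(comp_id)
    return train, test


# Made-up components with 5, 3, 2 and 1 chains (11 chains in total).
components = {
    "c1": ["a1", "a2", "a3", "a4", "a5"],
    "c2": ["b1", "b2", "b3"],
    "c3": ["d1", "d2"],
    "c4": ["e1"],
}
train, test = split_components(components)
print(train, test)  # ['c1', 'c3'] ['c2', 'c4']
```

Because the unit of assignment is the Component, the achieved ratio is only approximate (7 of 11 chains here), which matches the abstract's wording of an approximate 70/30 split.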
Affiliation(s)
- Marcell Szikszai
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA

7. Wang B, Li W. Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction. Genes (Basel) 2024; 15:1090. [PMID: 39202449 PMCID: PMC11353971 DOI: 10.3390/genes15081090] [Received: 07/22/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024]
Abstract
Protein and nucleic acid binding site prediction is a critical computational task that benefits a wide range of biological processes. Previous studies have shown that feature selection holds particular significance for this prediction task, making the generation of more discriminative features a key area of interest for many researchers. Recent progress has shown the power of protein language models in handling protein sequences, in leveraging the strengths of attention networks, and in successful applications to tasks such as protein structure prediction. This naturally raises the question of the applicability of protein language models in predicting protein and nucleic acid binding sites. Various approaches have explored this potential. This paper first describes the development of protein language models. Then, a systematic review of the latest methods for predicting protein and nucleic acid binding sites is conducted by covering benchmark sets, feature generation methods, performance comparisons, and feature ablation studies. These comparisons demonstrate the importance of protein language models for the prediction task. Finally, the paper discusses the challenges of protein and nucleic acid binding site prediction and proposes possible research directions and future trends. The purpose of this survey is to furnish researchers with actionable suggestions for comprehending the methodologies used in predicting protein-nucleic acid binding sites, fostering the creation of protein-centric language models, and tackling real-world obstacles encountered in this field.
Affiliation(s)
- Wenjin Li
  - Institute for Advanced Study, Shenzhen University, Shenzhen 518061, China

8. Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology.
Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
  - Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
  - Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
  - Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
  - Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, England
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
  - Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States

9. Nithin C, Kmiecik S, Błaszczyk R, Nowicka J, Tuszyńska I. Comparative analysis of RNA 3D structure prediction methods: towards enhanced modeling of RNA-ligand interactions. Nucleic Acids Res 2024; 52:7465-7486. [PMID: 38917327 PMCID: PMC11260495 DOI: 10.1093/nar/gkae541] [Received: 04/04/2024] [Revised: 05/23/2024] [Accepted: 06/16/2024] [Indexed: 06/27/2024]
Abstract
Accurate RNA structure models are crucial for designing small molecule ligands that modulate their functions. This study assesses six standalone RNA 3D structure prediction methods (DeepFoldRNA, RhoFold, BRiQ, FARFAR2, SimRNA and Vfold2), excluding web-based tools due to intellectual property concerns. We focus on reproducing the RNA structure existing in RNA-small molecule complexes, particularly on the ability to model ligand binding sites. Using a comprehensive set of RNA structures from the PDB, which includes diverse structural elements, we found that machine learning (ML)-based methods effectively predict global RNA folds but are less accurate with local interactions. Conversely, non-ML-based methods demonstrate higher precision in modeling intramolecular interactions, particularly with secondary structure restraints. Importantly, ligand-binding site accuracy can remain sufficiently high for practical use, even if the overall model quality is not optimal. With the recent release of AlphaFold 3, we included this advanced method in our tests. Benchmark subsets containing new structures, not used in the training of the tested ML methods, show that AlphaFold 3's performance was comparable to other ML-based methods, albeit with some challenges in accurately modeling ligand binding sites. This study underscores the importance of enhancing binding site prediction accuracy and the challenges in modeling RNA-ligand interactions accurately.
Affiliation(s)
- Chandran Nithin
  - Molecure SA, 02-089 Warsaw, Poland
  - Laboratory of Computational Biology, Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, 02-089 Warsaw, Poland
- Sebastian Kmiecik
  - Laboratory of Computational Biology, Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, 02-089 Warsaw, Poland

10. Waterhouse AM, Studer G, Robin X, Bienert S, Tauriello G, Schwede T. The structure assessment web server: for proteins, complexes and more. Nucleic Acids Res 2024; 52:W318-W323. [PMID: 38634802 PMCID: PMC11223858 DOI: 10.1093/nar/gkae270] [Received: 02/02/2024] [Revised: 03/21/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
The 'structure assessment' web server is a one-stop shop for interactive evaluation and benchmarking of structural models of macromolecular complexes including proteins and nucleic acids. A user-friendly web dashboard links sequence with structure information and results from a variety of state-of-the-art tools, which facilitates the visual exploration and evaluation of structure models. The dashboard integrates stereochemistry information, secondary structure information, global and local model quality assessment of the tertiary structure of comparative protein models, as well as prediction of membrane location. In addition, a benchmarking mode is available where a model can be compared to a reference structure, providing easy access to scores that have been used in recent CASP experiments and CAMEO. The structure assessment web server is available at https://swissmodel.expasy.org/assess.
Affiliation(s)
- Andrew M Waterhouse
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Xavier Robin
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Stefan Bienert
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland

11. Zurkowski M, Swiercz M, Wozny F, Antczak M, Szachniuk M. RNAhugs web server for customized 3D RNA structure alignment. Nucleic Acids Res 2024; 52:W348-W353. [PMID: 38587206 PMCID: PMC11223877 DOI: 10.1093/nar/gkae259] [Received: 01/26/2024] [Revised: 03/05/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Alignment of 3D molecular structures involves overlaying their sets of atoms in space in such a way as to minimize the distance between the corresponding atoms. The purpose of this procedure is usually to analyze and assess structural similarity on a global (e.g. evaluating predicted 3D models and clustering structures) or a local level (e.g. searching for common substructures). Although the idea of alignment is simple, combinatorial algorithms that implement it require considerable computational resources, even when processing relatively small structures. In this paper, we introduce RNAhugs, a web server for custom and flexible alignment of 3D RNA structures. Using two efficient heuristics, GEOS and GENS, it finds the longest corresponding fragments within 3D structures (given in the PDB or PDBx/mmCIF formats, and possibly differing in size) that align with user-specified accuracy (i.e. with an RMSD not exceeding a cutoff value given as an input parameter). A distinctive advantage of the system lies in its ability to process multi-model files and compare the results of 1-25 alignments in a single task. RNAhugs has an intuitive interface and is publicly available at https://rnahugs.cs.put.poznan.pl/.
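The RMSD cutoff criterion can be illustrated with a deliberately naive sketch. It assumes the two coordinate lists have the same length, are already superposed, and correspond position by position. It then searches by brute force for the longest contiguous index range whose RMSD stays within the cutoff. This is not the GEOS or GENS heuristic and performs no superposition. The function names and coordinates are made up.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points, with no superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def longest_fragment_within_cutoff(a, b, cutoff):
    """Return (start, end) of the longest contiguous range with RMSD <= cutoff.

    The range is half-open, ties go to the leftmost fragment, and None is
    returned if no single position satisfies the cutoff.
    """
    n = len(a)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            end = start + length
            if rmsd(a[start:end], b[start:end]) <= cutoff:
                return start, end
    return None


# Made-up, pre-superposed coordinates: only the last position deviates (by 4 Å).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 4.0, 0.0)]

print(rmsd(a, b))                                 # 2.0
print(longest_fragment_within_cutoff(a, b, 0.5))  # (0, 3)
```

The exhaustive scan recomputes an RMSD for every window, which hints at why the abstract stresses the computational cost of alignment and the need for heuristics.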
Affiliation(s)
- Michal Zurkowski
  - Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Mateusz Swiercz
  - Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Filip Wozny
  - Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Maciej Antczak
  - Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Marta Szachniuk
  - Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
12
Tarafder S, Roche R, Bhattacharya D. The landscape of RNA 3D structure modeling with transformer networks. Biol Methods Protoc 2024; 9:bpae047. [PMID: 39006460 PMCID: PMC11244692 DOI: 10.1093/biomethods/bpae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 06/22/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] Open
Abstract
Transformers are a powerful subclass of neural networks catalyzing the development of a growing number of computational methods for RNA structure modeling. Here, we conduct an objective and empirical study of the predictive modeling accuracy of the emerging transformer-based methods for RNA structure prediction. Our study reveals multi-faceted complementarity between the methods and underscores some key aspects that affect the prediction accuracy.
Affiliation(s)
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
13
Harris BJ, Nguyen PT, Zhou G, Wulff H, DiMaio F, Yarov-Yarovoy V. Toward high-resolution modeling of small molecule-ion channel interactions. Front Pharmacol 2024; 15:1411428. [PMID: 38919257 PMCID: PMC11196768 DOI: 10.3389/fphar.2024.1411428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 05/13/2024] [Indexed: 06/27/2024] Open
Abstract
Ion channels are critical drug targets for a range of pathologies, such as epilepsy, pain, itch, autoimmunity, and cardiac arrhythmias. To develop effective and safe therapeutics, it is necessary to design small molecules with high potency and selectivity for specific ion channel subtypes. There has been increasing implementation of structure-guided drug design for the development of small molecules targeting ion channels. We evaluated the performance of two RosettaLigand docking methods, RosettaLigand and GALigandDock, on the structures of known ligand-cation channel complexes. Ligands were docked to voltage-gated sodium (NaV), voltage-gated calcium (CaV), and transient receptor potential vanilloid (TRPV) channel families. For each test case, RosettaLigand and GALigandDock methods frequently sampled a ligand-binding pose within a root mean square deviation (RMSD) of 1-2 Å relative to the experimental ligand coordinates. However, RosettaLigand and GALigandDock scoring functions cannot consistently identify experimental ligand coordinates as top-scoring models. Our study reveals that the proper scoring criteria for RosettaLigand and GALigandDock modeling of ligand-ion channel complexes should be assessed on a case-by-case basis using sufficient ligand and receptor interface sampling, knowledge about state-specific interactions of the ion channel, and inherent receptor site flexibility that could influence ligand binding.
Affiliation(s)
- Brandon J. Harris
  - Department of Physiology and Membrane Biology, University of California, Davis, Davis, CA, United States
  - Biophysics Graduate Group, University of California, Davis, Davis, CA, United States
- Phuong T. Nguyen
  - Department of Physiology and Membrane Biology, University of California, Davis, Davis, CA, United States
- Guangfeng Zhou
  - Department of Biochemistry, University of Washington, Seattle, WA, United States
  - Institute for Protein Design, University of Washington, Seattle, WA, United States
- Heike Wulff
  - Department of Pharmacology, School of Medicine, University of California, Davis, Davis, CA, United States
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA, United States
- Vladimir Yarov-Yarovoy
  - Department of Physiology and Membrane Biology, University of California, Davis, Davis, CA, United States
  - Biophysics Graduate Group, University of California, Davis, Davis, CA, United States
  - Department of Anesthesiology and Pain Medicine, University of California, Davis, Davis, CA, United States
14
Gren BA, Antczak M, Zok T, Sulkowska JI, Szachniuk M. Knotted artifacts in predicted 3D RNA structures. PLoS Comput Biol 2024; 20:e1011959. [PMID: 38900780 PMCID: PMC11218946 DOI: 10.1371/journal.pcbi.1011959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 07/02/2024] [Accepted: 06/01/2024] [Indexed: 06/22/2024] Open
Abstract
Unlike proteins, RNAs deposited in the Protein Data Bank do not contain topological knots. Admittedly, the first trefoil knot and some lasso-type conformations have recently been found in experimental RNA structures, but these are still exceptional cases. Meanwhile, algorithms predicting 3D RNA models form knotted structures fairly often. Interestingly, machine learning-based predictors seem to be more prone to generating knotted RNA folds than traditional methods. A similar situation is observed for the entanglements of structural elements. In this paper, we analyze all models submitted to the CASP15 competition in the 3D RNA structure prediction category. We show what types of topological knots and structure element entanglements appear in the submitted models and highlight what methods are behind the generation of such conformations. We also study the structural aspect of susceptibility to entanglement. We suggest that predictors evaluate their RNA models to avoid publishing structures with artifacts, such as unusual entanglements, that result from hallucinations of predictive algorithms.
Affiliation(s)
- Bartosz A. Gren
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Marta Szachniuk
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
15
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
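The idea of scoring an interface against randomly chosen residues can be sketched as a plain Z-score. The Python example below is a generic illustration only: the abstract does not specify which metrics ZEPPI uses or how it samples the background, so the function name `interface_z_score`, the use of the population standard deviation, and the sample values are all assumptions.

```python
import statistics


def interface_z_score(interface_value, background_values):
    """Z-score of an interface metric relative to a background distribution.

    Assumption: background_values holds the same metric computed on randomly
    chosen residue sets; the population standard deviation is used.
    """
    mean = statistics.mean(background_values)
    std = statistics.pstdev(background_values)
    if std == 0:
        raise ValueError("background has zero variance; Z-score is undefined")
    return (interface_value - mean) / std


# Made-up numbers standing in for a coevolution or conservation metric.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
z = interface_z_score(5.0, background)
print(z)
```

A larger positive value indicates that the interface metric stands out more strongly from the random background.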
Affiliation(s)
- Haiqing Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Donald Petrey
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Diana Murray
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Barry Honig
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Medicine, Columbia University, New York, NY 10032
  - Zuckerman Institute, Columbia University, New York, NY 10027
16
Xie T, Huang J. Can Protein Structure Prediction Methods Capture Alternative Conformations of Membrane Transporters? J Chem Inf Model 2024; 64:3524-3536. [PMID: 38564295 DOI: 10.1021/acs.jcim.3c01936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Understanding the conformational dynamics of proteins, such as the inward-facing (IF) and outward-facing (OF) transition observed in transporters, is vital for elucidating their functional mechanisms. Despite significant advances in protein structure prediction (PSP) over the past three decades, most efforts have been focused on single-state prediction, leaving multistate or alternative conformation prediction (ACP) relatively unexplored. This discrepancy has led to the development of highly accurate PSP methods such as AlphaFold, yet their capabilities for ACP remain limited. To investigate the performance of current PSP methods in ACP, we curated a data set, named IOMemP, consisting of 32 experimentally determined high-resolution IF and OF structures of 16 membrane proteins with substantial conformational changes. We benchmarked 12 representative PSP methods, along with two recent multistate methods based on AlphaFold, against this data set. Our findings reveal a remarkably consistent preference for specific states across various PSP methods. We elucidated how coevolution information in MSAs influences state preference. Moreover, we showed that AlphaFold, when excluding coevolution information, estimated similar energies between the experimental IF and OF conformations, indicating that the energy model learned by AlphaFold is not biased toward any particular state. Our IOMemP data set and benchmark results are anticipated to advance the development of robust ACP methods.
Affiliation(s)
- Tengyu Xie
  - College of Life Science, Zhejiang University, HangZhou Zhejiang 310058, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, HangZhou Zhejiang 310024, China
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, HangZhou Zhejiang 310024, China
- Jing Huang
  - College of Life Science, Zhejiang University, HangZhou Zhejiang 310058, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, HangZhou Zhejiang 310024, China
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, HangZhou Zhejiang 310024, China
17
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. bioRxiv 2024:2024.01.30.578025. [PMID: 38352531 PMCID: PMC10862857 DOI: 10.1101/2024.01.30.578025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
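The property described above, that whole Components (never individual chains) are assigned to one side of a split, can be sketched in Python as follows. This is an illustration under stated assumptions, not the RNA3DB code: the Components are taken as already computed, and the function name `split_components`, the greedy assignment rule, and the chain identifiers are hypothetical.

```python
import random


def split_components(components, train_fraction=0.7, seed=0):
    """Assign whole components to train or test so that no component is divided.

    components: dict mapping a component name to its list of chain identifiers.
    The train share only approximates train_fraction because components are
    kept intact (a simple greedy rule, chosen here for illustration).
    """
    names = sorted(components)
    random.Random(seed).shuffle(names)
    total = sum(len(components[name]) for name in names)
    train, test = [], []
    for name in names:
        if len(train) < train_fraction * total:
            train.extend(components[name])
        else:
            test.extend(components[name])
    return train, test


# Hypothetical components and chain identifiers.
components = {
    "component_1": ["1abc_A", "2def_B"],
    "component_2": ["3ghi_A"],
    "component_3": ["4jkl_C", "5mno_A", "6pqr_B"],
}
train, test = split_components(components, train_fraction=0.7, seed=1)
print(len(train), len(test))
```

Because assignment happens at the component level, any leakage guarantee depends entirely on how the components were built, which this sketch does not address.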
Affiliation(s)
- Marcell Szikszai
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
  - Department of Systems Biology, Columbia University, New York, 10027, NY, USA
  - College of Biological Sciences, UC Davis, Davis, 95616, CA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, 02115, MA, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
18
Das R, Kretsch RC, Simpkin AJ, Mulvaney T, Pham P, Rangan R, Bu F, Keegan RM, Topf M, Rigden DJ, Miao Z, Westhof E. Assessment of three-dimensional RNA structure prediction in CASP15. Proteins 2023; 91:1747-1770. [PMID: 37876231 PMCID: PMC10841292 DOI: 10.1002/prot.26602] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 04/25/2023] [Revised: 08/21/2023] [Accepted: 09/07/2023] [Indexed: 10/26/2023]
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
Affiliation(s)
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, CA USA
  - Biophysics Program, Stanford University School of Medicine, CA USA
  - Howard Hughes Medical Institute, Stanford University, CA USA
- Adam J. Simpkin
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Thomas Mulvaney
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Phillip Pham
  - Department of Biochemistry, Stanford University School of Medicine, CA USA
- Ramya Rangan
  - Biophysics Program, Stanford University School of Medicine, CA USA
- Fan Bu
  - Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou 510005, China
  - Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230036, Anhui, China
- Ronan M. Keegan
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
  - Life Science, Diamond Light Source, Harwell Science, UK
- Maya Topf
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
  - University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Daniel J. Rigden
  - Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Zhichao Miao
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University
  - Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai 200434, China
- Eric Westhof
  - Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, F-67084, Strasbourg, France
19
Kryshtafovych A, Montelione GT, Rigden DJ, Mesdaghi S, Karaca E, Moult J. Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15. Proteins 2023; 91:1903-1911. [PMID: 37872703 PMCID: PMC10840738 DOI: 10.1002/prot.26584] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2023] [Accepted: 08/14/2023] [Indexed: 10/25/2023]
Abstract
For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.
Affiliation(s)
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA
- Daniel J Rigden
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
  - Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- John Moult
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
20
Leemann M, Sagasta A, Eberhardt J, Schwede T, Robin X, Durairaj J. Automated benchmarking of combined protein structure and ligand conformation prediction. Proteins 2023; 91:1912-1924. [PMID: 37885318 DOI: 10.1002/prot.26605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/15/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein as well as the position and conformation of the ligand. This paper addresses the challenges and proposed solutions for devising automated benchmarking techniques for PLC prediction. The reliability of experimentally solved PLC as ground truth reference structures is assessed using various validation criteria. Similarity of PLC to previously released complexes is employed to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test-set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative and high quality dataset constructed for benchmarking purposes. We also show that redocking on crystal structures is a much simpler task than docking into predicted protein models, demonstrated by the two PLC-prediction-specific scoring metrics created. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.
Affiliation(s)
- Michèle Leemann
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Ander Sagasta
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jerome Eberhardt
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Xavier Robin
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
21
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XV. Proteins 2023; 91:1539-1549. [PMID: 37920879 PMCID: PMC10843301 DOI: 10.1002/prot.26617] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 11/04/2023]
Abstract
Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures.
Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.
Affiliation(s)
- Torsten Schwede
  - University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Maya Topf
  - Centre for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
- John Moult
  - Institute for Bioscience and Biotechnology Research, Rockville, MD, USA, and Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
22
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. Front Bioinform 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023] Open
Abstract
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
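For orientation, the standard ϵ-insensitive loss from SVM regression, on which the abstract says the new loss is based, can be sketched in a few lines of Python. This is the textbook form only; the modified loss actually used in Qϵ is not described in the abstract and is not reproduced here. The function names, the default `epsilon`, and the sample scores are assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Standard SVR loss: errors smaller than epsilon cost nothing,
    larger errors are penalized linearly beyond the epsilon margin."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def mean_epsilon_insensitive_loss(targets, predictions, epsilon=0.1):
    """Average of the per-sample loss over a batch of quality scores."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and of equal length")
    losses = [
        epsilon_insensitive_loss(t, p, epsilon)
        for t, p in zip(targets, predictions)
    ]
    return sum(losses) / len(losses)


# Made-up quality scores on a 0-1 scale (as for lDDT-like targets).
true_scores = [0.80, 0.50]
predicted_scores = [0.85, 0.90]
print(mean_epsilon_insensitive_loss(true_scores, predicted_scores, epsilon=0.1))
```

In this form, a prediction within `epsilon` of the target contributes zero loss, so training effort concentrates on the larger errors.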
Affiliation(s)
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States
23
Dai X, Wu L, Yoo S, Liu Q. Integrating AlphaFold and deep learning for atomistic interpretation of cryo-EM maps. Brief Bioinform 2023; 24:bbad405. [PMID: 37982712 DOI: 10.1093/bib/bbad405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/21/2023] Open
Abstract
Interpretation of cryo-electron microscopy (cryo-EM) maps requires building and fitting 3D atomic models of biological molecules. AlphaFold-predicted models generate initial 3D coordinates; however, model inaccuracy and conformational heterogeneity often necessitate labor-intensive manual model building and fitting into cryo-EM maps. In this work, we designed a protein model-building workflow, which combines a deep-learning cryo-EM map feature enhancement tool, CryoFEM (Cryo-EM Feature Enhancement Model) and AlphaFold. A benchmark test using 36 cryo-EM maps shows that CryoFEM achieves state-of-the-art performance in optimizing the Fourier Shell Correlations between the maps and the ground truth models. Furthermore, in a subset of 17 datasets where the initial AlphaFold predictions are less accurate, the workflow significantly improves their model accuracy. Our work demonstrates that the integration of modern deep learning image enhancement and AlphaFold may lead to automated model building and fitting for the atomistic interpretation of cryo-EM maps.
Affiliation(s)
- Xin Dai
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA
| | - Longlong Wu
- Condensed Matter Physics and Materials Science Department, Brookhaven National Laboratory, Upton, NY, USA
| | - Shinjae Yoo
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA
| | - Qun Liu
- Biology Department, Brookhaven National Laboratory, Upton, NY, USA
| |
|
24
|
Li Y, Zhang C, Feng C, Pearce R, Lydia Freddolino P, Zhang Y. Integrating end-to-end learning with deep geometrical potentials for ab initio RNA structure prediction. Nat Commun 2023; 14:5745. [PMID: 37717036 PMCID: PMC10505173 DOI: 10.1038/s41467-023-41303-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 01/18/2023] [Accepted: 08/22/2023] [Indexed: 09/18/2023] Open
Abstract
RNAs are fundamental in living cells and perform critical functions determined by their tertiary architectures. However, accurate modeling of 3D RNA structure remains a challenging problem. We present a novel method, DRfold, to predict RNA tertiary structures by simultaneous learning of local frame rotations and geometric restraints from experimentally solved RNA structures, where the learned knowledge is converted into a hybrid energy potential to guide RNA structure assembly. The method significantly outperforms previous approaches by >73.3% in TM-score on a sequence-nonredundant dataset containing recently released structures. Detailed analyses showed that the major contributions to the improvement arise from the deep end-to-end learning supervised with the atom coordinates and from the composite energy function, which integrates complementary information from geometry restraints and end-to-end learning models. The open-source DRfold program, with its fast training protocol, allows large-scale application of high-resolution RNA structure modeling and can be further improved with future expansion of RNA structure databases.
Affiliation(s)
- Yang Li
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore, Singapore
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, 06511, USA
| | - Chenjie Feng
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- School of Science, Ningxia Medical University, Yinchuan, 750004, China
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computer Science, School of Computing, National University of Singapore, 117417, Singapore, Singapore
| | - P Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
| | - Yang Zhang
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore, Singapore.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computer Science, School of Computing, National University of Singapore, 117417, Singapore, Singapore.
- Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore, Singapore.
| |
|
25
|
Wang X, Yu S, Lou E, Tan YL, Tan ZJ. RNA 3D Structure Prediction: Progress and Perspective. Molecules 2023; 28:5532. [PMID: 37513407 PMCID: PMC10386116 DOI: 10.3390/molecules28145532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 07/05/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
Ribonucleic acid (RNA) molecules play vital roles in numerous important biological functions such as catalysis and gene regulation. The functions of RNAs are strongly coupled to their structures or to structural changes, and RNA structure prediction has received much attention over the last two decades. Several computational models have been developed to predict RNA three-dimensional (3D) structures in silico. These models generally comprise three stages: predicting an RNA 3D structure ensemble, identifying near-native structures within the ensemble, and refining the identified structures. In this review, we provide a comprehensive overview of recent advances in RNA 3D structure modeling, including structure ensemble prediction, evaluation, and refinement. Finally, we highlight some insights and perspectives on modeling RNA 3D structures.
Affiliation(s)
- Xunxun Wang
- Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
| | - Shixiong Yu
- Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
| | - En Lou
- Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
| | - Ya-Lan Tan
- School of Bioengineering and Health, Wuhan Textile University, Wuhan 430200, China
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, China
| | - Zhi-Jie Tan
- Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
| |
|