1. Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2. Manfredi M, Savojardo C, Iardukhin G, Salomoni D, Costantini A, Martelli PL, Casadio R. Alpha&ESMhFolds: A Web Server for Comparing AlphaFold2 and ESMFold Models of the Human Reference Proteome. J Mol Biol 2024; 436:168593. [PMID: 38718922 DOI: 10.1016/j.jmb.2024.168593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 04/22/2024] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
We develop a novel database Alpha&ESMhFolds which allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome, and when available, their comparison with 2,900 directly associated PDB structures with at least a structure to sequence coverage of 70%. Statistics indicate that good quality models tend to overlap with a TM-score >0.6 as long as some PDB structural information is available. As expected, a direct model superimposition to the PDB structure highlights that AlphaFold2 models are slightly superior to ESMFold ones. However, some 55% of the database is endowed with models overlapping with TM-score <0.6. This highlights the different outputs of the two methods. The database is freely available for usage at https://alpha-esmhfolds.biocomp.unibo.it/.
Affiliation(s)
- Matteo Manfredi
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
- Georgii Iardukhin
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
- Rita Casadio
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
3. Margelevičius M. GTalign: spatial index-driven protein structure alignment, superposition, and search. Nat Commun 2024; 15:7305. [PMID: 39181863 PMCID: PMC11344802 DOI: 10.1038/s41467-024-51669-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 08/14/2024] [Indexed: 08/27/2024] Open
Abstract
With protein databases growing rapidly due to advances in structural and computational biology, the ability to accurately align and rapidly search protein structures has become essential for biological research. In response to the challenge posed by vast protein structure repositories, GTalign offers an innovative solution to protein structure alignment and search: an algorithm that achieves optimal superposition at high speeds. Through the design and implementation of spatial structure indexing, GTalign parallelizes all stages of superposition search across residues and protein structure pairs, yielding rapid identification of optimal superpositions. Rigorous evaluation across diverse datasets reveals GTalign as the most accurate among structure aligners while presenting orders of magnitude in speedup at state-of-the-art accuracy. GTalign's high speed and accuracy make it useful for numerous applications, including functional inference, evolutionary analyses, protein design, and drug discovery, contributing to advancing understanding of protein structure and function.
4. Arshad NF, Nordin FJ, Foong LC, In LLA, Teo MYM. Engineering receptor-binding domain and heptad repeat domains towards the development of multi-epitopes oral vaccines against SARS-CoV-2 variants. PLoS One 2024; 19:e0306111. [PMID: 39146295 PMCID: PMC11326571 DOI: 10.1371/journal.pone.0306111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 06/11/2024] [Indexed: 08/17/2024] Open
Abstract
The inability of existing vaccines to cope with the mutation rate has highlighted the need for effective preventative strategies for COVID-19. Through the secretion of immunoglobulin A, mucosal delivery of vaccines can effectively stimulate mucosal immunity for better protection against SARS-CoV-2 infection. In this study, various immunoinformatic tools were used to design a multi-epitope oral vaccine against SARS-CoV-2 based on its receptor-binding domain (RBD) and heptad repeat (HR) domains. T and B lymphocyte epitopes were initially predicted from the RBD and HR domains of SARS-CoV-2, and potential antigenic, immunogenic, non-allergenic, and non-toxic epitopes were identified. Epitopes that are highly conserved and have no significant similarity to human proteome were selected. The epitopes were joined with appropriate linkers, and an adjuvant was added to enhance the vaccine efficacy. The vaccine 3D structure constructs were docked with toll-like receptor 4 (TLR-4) and TLR1-TLR2, and the binding affinity was calculated. The designed multi-epitope vaccine construct (MEVC) consisted of 33 antigenic T and B lymphocyte epitopes. The results of molecular dockings and free binding energies confirmed that the MEVC effectively binds to TLR molecules, and the complexes were stable. The results suggested that the designed MEVC is a potentially safe and effective oral vaccine against SARS-CoV-2. This in silico study presents a novel approach for creating an oral multi-epitope vaccine against the rapidly evolving SARS-CoV-2 variants. These findings offer valuable insights for developing an effective strategy to combat COVID-19. Further preclinical and clinical studies are required to confirm the efficacy of the MEVC vaccine.
Affiliation(s)
- Nur Farhanah Arshad
  - Department of Biotechnology, Faculty of Applied Sciences, UCSI University, Kuala Lumpur, Malaysia
- Fariza Juliana Nordin
  - Department of Biological Sciences and Biotechnology, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
- Lian Chee Foong
  - State Key Laboratory of Systems Medicine for Cancer, Renji-Med X Clinical Stem Cell Research Center, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lionel Lian Aun In
  - Department of Biotechnology, Faculty of Applied Sciences, UCSI University, Kuala Lumpur, Malaysia
- Michelle Yee Mun Teo
  - Department of Biotechnology, Faculty of Applied Sciences, UCSI University, Kuala Lumpur, Malaysia
5. Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O'Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YEA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods 2024; 21:1514-1524. [PMID: 38744917 DOI: 10.1038/s41592-024-02272-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/03/2024] [Indexed: 05/16/2024]
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Affiliation(s)
- Gustaf Ahdritz
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Harvard University, Cambridge, MA, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA.
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Qinghui Xia
  - Department of Systems Biology, Columbia University, New York, NY, USA
- William Gerecke
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Daniel Berenberg
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Ian Fisk
  - Flatiron Institute, New York, NY, USA
- Bo Zhang
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- Stella Biderman
  - EleutherAI, New York, NY, USA
  - Booz Allen Hamilton, McLean, VA, USA
- Stephen Ra
  - Prescient Design, Genentech, New York, NY, USA
- Minjia Zhang
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Peter K Sorger
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Zhao Zhang
  - Rutgers University, New Brunswick, NJ, USA
6. Lawson CL, Kryshtafovych A, Pintilie GD, Burley SK, Černý J, Chen VB, Emsley P, Gobbi A, Joachimiak A, Noreng S, Prisant MG, Read RJ, Richardson JS, Rohou AL, Schneider B, Sellers BD, Shao C, Sourial E, Williams CI, Williams CJ, Yang Y, Abbaraju V, Afonine PV, Baker ML, Bond PS, Blundell TL, Burnley T, Campbell A, Cao R, Cheng J, Chojnowski G, Cowtan KD, DiMaio F, Esmaeeli R, Giri N, Grubmüller H, Hoh SW, Hou J, Hryc CF, Hunte C, Igaev M, Joseph AP, Kao WC, Kihara D, Kumar D, Lang L, Lin S, Maddhuri Venkata Subramaniya SR, Mittal S, Mondal A, Moriarty NW, Muenks A, Murshudov GN, Nicholls RA, Olek M, Palmer CM, Perez A, Pohjolainen E, Pothula KR, Rowley CN, Sarkar D, Schäfer LU, Schlicksup CJ, Schröder GF, Shekhar M, Si D, Singharoy A, Sobolev OV, Terashi G, Vaiana AC, Vedithi SC, Verburgt J, Wang X, Warshamanage R, Winn MD, Weyand S, Yamashita K, Zhao M, Schmid MF, Berman HM, Chiu W. Outcomes of the EMDataResource cryo-EM Ligand Modeling Challenge. Nat Methods 2024; 21:1340-1348. [PMID: 38918604 DOI: 10.1038/s41592-024-02321-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Accepted: 05/24/2024] [Indexed: 06/27/2024]
Abstract
The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein-nucleic acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: Escherichia coli beta-galactosidase with inhibitor, SARS-CoV-2 virus RNA-dependent RNA polymerase with covalently bound nucleotide analog and SARS-CoV-2 virus ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. The quality of submitted ligand models and surrounding atoms were analyzed by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics and contact scores. A composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
Affiliation(s)
- Catherine L Lawson
  - RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
- Grigore D Pintilie
  - Departments of Bioengineering and of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Stephen K Burley
  - RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
  - RCSB Protein Data Bank and San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
- Jiří Černý
  - Institute of Biotechnology, Czech Academy of Sciences, Vestec, Czech Republic
- Vincent B Chen
  - Department of Biochemistry, Duke University, Durham, NC, USA
- Paul Emsley
  - MRC Laboratory of Molecular Biology, Cambridge, UK
- Alberto Gobbi
  - Discovery Chemistry, Genentech Inc., San Francisco, CA, USA
  - , Berlin, Germany
- Andrzej Joachimiak
  - Structural Biology Center, X-ray Science Division, Argonne National Laboratory, Argonne, IL, USA
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL, USA
- Sigrid Noreng
  - Structural Biology, Genentech Inc., South San Francisco, CA, USA
  - Protein Science, Septerna, South San Francisco, CA, USA
- Randy J Read
  - Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Alexis L Rohou
  - Structural Biology, Genentech Inc., South San Francisco, CA, USA
- Bohdan Schneider
  - Institute of Biotechnology, Czech Academy of Sciences, Vestec, Czech Republic
- Benjamin D Sellers
  - Discovery Chemistry, Genentech Inc., San Francisco, CA, USA
  - Computational Chemistry, Vilya, South San Francisco, CA, USA
- Chenghua Shao
  - RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Ying Yang
  - Structural Biology, Genentech Inc., South San Francisco, CA, USA
- Venkat Abbaraju
  - RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Pavel V Afonine
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew L Baker
  - Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Paul S Bond
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Tom Burnley
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Arthur Campbell
  - Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- K D Cowtan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Frank DiMaio
  - Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA, USA
- Reza Esmaeeli
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL, USA
- Nabin Giri
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Helmut Grubmüller
  - Theoretical and Computational Biophysics Department, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Soon Wen Hoh
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, MO, USA
- Corey F Hryc
  - Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Carola Hunte
  - Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine and CIBSS-Centre for Integrative Biological Signalling Studies, University of Freiburg, Freiburg, Germany
- Maxim Igaev
  - Theoretical and Computational Biophysics Department, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Agnel P Joseph
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Wei-Chun Kao
  - Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine and CIBSS-Centre for Integrative Biological Signalling Studies, University of Freiburg, Freiburg, Germany
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Dilip Kumar
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, USA
  - Trivedi School of Biosciences, Ashoka University, Sonipat, India
- Lijun Lang
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL, USA
  - The Chinese University of Hong Kong, Hong Kong, China
- Sean Lin
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, USA
- Sumit Mittal
  - Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - School of Advanced Sciences and Languages, VIT Bhopal University, Bhopal, India
- Arup Mondal
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL, USA
  - National Renewable Energy Laboratory (NREL), Golden, CO, USA
- Nigel W Moriarty
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Andrew Muenks
  - Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA, USA
- Robert A Nicholls
  - MRC Laboratory of Molecular Biology, Cambridge, UK
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Mateusz Olek
  - York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
  - Electron Bio-Imaging Centre, Diamond Light Source, Harwell Science and Innovation Campus, Didcot, UK
- Colin M Palmer
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL, USA
- Emmi Pohjolainen
  - Theoretical and Computational Biophysics Department, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Karunakar R Pothula
  - Institute of Biological Information Processing (IBI-7, Structural Biochemistry) and Jülich Centre for Structural Biology (JuStruct), Forschungszentrum Jülich, Jülich, Germany
- Daipayan Sarkar
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - MSU-DOE Plant Research Laboratory, East Lansing, MI, USA
  - School of Molecular Sciences, Arizona State University, Tempe, AZ, USA
- Luisa U Schäfer
  - Institute of Biological Information Processing (IBI-7, Structural Biochemistry) and Jülich Centre for Structural Biology (JuStruct), Forschungszentrum Jülich, Jülich, Germany
- Christopher J Schlicksup
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Gunnar F Schröder
  - Institute of Biological Information Processing (IBI-7, Structural Biochemistry) and Jülich Centre for Structural Biology (JuStruct), Forschungszentrum Jülich, Jülich, Germany
  - Physics Department, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mrinal Shekhar
  - Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Dong Si
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, USA
- Oleg V Sobolev
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Andrea C Vaiana
  - Theoretical and Computational Biophysics Department, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Nature's Toolbox (NTx), Rio Rancho, NM, USA
- Jacob Verburgt
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Martyn D Winn
  - Scientific Computing Department, UKRI Science and Technology Facilities Council, Research Complex at Harwell, Didcot, UK
- Simone Weyand
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Minglei Zhao
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL, USA
- Michael F Schmid
  - Division of Cryo-EM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Helen M Berman
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Wah Chiu
  - Departments of Bioengineering and of Microbiology and Immunology, Stanford University, Stanford, CA, USA.
  - Division of Cryo-EM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Menlo Park, CA, USA.
7. Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China.
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China.
  - School of Mathematics, Renmin University of China, Beijing, 100872, China.
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China.
8. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024; 630:493-500. [PMID: 38718835 PMCID: PMC11168924 DOI: 10.1038/s41586-024-07487-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/19/2023] [Accepted: 04/29/2024] [Indexed: 06/13/2024]
Abstract
The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. Here we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein-ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein-nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody-antigen prediction accuracy compared with AlphaFold-Multimer v.2.3 [7,8]. Together, these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.
Affiliation(s)
- Jonas Adler
  - Core Contributor, Google DeepMind, London, UK
- Jack Dunger
  - Core Contributor, Google DeepMind, London, UK
- Tim Green
  - Core Contributor, Google DeepMind, London, UK
- Zachary Wu
  - Core Contributor, Google DeepMind, London, UK
- Yousuf A Khan
  - Google DeepMind, London, UK
  - Department of Molecular and Cellular Physiology, Stanford University, Stanford, CA, USA
- Ellen D Zhong
  - Google DeepMind, London, UK
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Demis Hassabis
  - Core Contributor, Google DeepMind, London, UK.
  - Core Contributor, Isomorphic Labs, London, UK.
9. Han Y, Lu Y, Yan X, Cui H, Cheng S, Zheng J, Zhou Y, Wang S, Li Z. Atom-ProteinQA: Atom-level protein model quality assessment through fine-grained joint learning. Comput Methods Programs Biomed 2024; 249:108078. [PMID: 38537495 DOI: 10.1016/j.cmpb.2024.108078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/26/2023] [Accepted: 02/10/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION: Protein model quality assessment (ProteinQA) is a fundamental task that is essential for biologically relevant applications, i.e., protein structure refinement, protein design, etc. Previous works aimed to conduct ProteinQA only on the global structure or per-residue level, ignoring potentially usable and precise cues from a fine-grained per-atom perspective. In this study, we propose an atom-level ProteinQA model, named Atom-ProteinQA, in which two innovative modules are designed to extract geometric and topological atom-level relationships respectively. Specifically, on the one hand, a geometric perception module exploits 3D sparse convolution to capture the geometric features of the input protein, generating fine-grained atom-level predictions. On the other hand, natural chemical bonds are utilized to construct an atom-level graph, then message passing from a topological perception module is applied to output residue-level predictions in parallel. Eventually, through a cross-model aggregation module, features from different modules mutually interact, enhancing performance on both the atom and residue levels.
RESULTS: Extensive experiments show that our proposed Atom-ProteinQA outperforms previous methods by a large margin, regardless of residue-level or atom-level assessment. Concretely, we achieved state-of-the-art performance on CATH-2084, Decoy-8000, public benchmarks CASP13 & CASP14, and the CAMEO.
AVAILABILITY: The repository of this project is released on: https://github.com/luyfcandy/Atom_ProteinQA.
Affiliation(s)
- Yatong Han
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Yingfeng Lu
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Xu Yan
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Hannah Cui
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | | | - Jiayou Zheng
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Yuzhe Zhou
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China.
| | - Zhen Li
- Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China.
| |
|
10
|
Tang X, Dai H, Knight E, Wu F, Li Y, Li T, Gerstein M. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Brief Bioinform 2024; 25:bbae338. [PMID: 39007594 PMCID: PMC11247410 DOI: 10.1093/bib/bbae338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024] Open
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Affiliation(s)
- Xiangru Tang
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Howard Dai
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Elizabeth Knight
- School of Medicine, Yale University, New Haven, CT 06520, United States
| | - Fang Wu
- Computer Science Department, Stanford University, CA 94305, United States
| | - Yunyang Li
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Tianxiao Li
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
| | - Mark Gerstein
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
| |
|
11
|
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. AlphaFold2 Predictions of Conformational Ensembles and Atomistic Simulations of the SARS-CoV-2 Spike XBB Lineages Reveal Epistatic Couplings between Convergent Mutational Hotspots that Control ACE2 Affinity. J Phys Chem B 2024; 128:4696-4715. [PMID: 38696745 DOI: 10.1021/acs.jpcb.4c01341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2024]
Abstract
In this study, we combined AlphaFold-based atomistic structural modeling, microsecond molecular simulations, mutational profiling, and network analysis to characterize binding mechanisms of the SARS-CoV-2 spike protein with the host receptor ACE2 for a series of Omicron XBB variants including XBB.1.5, XBB.1.5+L455F, XBB.1.5+F456L, and XBB.1.5+L455F+F456L. AlphaFold-based structural and dynamic modeling of SARS-CoV-2 Spike XBB lineages can accurately predict the experimental structures and characterize conformational ensembles of the spike protein complexes with the ACE2. Microsecond molecular dynamics simulations identified important differences in the conformational landscapes and equilibrium ensembles of the XBB variants, suggesting that combining AlphaFold predictions of multiple conformations with molecular dynamics simulations can provide a complementary approach for the characterization of functional protein states and binding mechanisms. Using the ensemble-based mutational profiling of protein residues and physics-based rigorous calculations of binding affinities, we identified binding energy hotspots and characterized the molecular basis underlying epistatic couplings between convergent mutational hotspots. Consistent with the experiments, the results revealed the mediating role of the Q493 hotspot in the synchronization of epistatic couplings between L455F and F456L mutations, providing a quantitative insight into the energetic determinants underlying binding differences between XBB lineages. We also proposed a network-based perturbation approach for mutational profiling of allosteric communications and uncovered the important relationships between allosteric centers mediating long-range communication and binding hotspots of epistatic couplings. The results of this study support a mechanism in which the binding mechanisms of the XBB variants may be determined by epistatic effects between convergent evolutionary hotspots that control ACE2 binding.
Affiliation(s)
- Nishank Raisinghani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
| | - Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
| | - Grace Gupta
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
| | - Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
| | - Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
| | - Gennady Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States
| |
|
12
|
Fazekas Z, K Menyhárd D, Perczel A. LoCoHD: a metric for comparing local environments of proteins. Nat Commun 2024; 15:4029. [PMID: 38740745 DOI: 10.1038/s41467-024-48225-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 04/22/2024] [Indexed: 05/16/2024] Open
Abstract
Protein folds and the local environments they create can be compared using a variety of differently designed measures, such as the root mean squared deviation, the global distance test, the template modeling score or the local distance difference test. Although these measures have proven to be useful for a variety of tasks, each fails to fully incorporate the valuable chemical information inherent to atoms and residues, and considers these only partially and indirectly. Here, we develop the highly flexible local composition Hellinger distance (LoCoHD) metric, which is based on the chemical composition of local residue environments. Using LoCoHD, we analyze the chemical heterogeneity of amino acid environments and identify valines having the most conserved-, and arginines having the most variable chemical environments. We use LoCoHD to investigate structural ensembles, to evaluate critical assessment of structure prediction (CASP) competitors, to compare the results with the local distance difference test (lDDT) scoring system, and to evaluate a molecular dynamics simulation. We show that LoCoHD measurements provide unique information about protein structures that is distinct from, for example, those derived using the alignment-based RMSD metric, or the similarly distance matrix-based but alignment-free lDDT metric.
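The Hellinger distance that gives LoCoHD its name can be illustrated with a minimal sketch. The code below computes only the standard Hellinger distance between two discrete composition vectors; it is not the LoCoHD algorithm itself, and the function name `hellinger_distance` and the example compositions are hypothetical, chosen purely for illustration.

```python
import math


def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions.

    p and q are non-negative counts or weights over the same categories
    (hypothetical example: atom-type counts in two residue environments).
    Both are normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs a positive total weight")
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    # The 1/sqrt(2) factor bounds the distance to the interval [0, 1].
    return math.sqrt(squared / 2.0)


# Hypothetical compositions: identical environments give 0,
# environments with no shared categories give 1.
same = hellinger_distance([3, 1, 0], [6, 2, 0])
disjoint = hellinger_distance([1, 0], [0, 1])
```

Because the result is bounded between 0 and 1, values from environments of different sizes remain directly comparable.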
Affiliation(s)
- Zsolt Fazekas
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- ELTE Hevesy György PhD School of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Dóra K Menyhárd
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- HUN-REN-ELTE Protein Modeling Research Group, ELTE Eötvös Loránd University, Budapest, Hungary
| | - András Perczel
- Laboratory of Structural Chemistry and Biology, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary.
- HUN-REN-ELTE Protein Modeling Research Group, ELTE Eötvös Loránd University, Budapest, Hungary.
| |
|
13
|
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Nolan Park
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| |
|
14
|
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
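The GDT_TS threshold quoted above (GDT_TS > 70) can be made concrete with a simplified sketch. GDT_TS averages the percentage of residues whose Cα atoms lie within 1, 2, 4, and 8 Å of the reference structure. The full score searches over many superpositions to maximize each percentage; the sketch below assumes the per-residue distances come from one fixed superposition, so it is an approximation. The function name `gdt_ts_fixed_superposition` and the example distances are hypothetical.

```python
def gdt_ts_fixed_superposition(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS from per-residue C-alpha deviations in angstroms.

    Assumption: `distances` was measured after a single, fixed
    superposition of model onto reference. The real GDT_TS maximizes
    the residue count at each cutoff over many superpositions, so this
    value is a lower bound on it.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue distance")
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue model: one residue within 1 A, two within 2 A,
# three within 4 A and 8 A, so the score is 100 * (0.25+0.5+0.75+0.75) / 4.
score = gdt_ts_fixed_superposition([0.5, 1.5, 3.0, 10.0])
```

Under this definition, a model in which every residue lies within 1 Å scores 100, and one in which no residue lies within 8 Å scores 0.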
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | | | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
| | - Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
| | - Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | | | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
| |
|
15
|
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. Predicting Functional Conformational Ensembles and Binding Mechanisms of Convergent Evolution for SARS-CoV-2 Spike Omicron Variants Using AlphaFold2 Sequence Scanning Adaptations and Molecular Dynamics Simulations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.02.587850. [PMID: 38617283 PMCID: PMC11014522 DOI: 10.1101/2024.04.02.587850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
In this study, we combined AlphaFold-based approaches for atomistic modeling of multiple protein states and microsecond molecular simulations to accurately characterize conformational ensembles and binding mechanisms of convergent evolution for the SARS-CoV-2 Spike Omicron variants BA.1, BA.2, BA.2.75, BA.3, BA.4/BA.5 and BQ.1.1. We employed and validated several different adaptations of the AlphaFold methodology for modeling of conformational ensembles including the introduced randomized full sequence scanning for manipulation of sequence variations to systematically explore conformational dynamics of Omicron Spike protein complexes with the ACE2 receptor. Microsecond atomistic molecular dynamic simulations provide a detailed characterization of the conformational landscapes and thermodynamic stability of the Omicron variant complexes. By integrating the predictions of conformational ensembles from different AlphaFold adaptations and applying statistical confidence metrics we can expand characterization of the conformational ensembles and identify functional protein conformations that determine the equilibrium dynamics for the Omicron Spike complexes with the ACE2. Conformational ensembles of the Omicron RBD-ACE2 complexes obtained using AlphaFold-based approaches for modeling protein states and molecular dynamics simulations are employed for accurate comparative prediction of the binding energetics revealing an excellent agreement with the experimental data. In particular, the results demonstrated that AlphaFold-generated extended conformational ensembles can produce accurate binding energies for the Omicron RBD-ACE2 complexes. 
The results of this study suggested complementarities and potential synergies between AlphaFold predictions of protein conformational ensembles and molecular dynamics simulations showing that integrating information from both methods can potentially yield a more adequate characterization of the conformational landscapes for the Omicron RBD-ACE2 complexes. This study provides insights in the interplay between conformational dynamics and binding, showing that evolution of Omicron variants through acquisition of convergent mutational sites may leverage conformational adaptability and dynamic couplings between key binding energy hotspots to optimize ACE2 binding affinity and enable immune evasion.
|
16
|
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. AlphaFold2-Enabled Atomistic Modeling of Structure, Conformational Ensembles, and Binding Energetics of the SARS-CoV-2 Omicron BA.2.86 Spike Protein with ACE2 Host Receptor and Antibodies: Compensatory Functional Effects of Binding Hotspots in Modulating Mechanisms of Receptor Binding and Immune Escape. J Chem Inf Model 2024; 64:1657-1681. [PMID: 38373700 DOI: 10.1021/acs.jcim.3c01857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
The latest wave of SARS-CoV-2 Omicron variants displayed a growth advantage and increased viral fitness through convergent evolution of functional hotspots that work synchronously to balance fitness requirements for productive receptor binding and efficient immune evasion. In this study, we combined AlphaFold2-based structural modeling approaches with atomistic simulations and mutational profiling of binding energetics and stability for prediction and comprehensive analysis of the structure, dynamics, and binding of the SARS-CoV-2 Omicron BA.2.86 spike variant with ACE2 host receptor and distinct classes of antibodies. We adapted several AlphaFold2 approaches to predict both the structure and conformational ensembles of the Omicron BA.2.86 spike protein in the complex with the host receptor. The results showed that the AlphaFold2-predicted structural ensemble of the BA.2.86 spike protein complex with ACE2 can accurately capture the main conformational states of the Omicron variant. Complementary to AlphaFold2 structural predictions, microsecond molecular dynamics simulations reveal the details of the conformational landscape and produced equilibrium ensembles of the BA.2.86 structures that are used to perform mutational scanning of spike residues and characterize structural stability and binding energy hotspots. The ensemble-based mutational profiling of the receptor binding domain residues in the BA.2 and BA.2.86 spike complexes with ACE2 revealed a group of conserved hydrophobic hotspots and critical variant-specific contributions of the BA.2.86 convergent mutational hotspots R403K, F486P, and R493Q. To examine the immune evasion properties of BA.2.86 in atomistic detail, we performed structure-based mutational profiling of the spike protein binding interfaces with distinct classes of antibodies that displayed significantly reduced neutralization against the BA.2.86 variant. 
The results revealed the molecular basis of compensatory functional effects of the binding hotspots, showing that the BA.2.86 lineage may have evolved to outcompete other Omicron subvariants by improving immune evasion while preserving binding affinity with ACE2 through a compensatory effect of the R493Q and F486P convergent mutational hotspots. This study demonstrated that an integrative approach combining AlphaFold2 predictions with complementary atomistic molecular dynamics simulations and robust ensemble-based mutational profiling of spike residues can enable accurate and comprehensive characterization of structure, dynamics, and binding mechanisms of newly emerging Omicron variants.
Affiliation(s)
- Nishank Raisinghani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States of America
| | - Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States of America
| | - Grace Gupta
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States of America
| | - Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States of America
| | - Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States of America
| | - Gennady Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States of America
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States of America
| |
|
17
|
Chen Y, Zhang H, Wang W, Shen Y, Ping Z. Rapid generation of high-quality structure figures for publication with PyMOL-PUB. Bioinformatics 2024; 40:btae139. [PMID: 38449297 PMCID: PMC10950480 DOI: 10.1093/bioinformatics/btae139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 02/08/2024] [Accepted: 03/05/2024] [Indexed: 03/08/2024] Open
Abstract
MOTIVATION The advancement of structural biology has increased the requirements for researchers to quickly and efficiently visualize molecular structures in silico. Meanwhile, it is also time-consuming for structural biologists to create publication-standard figures, as no useful tools can directly generate figures from structure data. Although manual editing can ensure that figures meet the standards required for publication, it requires a deep understanding of software operations and/or program call commands. Therefore, providing interfaces based on established software instead of manual editing becomes a significant necessity. RESULTS We developed PyMOL-PUB, based on the original design of PyMOL, to effectively create publication-quality figures from molecular structure data. It provides functions including structural alignment methods, functional coloring schemes, conformation adjustments, and layout plotting strategies. These functions allow users to easily generate high-quality figures, demonstrate structural differences, illustrate inter-molecular interactions, and predict performances of biomacromolecules. AVAILABILITY AND IMPLEMENTATION Our tool is publicly available at https://github.com/BGI-SynBio/PyMOL-PUB.
Affiliation(s)
- Yuting Chen
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- BGI Research, Shenzhen 518083, China
| | | | - Wen Wang
- BGI Research, Shenzhen 518083, China
- BGI Research, Changzhou 213299, China
| | - Yue Shen
- BGI Research, Shenzhen 518083, China
- BGI Research, Changzhou 213299, China
| | - Zhi Ping
- BGI Research, Shenzhen 518083, China
- BGI Research, Changzhou 213299, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
| |
|
18
|
Raghuraman P, Ramireddy S, Raman G, Park S, Sudandiradoss C. Understanding a point mutation signature D54K in the caspase activation recruitment domain of NOD1 capitulating concerted immunity via atomistic simulation. J Biomol Struct Dyn 2024:1-17. [PMID: 38415678 DOI: 10.1080/07391102.2024.2322618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 12/11/2023] [Indexed: 02/29/2024]
Abstract
Point mutation D54K in the human N-terminal caspase recruitment domain (CARD) of nucleotide-binding oligomerization domain-1 (NOD1) abrogates an imperative downstream interaction with receptor-interacting protein kinase (RIPK2) that entails combating bacterial infections and inflammatory dysfunction. Here, we addressed the molecular details concerning conformational changes and interaction patterns (monomeric-dimeric states) of D54K by signature-based molecular dynamics simulation. Initially, the sequence analysis prioritized D54K as a pathogenic mutation, among other variants, based on a sequence signature. Since the mutation is highly conserved, we derived the distant ortholog to predict the sequence and structural similarity between native and mutant. This analysis showed the utility of 33 communal core residues associated with structural-functional preservation and variations, concurrently served to infer the cryptic hotspots Cys39, Glu53, Asp54, Glu56, Ile57, Leu74, and Lys78 determining the interhelical fold forming homodimers for putative receptor interaction. Subsequently, the atomistic simulations with free energy (MM/PB(GB)SA) calculations predicted structural alteration that takes place in the N-terminal mutant CARD where coils changed to helices (45 α3- L4-α4-L6- α683) in contrast to native (45T2-L4-α4-L6-T483). Likewise, the C-terminal helices 93T1-α7105 connected to the loops distorted compared to native 93α6-L7105 may result in conformational misfolding that promotes functional regulation and activation. These structural perturbations of D54K possibly destabilize the flexible adaptation of critical homotypic NOD1CARD-CARDRIPK2 interactions (α4Asp42-Arg488α5 and α6Phe86-Lys471α4), which is consistent with earlier experimental reports. 
Altogether, our findings unveil the conformational plasticity of mutation-dependent immunomodulatory response and may aid in functional validation exploring clinical investigation on CARD-regulated immunotherapies to prevent systemic infection and inflammation. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- P Raghuraman
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
| | - Sriroopreddy Ramireddy
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Department of Genetics and Molecular Biology, School of Health Sciences, The Apollo University, Chittoor, India
| | - Gurusamy Raman
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
| | - SeonJoo Park
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
| | - C Sudandiradoss
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
| |
|
19
|
Ali MA, Caetano-Anollés G. AlphaFold2 Reveals Structural Patterns of Seasonal Haplotype Diversification in SARS-CoV-2 Spike Protein Variants. BIOLOGY 2024; 13:134. [PMID: 38534404 DOI: 10.3390/biology13030134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/07/2024] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
The slow experimental acquisition of high-quality atomic structures of the rapidly changing proteins of the COVID-19 virus challenges vaccine and therapeutic drug development efforts. Fortunately, deep learning tools such as AlphaFold2 can quickly generate reliable models of atomic structure at experimental resolution. Current modeling studies have focused solely on definitions of mutant constellations of Variants of Concern (VOCs), leaving out the impact of haplotypes on protein structure. Here, we conduct a thorough comparative structural analysis of S-proteins belonging to major VOCs and corresponding latitude-delimited haplotypes that affect viral seasonal behavior. Our approach identified molecular regions of importance as well as patterns of structural recruitment. The S1 subunit hosted the majority of structural changes, especially those involving the N-terminal domain (NTD) and the receptor-binding domain (RBD). In particular, structural changes in the NTD were much greater than just translations in three-dimensional space, altering the sub-structures to greater extents. We also revealed a notable pattern of structural recruitment with the early VOCs Alpha and Delta behaving antagonistically by suppressing regions of structural change introduced by their corresponding haplotypes, and the current VOC Omicron behaving synergistically by amplifying or collecting structural change. Remarkably, haplotypes altering the galectin-like structure of the NTD were major contributors to seasonal behavior, supporting its putative environmental-sensing role. Our results provide an extensive view of the evolutionary landscape of the S-protein across the COVID-19 pandemic. This view will help predict important regions of structural change in future variants and haplotypes for more efficient vaccine and drug development.
Affiliation(s)
- Muhammad Asif Ali
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
20
|
Raisinghani N, Alshahrani M, Gupta G, Tian H, Xiao S, Tao P, Verkhivker G. Interpretable Atomistic Prediction and Functional Analysis of Conformational Ensembles and Allosteric States in Protein Kinases Using AlphaFold2 Adaptation with Randomized Sequence Scanning and Local Frustration Profiling. bioRxiv: The Preprint Server for Biology 2024:2024.02.15.580591. [PMID: 38496487 PMCID: PMC10942451 DOI: 10.1101/2024.02.15.580591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
The groundbreaking achievements of AlphaFold2 (AF2) approaches in protein structure modeling marked a transformative era in structural biology. Despite the success of AF2 tools in predicting single protein structures, these methods showed intrinsic limitations in predicting multiple functional conformations of allosteric proteins and fold-switching systems. The recent NMR-based structural determination of the unbound ABL kinase in the active state and two inactive low-populated functional conformations that are unique for ABL kinase presents an ideal challenge for AF2 approaches. In the current study we employ several implementations of AF2 methods to predict protein conformational ensembles and allosteric states of the ABL kinase, including (a) a multiple sequence alignment (MSA) subsampling approach; (b) the SPEACH_AF approach, in which alanine scanning is performed on generated MSAs; and (c) randomized full-sequence mutational scanning, introduced in this study, which manipulates sequence variations in combination with MSA subsampling. We show that the proposed AF2 adaptation combined with local frustration mapping of conformational states enables accurate prediction of the ABL active and intermediate structures and conformational ensembles, while also offering a robust approach for interpretable characterization of the AF2 predictions and limitations in detecting hidden allosteric states. We found that large high-frustration residue clusters are uniquely characteristic of the low-populated, fully inactive ABL form and can define energetically frustrated cracking sites of conformational transitions, presenting difficult targets for AF2 methods. This study uncovered previously unappreciated, fundamental connections between distinct patterns of local frustration in functional kinase states and AF2 successes/limitations in detecting low-populated frustrated conformations, providing a better understanding of the benefits and limitations of current AF2-based adaptations in modeling conformational ensembles.
|
21
|
Morehead A, Cheng J. Geometry-complete perceptron networks for 3D molecular graphs. Bioinformatics 2024; 40:btae087. [PMID: 38373819 PMCID: PMC10904142 DOI: 10.1093/bioinformatics/btae087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/30/2023] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION The field of geometric deep learning has recently had a profound impact on several scientific domains such as protein structure prediction and design, leading to methodological advancements within and outside of the realm of traditional machine learning. In this spirit, we introduce GCPNet, a new chirality-aware SE(3)-equivariant graph neural network designed for representation learning of 3D biomolecular graphs. We show that GCPNet, unlike previous representation learning methods for 3D biomolecules, is widely applicable to a variety of invariant or equivariant node-level, edge-level, and graph-level tasks on biomolecular structures while being able to (1) learn important chiral properties of 3D molecules and (2) detect external force fields. RESULTS Across four distinct molecular-geometric tasks, we demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date. AVAILABILITY AND IMPLEMENTATION The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/GCPNet.
Affiliation(s)
- Alex Morehead
- Electrical Engineering & Computer Science, University of Missouri-Columbia, Columbia, MO 65211, United States
| | - Jianlin Cheng
- Electrical Engineering & Computer Science, University of Missouri-Columbia, Columbia, MO 65211, United States
|
22
|
Rosignoli S, Lustrino E, Di Silverio I, Paiardini A. Making Use of Averaging Methods in MODELLER for Protein Structure Prediction. Int J Mol Sci 2024; 25:1731. [PMID: 38339009 PMCID: PMC10855553 DOI: 10.3390/ijms25031731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 01/23/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024] Open
Abstract
Recent advances in protein structure prediction, driven by AlphaFold 2 and machine learning, demonstrate proficiency in static structures but encounter challenges in capturing the dynamic features crucial for understanding biological function. In this context, homology-based modeling emerges as a cost-effective and computationally efficient alternative. The MODELLER (version 10.5, accessed on 30 November 2023) algorithm can be harnessed for this purpose since it computes intermediate models during simulated annealing, enabling the exploration of attainable configurational states and energies while minimizing its objective function. There have been few attempts to date to improve the models generated by its algorithm, and in particular, there is no literature regarding the implementation of an averaging procedure involving the intermediate models in the MODELLER algorithm. In this study, we examined MODELLER's output using 225 target-template pairs, extracting the best representatives of intermediate models. Applying an averaging procedure to the selected intermediate structures based on statistical potentials, we aimed to determine: (1) whether averaging improves the quality of structural models during the building phase; (2) whether ranking by statistical potentials reliably selects the best models, leading to improved final model quality; (3) whether using a single template versus multiple templates affects the averaging approach; (4) whether the "ensemble" nature of the MODELLER building phase can be harnessed to capture low-energy conformations in holo structure modeling. Our findings indicate that while improvements typically fall short of a few decimal points in the model evaluation metric, a notable fraction of configurations exhibit slightly higher similarity to the native structure than MODELLER's proposed final model. The averaging-building procedure proves particularly beneficial in (1) regions of low sequence identity between the target and template(s), the most challenging aspect of homology modeling; (2) generation of holo protein conformations, an area in which MODELLER and related tools usually fall short of the expected performance.
Affiliation(s)
| | | | | | - Alessandro Paiardini
- Department of Biochemical Sciences, Sapienza University of Rome, 00185 Rome, Italy; (S.R.); (E.L.); (I.D.S.)
|
23
|
Radjasandirane R, de Brevern AG. AlphaFold2 for Protein Structure Prediction: Best Practices and Critical Analyses. Methods Mol Biol 2024; 2836:235-252. [PMID: 38995544 DOI: 10.1007/978-1-0716-4007-4_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
AlphaFold2 (AF2) has emerged in recent years as a groundbreaking innovation that has revolutionized several scientific fields, in particular structural biology, drug design, and the elucidation of disease mechanisms. Many scientists now use AF2 on a daily basis, including non-specialist users. This chapter is aimed at the latter. Tips and tricks for getting the most out of AF2 to produce a high-quality biological model are discussed here. We show non-specialist users how to maintain a critical perspective when working with AF2 models and provide guidelines on how to properly evaluate them. After showing how to run a structure prediction using ColabFold, we list several ways to improve AF2 models by adding information that is missing from the original AF2 model. By using software such as AlphaFill to add cofactors and ligands to the models, or MODELLER to add disulfide bridges between cysteines, we guide users to build a high-quality biological model suitable for applications such as drug design, protein interaction, or molecular dynamics studies.
Affiliation(s)
- Ragousandirane Radjasandirane
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, Paris, France
| | - Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, Paris, France.
|
24
|
Chailyan A, Marcatili P. Structural Characterization of Peptide Antibodies. Methods Mol Biol 2024; 2821:195-204. [PMID: 38997490 DOI: 10.1007/978-1-0716-3914-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2024]
Abstract
The role of proteins as very effective immunogens for the generation of antibodies is indisputable. Nevertheless, cases in which protein usage for antibody production is not feasible or convenient compelled the creation of a powerful alternative consisting of synthetic peptides. Synthetic peptides can be modified to obtain desired properties or conformation, tagged for purification, isotopically labeled for protein quantitation or conjugated to immunogens for antibody production. The antibodies that bind to these peptides represent an invaluable tool for biological research and discovery. To better understand the underlying mechanisms of antibody-antigen interaction, here, we present a pipeline developed by us to structurally classify immunoglobulin antigen binding sites and to infer key sequence residues and other variables that have a prominent role in each structural class.
Affiliation(s)
- Anna Chailyan
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
| | - Paolo Marcatili
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
25
|
Ng TK, Ji J, Liu Q, Yao Y, Wang WY, Cao Y, Chen CB, Lin JW, Dong G, Cen LP, Huang C, Zhang M. Evaluation of Myocilin Variant Protein Structures Modeled by AlphaFold2. Biomolecules 2023; 14:14. [PMID: 38275755 PMCID: PMC10813463 DOI: 10.3390/biom14010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/12/2023] [Accepted: 12/15/2023] [Indexed: 01/27/2024] Open
Abstract
Deep neural network-based programs can be applied to protein structure modeling by inputting amino acid sequences. Here, we aimed to evaluate the AlphaFold2-modeled myocilin wild-type and variant protein structures and compare them to the experimentally determined protein structures. Molecular dynamics and ligand binding properties of the experimentally determined and AlphaFold2-modeled protein structures were also analyzed. AlphaFold2-modeled myocilin variant protein structures showed high similarities in overall structure to the experimentally determined mutant protein structures, but the orientations and geometries of amino acid side chains were slightly different. The olfactomedin-like domain of the modeled missense variant protein structures showed fewer folding changes than the nonsense variant when compared to the predicted wild-type protein structure. Differences were also observed in molecular dynamics and ligand binding sites between the AlphaFold2-modeled and experimentally determined structures as well as between the wild-type and variant structures. In summary, the folding of the AlphaFold2-modeled MYOC variant protein structures could be similar to that determined by the experiments but with differences in amino acid side chain orientations and geometries. Careful comparisons with experimentally determined structures are needed before in silico modeled variant protein structures are applied.
Affiliation(s)
- Tsz Kin Ng
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
| | - Jie Ji
- Network & Information Centre, Shantou University, Shantou 515041, China
| | - Qingping Liu
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
- Key Laboratory of Carbohydrate and Lipid Metabolism Research, College of Life Science and Technology, Dalian University, Dalian 116622, China
| | - Yao Yao
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
- Shantou University Medical College, Shantou 515041, China
| | - Wen-Ying Wang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
- Shantou University Medical College, Shantou 515041, China
| | - Yingjie Cao
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
| | - Chong-Bo Chen
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
| | - Jian-Wei Lin
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
| | - Geng Dong
- Shantou University Medical College, Shantou 515041, China
| | - Ling-Ping Cen
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
| | - Chukai Huang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
| | - Mingzhi Zhang
- Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China; (T.K.N.)
|
26
|
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. AlphaFold2-Enabled Atomistic Modeling of Epistatic Binding Mechanisms for the SARS-CoV-2 Spike Omicron XBB.1.5, EG.5 and FLip Variants: Convergent Evolution Hotspots Cooperate to Control Stability and Conformational Adaptability in Balancing ACE2 Binding and Antibody Resistance. bioRxiv: The Preprint Server for Biology 2023:2023.12.11.571185. [PMID: 38168257 PMCID: PMC10760024 DOI: 10.1101/2023.12.11.571185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
In this study, we combined AI-based atomistic structural modeling and microsecond molecular simulations of the SARS-CoV-2 Spike complexes with the host receptor ACE2 for XBB.1.5+L455F, XBB.1.5+F456L (EG.5) and XBB.1.5+L455F/F456L (FLip) lineages to examine the mechanisms underlying the role of convergent evolution hotspots in balancing ACE2 binding and antibody evasion. Using ensemble-based mutational scanning of the spike protein residues and rigorous physics-based computations of binding affinities, we identified binding energy hotspots and characterized the molecular basis underlying epistatic couplings between convergent mutational hotspots. Consistent with the experiments, the results revealed the mediating role of the Q493 hotspot in synchronizing epistatic couplings between the L455F and F456L mutations, providing quantitative insight into the mechanism underlying differences between XBB lineages. Mutational profiling is combined with a network-based model of epistatic couplings showing that the Q493, L455 and F456 sites mediate stable communities at the binding interface with ACE2 and can serve as stable mediators of non-additive couplings. Structure-based mutational analysis of Spike protein binding with the class 1 antibodies quantified the critical role of the F456L and F486P mutations in eliciting a strong immune evasion response. The results of this analysis support a mechanism in which the emergence of EG.5 and FLip variants may have been dictated by leveraging strong epistatic effects between several convergent evolutionary hotspots that provide synergy between the improved ACE2 binding and broad neutralization resistance. This interpretation is consistent with the notion that functionally balanced substitutions which simultaneously optimize immune evasion and high ACE2 affinity may continue to emerge through lineages with beneficial pair or triplet combinations of RBD mutations involving mediators of epistatic couplings and sites in highly adaptable RBD regions.
|
27
|
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups, led by PEZYFoldings, UM-TBM, and Yang Server, employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
| | - Shahram Mesdaghi
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
| | - Filomeno Sánchez Rodríguez
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
- Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
| | - Luc Elliott
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
| | - David L. Murphy
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
| | | | - Ronan M. Keegan
- UKRI‐STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
| | - Daniel J. Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
|
28
|
Das R, Kretsch RC, Simpkin AJ, Mulvaney T, Pham P, Rangan R, Bu F, Keegan RM, Topf M, Rigden DJ, Miao Z, Westhof E. Assessment of three-dimensional RNA structure prediction in CASP15. Proteins 2023; 91:1747-1770. [PMID: 37876231 PMCID: PMC10841292 DOI: 10.1002/prot.26602] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 04/25/2023] [Revised: 08/21/2023] [Accepted: 09/07/2023] [Indexed: 10/26/2023]
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, CA USA
- Biophysics Program, Stanford University School of Medicine, CA USA
- Howard Hughes Medical Institute, Stanford University, CA USA
| | | | - Adam J. Simpkin
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
| | - Thomas Mulvaney
- Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
- University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Phillip Pham
- Department of Biochemistry, Stanford University School of Medicine, CA USA
| | - Ramya Rangan
- Biophysics Program, Stanford University School of Medicine, CA USA
| | - Fan Bu
- Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou 510005, China
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230036, Anhui, China
| | - Ronan M. Keegan
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science, UK
| | - Maya Topf
- Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
- University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Daniel J. Rigden
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
| | - Zhichao Miao
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University
- Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai 200434, China
| | - Eric Westhof
- Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, F-67084, Strasbourg, France
|
29
|
Li J, Zhang S, Chen SJ. Advancing RNA 3D structure prediction: Exploring hierarchical and hybrid approaches in CASP15. Proteins 2023; 91:1779-1789. [PMID: 37615235 PMCID: PMC10841231 DOI: 10.1002/prot.26583] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/15/2023] [Revised: 06/19/2023] [Accepted: 08/08/2023] [Indexed: 08/25/2023]
Abstract
In CASP15, we used an integrated hierarchical and hybrid approach to predict RNA structures. The approach involves three steps. First, with the use of physics-based methods, Vfold2D-MC and VfoldMCPX, we predict the 2D structures from the sequence. Second, we employ template-based methods, Vfold3D and VfoldLA, to build 3D scaffolds for the predicted 2D structures. Third, using the 3D scaffolds as initial structures and the predicted 2D structures as constraints, we predict the 3D structure from coarse-grained molecular dynamics simulations, IsRNA and RNAJP. Our approach was evaluated on 12 RNA targets in CASP15 and ranked second among all the 34 participating teams. The result demonstrated the reliability of our method in predicting RNA 2D structures with high accuracy and RNA 3D structures with moderate accuracy. Further improvements in RNA structure prediction for the next round of CASP may come from the incorporation of the physics-based method with machine learning techniques.
Affiliation(s)
- Jun Li
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
| | - Sicheng Zhang
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
| | - Shi-Jie Chen
- Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
|
30
|
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023] Open
Abstract
MOTIVATION Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Affiliation(s)
- Yitian Fang
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| | - Yi Jiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Leyi Wei
- School of Software, Shandong University, Jinan, Shandong 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | | | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
|
31
|
Kryshtafovych A, Montelione GT, Rigden DJ, Mesdaghi S, Karaca E, Moult J. Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15. Proteins 2023; 91:1903-1911. [PMID: 37872703 PMCID: PMC10840738 DOI: 10.1002/prot.26584] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2023] [Accepted: 08/14/2023] [Indexed: 10/25/2023]
Abstract
For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.
Affiliation(s)
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA
- Daniel J Rigden
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
  - Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- John Moult
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
|
32
|
Studer G, Tauriello G, Schwede T. Assessment of the assessment-All about complexes. Proteins 2023; 91:1850-1860. [PMID: 37858934 DOI: 10.1002/prot.26612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 09/26/2023] [Accepted: 09/29/2023] [Indexed: 10/21/2023]
Abstract
Predicting model quality is a fundamental component of any modeling procedure, and blind assessment of these methods constitutes a crucial aspect of the Critical Assessment of Protein Structure Prediction (CASP) experiment. Historically, the main focus was on assessing methods that predict global and per-residue accuracies in tertiary structure models. This focus shifted with the community's increased efforts in modeling complexes and assemblies. We asked the community to process the models from the CASP15 assembly category and provide estimates of the accuracy of the predicted quaternary structure, both globally and at the local interface level. Besides identifying remarkable accuracy of modeling groups in assessing their own predictions, we set up a benchmarking pipeline to highlight different aspects of quaternary structure models and introduced a simple consensus EMA method as baseline. While participating methods showed commendable performance, the baseline was difficult to surpass. It is important to point out that prediction performance varies for the individual CASP targets, highlighting potential areas of improvement and challenges ahead.
Affiliation(s)
- Gabriel Studer
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
33
|
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. Proteins 2023; 91:1889-1902. [PMID: 37357816 PMCID: PMC10749984 DOI: 10.1002/prot.26542] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/11/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/27/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.
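The two quantities named in this abstract, a pairwise similarity score and a per-target ranking loss, can be illustrated with a minimal Python sketch. It assumes a precomputed, symmetric model-to-model similarity matrix and uses the ranking-loss definition common in EMA benchmarking (true score of the best model minus true score of the model ranked first by the predictor). The function names and all numbers are hypothetical; this is not the MULTICOM_qa implementation, and the ICPS term is not modeled.

```python
from typing import List


def pairwise_similarity_scores(sim: List[List[float]]) -> List[float]:
    """For each model, average its similarity to every other model in the pool."""
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two models")
    return [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def ranking_loss(predicted: List[float], true: List[float]) -> float:
    """True score of the best model minus true score of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


# Hypothetical symmetric similarity matrix for four models of one target
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.5, 0.3],
    [0.7, 0.5, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]
true_quality = [0.70, 0.75, 0.60, 0.20]  # hypothetical true quality scores

pss = pairwise_similarity_scores(sim)
loss = ranking_loss(pss, true_quality)
print(pss, loss)
```

In this toy pool the consensus score picks the first model although the second is slightly better, so the loss is small but nonzero. A purely consensus-based score can do much worse when many low-quality models resemble each other, which is the failure mode the abstract describes.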
Affiliation(s)
- Raj S. Roy
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Nabin Giri
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
|
34
|
Kretsch RC, Andersen ES, Bujnicki JM, Chiu W, Das R, Luo B, Masquida B, McRae EK, Schroeder GM, Su Z, Wedekind JE, Xu L, Zhang K, Zheludev IN, Moult J, Kryshtafovych A. RNA target highlights in CASP15: Evaluation of predicted models by structure providers. Proteins 2023; 91:1600-1615. [PMID: 37466021 PMCID: PMC10792523 DOI: 10.1002/prot.26550] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/27/2023] [Revised: 06/16/2023] [Accepted: 06/26/2023] [Indexed: 07/20/2023]
Abstract
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include: enhancing the accuracy in local interaction predictions and increased consideration of the experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
Affiliation(s)
- Rachael C. Kretsch
  - Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
- Ebbe S. Andersen
  - Interdisciplinary Nanoscience Center and Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Janusz M. Bujnicki
  - International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Wah Chiu
  - Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Bioengineering and James H. Clark Center, Stanford University, Stanford, CA, USA
  - Division of CryoEM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Rhiju Das
  - Biophysics Program, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford, CA, USA
- Bingnan Luo
  - The State Key Laboratory of Biotherapy, Frontiers Medical Center of Tianfu Jincheng Laboratory, Department of Geriatrics and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610044, Sichuan, China
- Benoît Masquida
  - UMR 7156, CNRS – Universite de Strasbourg, Strasbourg, France
- Ewan K.S. McRae
  - Center for RNA Therapeutics, Houston Methodist Research Institute, Houston, TX 77030, USA
- Griffin M. Schroeder
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY, 14642, USA
  - Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, NY, 14642, USA
- Zhaoming Su
  - The State Key Laboratory of Biotherapy, Frontiers Medical Center of Tianfu Jincheng Laboratory, Department of Geriatrics and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610044, Sichuan, China
- Joseph E. Wedekind
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY, 14642, USA
  - Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, NY, 14642, USA
- Lily Xu
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Kaiming Zhang
  - Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230027, China
- Ivan N. Zheludev
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- John Moult
  - Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
|
35
|
Kryshtafovych A, Rigden DJ. To split or not to split: CASP15 targets and their processing into tertiary structure evaluation units. Proteins 2023; 91:1558-1570. [PMID: 37254889 PMCID: PMC10687315 DOI: 10.1002/prot.26533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 05/02/2023] [Accepted: 05/18/2023] [Indexed: 06/01/2023]
Abstract
Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors' performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.
Affiliation(s)
- Daniel J. Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
|
36
|
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XV. Proteins 2023; 91:1539-1549. [PMID: 37920879 PMCID: PMC10843301 DOI: 10.1002/prot.26617] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 11/04/2023]
Abstract
Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.
Affiliation(s)
- Torsten Schwede
  - University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Maya Topf
  - Centre for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
- John Moult
  - Institute for Bioscience and Biotechnology Research, Rockville, MD, USA, and Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
|
37
|
Kryshtafovych A, Antczak M, Szachniuk M, Zok T, Kretsch RC, Rangan R, Pham P, Das R, Robin X, Studer G, Durairaj J, Eberhardt J, Sweeney A, Topf M, Schwede T, Fidelis K, Moult J. New prediction categories in CASP15. Proteins 2023; 91:1550-1557. [PMID: 37306011 PMCID: PMC10713864 DOI: 10.1002/prot.26515] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 06/13/2023]
Abstract
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
Affiliation(s)
- Maciej Antczak
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Marta Szachniuk
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Tomasz Zok
  - Institute of Computing Science, Poznan University of Technology, Poznan, Poland
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Rachael C. Kretsch
  - Biophysics Program, Stanford University School of Medicine, Stanford, California, USA
- Ramya Rangan
  - Biophysics Program, Stanford University School of Medicine, Stanford, California, USA
- Phillip Pham
  - Biochemistry Department, Stanford University School of Medicine, Stanford, California, USA
- Rhiju Das
  - Biochemistry Department, Stanford University School of Medicine, Stanford, California, USA
  - Howard Hughes Medical Institute, Stanford University, Stanford, California, USA
- Xavier Robin
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jerome Eberhardt
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Aaron Sweeney
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
- Maya Topf
  - Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV), Hamburg, Germany
  - Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- John Moult
  - Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, USA
|
38
|
Huang GJ, Parry TK, McLaughlin WA. Assessment of the Performances of the Protein Modeling Techniques Participating in CASP15 Using a Structure-Based Functional Site Prediction Approach: ResiRole. Bioengineering (Basel) 2023; 10:1377. [PMID: 38135968 PMCID: PMC10740689 DOI: 10.3390/bioengineering10121377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 12/24/2023] Open
Abstract
BACKGROUND Model quality assessments via computational methods which entail comparisons of the modeled structures to the experimentally determined structures are essential in the field of protein structure prediction. The assessments provide means to benchmark the accuracies of the modeling techniques and to aid with their development. We previously described the ResiRole method to gauge model quality principally based on the preservation of the structural characteristics described in SeqFEATURE functional site prediction models. METHODS We apply ResiRole to benchmark modeling group performances in the Critical Assessment of Structure Prediction experiment, round 15. To gauge model quality, a normalized Predicted Functional site Similarity Score (PFSS) was calculated as the average of one minus the absolute values of the differences of the functional site prediction probabilities, as found for the experimental structures versus those found at the corresponding sites in the structure models. RESULTS The average PFSS per modeling group (gPFSS) correlates with standard quality metrics, and can effectively be used to rank the accuracies of the groups. For the free modeling (FM) category, correlation coefficients of the Local Distance Difference Test (LDDT) and Global Distance Test-Total Score (GDT-TS) metrics with gPFSS were 0.98239 and 0.87691, respectively. An example finding for a specific group is that the gPFSS for EMBER3D was higher than expected based on the predictive relationship between gPFSS and LDDT. We infer the result is due to the use of constraints imprinted by function that are a part of the EMBER3D methodology. Also, we find functional site predictions that may guide further functional characterizations of the respective proteins. 
CONCLUSION The gPFSS metric provides an effective means to assess and rank the performances of the structure prediction techniques according to their abilities to accurately recount the structural features at predicted functional sites.
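The PFSS definition given in the Methods above (the average of one minus the absolute difference between functional site prediction probabilities) can be written out as a minimal Python sketch. It assumes the probabilities for the experimental structure and for the model are already available as two equal-length lists, matched site by site. The function name `pfss` and the example values are illustrative only and are not taken from ResiRole.

```python
from typing import Sequence


def pfss(p_experimental: Sequence[float], p_model: Sequence[float]) -> float:
    """Average of 1 - |p_exp - p_model| over matched functional sites."""
    if len(p_experimental) != len(p_model):
        raise ValueError("probability lists must have the same length")
    if not p_experimental:
        raise ValueError("at least one functional site is required")
    diffs = [abs(a - b) for a, b in zip(p_experimental, p_model)]
    return sum(1.0 - d for d in diffs) / len(diffs)


# Hypothetical prediction probabilities at three functional sites
exp_probs = [0.9, 0.2, 0.6]
model_probs = [0.8, 0.3, 0.1]
score = pfss(exp_probs, model_probs)
print(round(score, 4))
```

A score of 1.0 means the model reproduces every site probability exactly; larger per-site disagreements pull the average toward 0.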
Affiliation(s)
- William A. McLaughlin
  - Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA (T.K.P.)
|
39
|
Harmalkar A, Lyskov S, Gray JJ. Reliable protein-protein docking with AlphaFold, Rosetta and replica-exchange. bioRxiv 2023:2023.07.28.551063. [PMID: 37546760 PMCID: PMC10402144 DOI: 10.1101/2023.07.28.551063] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/08/2023]
Abstract
Despite the recent breakthrough of AlphaFold (AF) in the field of protein sequence-to-structure prediction, modeling protein interfaces and predicting protein complex structures remains challenging, especially when there is a significant conformational change in one or both binding partners. Prior studies have demonstrated that AF-multimer (AFm) can predict accurate protein complexes in only up to 43% of cases. In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm. Using a curated collection of 254 available protein targets with both unbound and bound structures, we first demonstrate that AlphaFold confidence measures (pLDDT) can be repurposed for estimating protein flexibility and docking accuracy for multimers. We incorporate these metrics within our ReplicaDock 2.0 protocol to complete a robust in-silico pipeline for accurate protein complex structure prediction. AlphaRED (AlphaFold-initiated Replica Exchange Docking) successfully docks failed AF predictions including 97 failure cases in Docking Benchmark Set 5.5. AlphaRED generates CAPRI acceptable-quality or better predictions for 66% of benchmark targets. Further, on a subset of antigen-antibody targets, which is challenging for AFm (19% success rate), AlphaRED demonstrates a success rate of 51%. This new strategy demonstrates the success possible by integrating deep-learning based architectures trained on evolutionary information with physics-based enhanced sampling. The pipeline is available at github.com/Graylab/AlphaRED.
|
40
|
McBride JM, Polev K, Abdirasulov A, Reinharz V, Grzybowski BA, Tlusty T. AlphaFold2 Can Predict Single-Mutation Effects. Phys Rev Lett 2023; 131:218401. [PMID: 38072605 DOI: 10.1103/physrevlett.131.218401] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/03/2023] [Accepted: 09/26/2023] [Indexed: 12/18/2023]
Abstract
AlphaFold2 (AF) is a promising tool, but is it accurate enough to predict single mutation effects? Here, we report that the localized structural deformation between protein pairs differing by only 1-3 mutations-as measured by the effective strain-is correlated across 3901 experimental and AF-predicted structures. Furthermore, analysis of ∼11 000 proteins shows that the local structural change correlates with various phenotypic changes. These findings suggest that AF can predict the range and magnitude of single-mutation effects on average, and we propose a method to improve precision of AF predictions and to indicate when predictions are unreliable.
Affiliation(s)
- John M McBride
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, South Korea
- Konstantin Polev
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, South Korea
  - Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, South Korea
- Amirbek Abdirasulov
  - Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, South Korea
- Bartosz A Grzybowski
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, South Korea
  - Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology, Ulsan 44919, South Korea
- Tsvi Tlusty
  - Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, South Korea
  - Departments of Physics and Chemistry, Ulsan National Institute of Science and Technology, Ulsan 44919, South Korea
|
41
|
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. Accurate Characterization of Conformational Ensembles and Binding Mechanisms of the SARS-CoV-2 Omicron BA.2 and BA.2.86 Spike Protein with the Host Receptor and Distinct Classes of Antibodies Using AlphaFold2-Augmented Integrative Computational Modeling. bioRxiv 2023:2023.11.18.567697. [PMID: 38045395 PMCID: PMC10690158 DOI: 10.1101/2023.11.18.567697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
The latest wave of SARS-CoV-2 Omicron variants displayed a growth advantage and increased viral fitness through convergent evolution of functional hotspots that work synchronously to balance fitness requirements for productive receptor binding and efficient immune evasion. In this study, we combined AlphaFold2-based structural modeling approaches with all-atom MD simulations and mutational profiling of binding energetics and stability for prediction and comprehensive analysis of the structure, dynamics, and binding of the SARS-CoV-2 Omicron BA.2.86 spike variant with the ACE2 host receptor and distinct classes of antibodies. We adapted several AlphaFold2 approaches to predict both the structure and the conformational ensembles of the Omicron BA.2.86 spike protein in complex with the host receptor. The results showed that the AlphaFold2-predicted conformational ensemble of the BA.2.86 spike protein complex can accurately capture the main dynamics signatures obtained from microsecond molecular dynamics simulations. The ensemble-based dynamic mutational scanning of the receptor binding domain residues in the BA.2 and BA.2.86 spike complexes with ACE2 dissected the role of the BA.2 and BA.2.86 backgrounds in modulating binding free energy changes, revealing a group of conserved hydrophobic hotspots and critical variant-specific contributions of the BA.2.86 mutational sites R403K, F486P and R493Q. To examine immune evasion properties of BA.2.86 in atomistic detail, we performed large-scale structure-based mutational profiling of the S protein binding interfaces with distinct classes of antibodies that displayed significantly reduced neutralization against the BA.2.86 variant. The results quantified the specific function of the BA.2.86 mutations to ensure broad resistance against different classes of RBD antibodies. 
This study revealed the molecular basis of compensatory functional effects of the binding hotspots, showing that BA.2.86 lineage may have primarily evolved to improve immune escape while modulating binding affinity with ACE2 through cooperative effect of R403K, F486P and R493Q mutations. The study supports a hypothesis that the impact of the increased ACE2 binding affinity on viral fitness is more universal and is mediated through cross-talk between convergent mutational hotspots, while the effect of immune evasion could be more variant-dependent.
|
42
|
Polonsky K, Pupko T, Freund NT. Evaluation of the Ability of AlphaFold to Predict the Three-Dimensional Structures of Antibodies and Epitopes. J Immunol 2023; 211:1578-1588. [PMID: 37782047 DOI: 10.4049/jimmunol.2300150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/06/2023] [Indexed: 10/03/2023]
Abstract
Being able to accurately predict the three-dimensional structure of an Ab can facilitate Ab characterization and epitope prediction, with important diagnostic and clinical implications. In this study, we evaluated the ability of AlphaFold to predict the structures of 222 recently published, high-resolution Fab H and L chain structures of Abs from different species directed against different Ags. We show that although the overall Ab prediction quality is in line with the results of CASP14, regions such as the complementarity-determining regions (CDRs) of the H chain, which are prone to higher variation, are predicted less accurately. Moreover, we discovered that AlphaFold mispredicts the bending angles between the variable and constant domains. To evaluate the ability of AlphaFold to model Ab-Ag interactions based only on sequence, we used AlphaFold-Multimer in combination with ZDOCK to predict the structures of 26 known Ab-Ag complexes. ZDOCK, which was applied on bound components of both the Ab and the Ag, succeeded in assembling 11 complexes, whereas AlphaFold succeeded in predicting only 2 of 26 models, with significant deviations in the docking contacts predicted in the rest of the molecules. Within the 11 complexes that were successfully predicted by ZDOCK, 9 involved short-peptide Ags (18-mer or less), whereas only 2 were complexes of Ab with a full-length protein. Docking of modeled unbound Ab and Ag was unsuccessful. In summary, our study provides important information about the abilities and limitations of using AlphaFold to predict Ab-Ag interactions and suggests areas for possible improvement.
Affiliation(s)
- Ksenia Polonsky
- Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Tal Pupko
- Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Natalia T Freund
- Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
43
|
Gil Zuluaga FH, D’Arminio N, Bardozzo F, Tagliaferri R, Marabotti A. An automated pipeline integrating AlphaFold 2 and MODELLER for protein structure prediction. Comput Struct Biotechnol J 2023; 21:5620-5629. [PMID: 38047234 PMCID: PMC10690423 DOI: 10.1016/j.csbj.2023.10.056] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/24/2023] [Revised: 10/31/2023] [Accepted: 10/31/2023] [Indexed: 12/05/2023] Open
Abstract
The ability to predict a protein's three-dimensional conformation represents a crucial starting point for investigating evolutionary connections with other members of the corresponding protein family, examining interactions with other proteins, and potentially utilizing this knowledge for the purpose of rational drug design. In this work, we evaluated the feasibility of improving AlphaFold2's three-dimensional protein predictions by developing a novel pipeline (AlphaMod) that incorporates AlphaFold2 with MODELLER, a template-based modeling program. Additionally, our tool can drive a comprehensive quality assessment of the tertiary protein structure by incorporating and comparing a set of different quality assessment tools. The outcomes of selected tools are combined into a composite score (BORDASCORE) that exhibits a meaningful correlation with GDT_TS and facilitates the selection of optimal models in the absence of a reference structure. To validate AlphaMod's results, we conducted evaluations using two distinct datasets totaling 72 targets, previously used to independently assess AlphaFold2's performance. The generated models underwent evaluation through two methods: i) averaging the GDT_TS scores across all produced structures for a single target sequence, and ii) a pairwise comparison of the best structures generated by AlphaFold2 and AlphaMod. The latter, within the unsupervised setups, shows a rising accuracy of approximately 34% over AlphaFold2, while in the supervised setup AlphaMod surpasses AlphaFold2 in 18% of the instances. Finally, there is an 11% correspondence in outcomes between the diverse methodologies. Consequently, AlphaMod's best-predicted tertiary structures in several cases exhibited a significant improvement in the accuracy of the predictions with respect to the best models obtained by AlphaFold2.
This pipeline paves the way for the integration of additional data and AI-based algorithms to further improve the reliability of the predictions.
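The abstract does not spell out how BORDASCORE is computed. As a rough illustration only, the sketch below assumes the name refers to a Borda-count-style rank aggregation, in which each quality assessment tool ranks the candidate models and the rank points are summed. The function name, tool names, and scores are hypothetical and are not taken from AlphaMod.

```python
def borda_scores(scores_by_tool):
    """Aggregate per-tool quality scores into Borda points per model.

    scores_by_tool maps a tool name to a dict of model -> score, where a
    higher score means a better model (an assumption of this sketch).
    """
    totals = {}
    for tool_scores in scores_by_tool.values():
        # Sort models from worst to best for this tool: the worst model
        # earns 0 points and the best earns n - 1. Ties are broken by
        # insertion order, which a real implementation would handle better.
        ranked = sorted(tool_scores, key=tool_scores.get)
        for points, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + points
    return totals


# Hypothetical scores from two hypothetical assessment tools.
example = {
    "tool_a": {"model_1": 0.90, "model_2": 0.70, "model_3": 0.80},
    "tool_b": {"model_1": 0.95, "model_2": 0.50, "model_3": 0.90},
}
totals = borda_scores(example)
best_model = max(totals, key=totals.get)
```

Because only ranks are used, tools that report scores on different scales can be combined without normalization, which is one plausible reason to prefer a rank-based composite when no reference structure is available.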
Affiliation(s)
- Fabio Hernan Gil Zuluaga
- Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
| | - Nancy D’Arminio
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
| | - Francesco Bardozzo
- Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
| | - Roberto Tagliaferri
- Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
| | - Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
|
44
|
Mappin F, Bellantuono AJ, Ebrahimi B, DeGennaro M. Odor-evoked transcriptomics of Aedes aegypti mosquitoes. PLoS One 2023; 18:e0293018. [PMID: 37874813 PMCID: PMC10597520 DOI: 10.1371/journal.pone.0293018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/03/2023] [Indexed: 10/26/2023] Open
Abstract
Modulation of odorant receptor mRNA induced by prolonged odor exposure is highly correlated with ligand-receptor interactions in Drosophila as well as mammals of the Muridae family. If this response feature is conserved in other organisms, this presents an intriguing initial screening tool when searching for novel receptor-ligand interactions in species with predominantly orphan olfactory receptors. We demonstrate that mRNA modulation in response to 1-octen-3-ol odor exposure occurs in a time- and concentration-dependent manner in Aedes aegypti mosquitoes. To investigate gene expression patterns at a global level, we generated an odor-evoked transcriptome associated with 1-octen-3-ol odor exposure. Transcriptomic data revealed that ORs and OBPs were transcriptionally responsive whereas other chemosensory gene families showed little to no differential expression. Alongside chemosensory gene expression changes, transcriptomic analysis found that prolonged exposure to 1-octen-3-ol modulated xenobiotic response genes, primarily members of the cytochrome P450, insect cuticle proteins, and glucuronosyltransferases families. Together, these findings suggest that mRNA transcriptional modulation of olfactory receptors caused by prolonged odor exposure is pervasive across taxa and can be accompanied by the activation of xenobiotic responses.
Affiliation(s)
- Fredis Mappin
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
| | - Anthony J. Bellantuono
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
| | - Babak Ebrahimi
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
| | - Matthew DeGennaro
- Department of Biological Sciences & Biomolecular Sciences Institute, Florida International University, Miami, Florida, United States of America
|
45
|
Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins 2023. [PMID: 37850517 DOI: 10.1002/prot.26614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate disease pathogenesis and drug development discoveries.
Affiliation(s)
- Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Maxim Tsenkov
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
|
46
|
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. Front Bioinform 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023] Open
Abstract
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
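For orientation, the standard ϵ-insensitive loss from SVM regression, on which the abstract says the Qϵ loss is based, charges nothing for errors smaller than ϵ and grows linearly beyond that: L(y, ŷ) = max(0, |y − ŷ| − ϵ). The sketch below implements only this textbook form; the modified loss actually used by Qϵ is not described in the abstract and is not reproduced here. The function name and the default ϵ of 0.1 are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean textbook epsilon-insensitive loss over paired score lists.

    Each pair contributes max(0, |y_true - y_pred| - epsilon), so
    predictions within epsilon of the target incur no penalty.
    The default epsilon is a placeholder, not a value from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    penalties = [
        max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)
    ]
    return sum(penalties) / len(penalties)


# Hypothetical lDDT-like targets and predictions on a 0-1 scale.
loss = epsilon_insensitive_loss([0.9, 0.5], [0.6, 0.5], epsilon=0.1)
```

In this example the first prediction misses by 0.3 and is penalized by roughly 0.2, while the second is exact and contributes nothing, giving a mean loss of about 0.1.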
Affiliation(s)
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States
|
47
|
Mahmud S, Morehead A, Cheng J. Accurate prediction of protein tertiary structural changes induced by single-site mutations with equivariant graph neural networks. bioRxiv 2023:2023.10.03.560758. [PMID: 37873289 PMCID: PMC10592624 DOI: 10.1101/2023.10.03.560758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Predicting the change of protein tertiary structure caused by single-site mutations is important for studying protein structure, function, and interaction. Even though computational protein structure prediction methods such as AlphaFold can predict the overall tertiary structures of most proteins rather accurately, they are not sensitive enough to accurately predict the structural changes induced by single-site amino acid mutations on proteins. Specialized mutation prediction methods mostly focus on predicting the overall stability or function changes caused by mutations without attempting to predict the exact mutation-induced structural changes, limiting their use in protein mutation study. In this work, we develop the first deep learning method based on equivariant graph neural networks (EGNN) to directly predict the tertiary structural changes caused by single-site mutations and the tertiary structure of any protein mutant from the structure of its wild-type counterpart. The results show that it performs substantially better in predicting the tertiary structures of protein mutants than the widely used protein structure prediction method AlphaFold.
|
48
|
Das R, Kretsch RC, Simpkin AJ, Mulvaney T, Pham P, Rangan R, Bu F, Keegan RM, Topf M, Rigden DJ, Miao Z, Westhof E. Assessment of three-dimensional RNA structure prediction in CASP15. bioRxiv 2023:2023.04.25.538330. [PMID: 37162955 PMCID: PMC10168427 DOI: 10.1101/2023.04.25.538330] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 05/11/2023]
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top-ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as non-canonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, CA USA
- Biophysics Program, Stanford University School of Medicine, CA USA
- Howard Hughes Medical Institute, Stanford University, CA USA
| | | | - Adam J. Simpkin
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
| | - Thomas Mulvaney
- Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV)
- University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Phillip Pham
- Department of Biochemistry, Stanford University School of Medicine, CA USA
| | - Ramya Rangan
- Biophysics Program, Stanford University School of Medicine, CA USA
| | - Fan Bu
- Guangzhou Laboratory, Guangzhou International Bio Island, Guangzhou 510005, China
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230036, Anhui, China
| | - Ronan M. Keegan
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science, UK
| | - Maya Topf
- Centre for Structural Systems Biology (CSSB), Leibniz-Institut für Virologie (LIV)
- University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Daniel J. Rigden
- Institute of Systems, Molecular & Integrative Biology, The University of Liverpool, UK
| | - Zhichao Miao
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University
- Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai 200434, China
| | - Eric Westhof
- Architecture et Réactivité de l’ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, F-67084, Strasbourg, France
|
49
|
Majewski M, Pérez A, Thölke P, Doerr S, Charron NE, Giorgino T, Husic BE, Clementi C, Noé F, De Fabritiis G. Machine learning coarse-grained potentials of protein thermodynamics. Nat Commun 2023; 14:5739. [PMID: 37714883 PMCID: PMC10504246 DOI: 10.1038/s41467-023-41343-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 06/02/2023] [Accepted: 08/29/2023] [Indexed: 09/17/2023] Open
Abstract
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
Affiliation(s)
- Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Philipp Thölke
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Stefan Doerr
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Nicholas E Charron
- Department of Physics, Rice University, Houston, TX, 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
| | - Toni Giorgino
- Biophysics Institute, National Research Council (CNR-IBF), 20133, Milan, Italy
| | - Brooke E Husic
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08540, USA
- Princeton Center for Theoretical Science, Princeton University, Princeton, NJ, 08540, USA
- Center for the Physics of Biological Function, Princeton University, Princeton, NJ, 08540, USA
| | - Cecilia Clementi
- Department of Physics, Rice University, Houston, TX, 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX, 77005, USA
| | - Frank Noé
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178, Berlin, Germany
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain
|
50
|
Ho C, Nazarie WFWM, Lee PC. An In Silico Design of Peptides Targeting the S1/S2 Cleavage Site of the SARS-CoV-2 Spike Protein. Viruses 2023; 15:1930. [PMID: 37766336 PMCID: PMC10536081 DOI: 10.3390/v15091930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 08/23/2023] [Indexed: 09/29/2023] Open
Abstract
SARS-CoV-2, responsible for the COVID-19 pandemic, invades host cells via its spike protein, which includes critical binding regions, such as the receptor-binding domain (RBD), the S1/S2 cleavage site, the S2 cleavage site, and heptad-repeat (HR) sections. Peptides targeting the RBD and HR1 inhibit binding to host ACE2 receptors and the formation of the fusion core. Other peptides target proteases, such as TMPRSS2 and cathepsin L, to prevent the cleavage of the S protein. However, research has largely ignored peptides targeting the S1/S2 cleavage site. In this study, bioinformatics was used to investigate the binding of the S1/S2 cleavage site to host proteases, including furin, trypsin, TMPRSS2, matriptase, cathepsin B, and cathepsin L. Peptides targeting the S1/S2 site were designed by identifying binding residues. Peptides were docked to the S1/S2 site using HADDOCK (High-Ambiguity-Driven protein-protein DOCKing). Nine peptides with the lowest HADDOCK scores and strong binding affinities were selected, which was followed by molecular dynamics simulations (MDSs) for further investigation. Among these peptides, BR582 and BR599 stand out. They exhibited relatively high interaction energies with the S protein at -1004.769 ± 21.2 kJ/mol and -1040.334 ± 24.1 kJ/mol, respectively. It is noteworthy that the binding of these peptides to the S protein remained stable during the MDSs. In conclusion, this research highlights the potential of peptides targeting the S1/S2 cleavage site as a means to prevent SARS-CoV-2 from entering cells, and contributes to the development of therapeutic interventions against COVID-19.
Affiliation(s)
- Chian Ho
- Faculty of Science and Natural Resources, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
| | - Wan Fahmi Wan Mohamad Nazarie
- Faculty of Science and Natural Resources, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
| | - Ping-Chin Lee
- Faculty of Science and Natural Resources, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
- Biotechnology Research Institute, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
|