1. Pang J, Xu D, Zhang X, Qu J, Jiang J, Suo J, Li T, Li Y, Peng Z. TIMP2-mediated mitochondrial fragmentation and glycolytic reprogramming drive renal fibrogenesis following ischemia-reperfusion injury. Free Radic Biol Med 2025; 232:244-259. [PMID: 39986488 DOI: 10.1016/j.freeradbiomed.2025.02.020] [Received: 12/11/2024] [Revised: 02/11/2025] [Accepted: 02/14/2025] [Indexed: 02/24/2025]
Abstract
Acute kidney injury (AKI) triggers renal structural and functional abnormalities through inflammatory and fibrotic signaling pathways, ultimately progressing to chronic kidney disease (CKD). The mechanisms underlying the AKI-to-CKD transition are complex, with hypoxia, mitochondrial dysfunction, and metabolic reprogramming as critical contributors. Analysis of public data demonstrated significant upregulation of tissue inhibitor of metalloproteinases 2 (Timp2) in renal biopsy tissues of CKD patients. Timp2 was also upregulated in both the ischemia/reperfusion (I/R) and unilateral ureteral obstruction (UUO) models. Tubule-specific Timp2 knockout markedly attenuated renal fibrosis. RNA sequencing revealed that Timp2 is associated with mitochondrial dynamics and glycolysis in I/R mice. Timp2 deletion improved mitochondrial morphology and suppressed glycolytic enzyme expression. In vitro, TGF-β1-treated Timp2-knockdown HK-2 cells showed reduced Drp1 expression, restored Mfn2 levels, less mitochondrial fragmentation, and elevated mitochondrial membrane potential. Additionally, Pfkfb3 and HIF-1α were downregulated, accompanied by reduced extracellular acidification rate (ECAR), PFK activity, and lactate production. Mechanistically, Timp2 interacts with the extracellular domain of Sdc4 in an autocrine manner, activating the Hedgehog (Hh) signaling pathway. The Hh pathway inhibitor cyclopamine partially rescued Timp2 overexpression-induced mitochondrial dysfunction, suppressed Pfkfb3-mediated glycolysis, and diminished collagen deposition. This study is the first to demonstrate that Timp2 in tubular epithelial cells (TECs) exacerbates Hh signaling, promoting mitochondrial fragmentation and metabolic reprogramming to accelerate I/R-induced renal fibrosis.
Affiliation(s)
- Jingjing Pang
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Dongxue Xu
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Xiaoyu Zhang
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Jiacheng Qu
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Jun Jiang
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Jinmeng Suo
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Tianlong Li
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Yiming Li
  - Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China; Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China
- Zhiyong Peng
  - Clinical Research Center of Hubei Critical Care Medicine, Wuhan, China; Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Intensive Care Unit of the Second Affiliated Hospital of Hainan Medical College, Haikou, Hainan, China
2. Wong K, Subramanian I, Stevens E, Chakraborty S. Unveiling Interaction Signatures Across Viral Pathogens through VASCO: Viral Antigen-Antibody Structural COmplex dataset. bioRxiv 2025:2025.03.11.642737. [PMID: 40161627 PMCID: PMC11952437 DOI: 10.1101/2025.03.11.642737] [Indexed: 04/02/2025]
Abstract
Viral antigen-antibody (Ag-Ab) interactions shape immune responses, drive pathogen neutralization, and inform vaccine strategies. Understanding their structural basis is crucial for predicting immune recognition, optimizing immunogen design to induce broadly neutralizing antibodies (bnAbs), and developing antiviral therapeutics. However, curated structural benchmarks for viral Ag-Ab interactions remain scarce. To address this, we present VASCO (Viral Antibody-antigen Structural COmplex dataset), a high-resolution, non-redundant collection of ~1225 viral Ag-Ab complexes sourced from the Protein Data Bank (PDB) and refined via energy minimization. Spanning Coronaviruses, Influenza, Ebola, HIV, and others, VASCO provides a comprehensive structural reference for viral immune recognition. By comparing VASCO against general protein-protein interactions (GPPI), we identify distinct sequence and structural features that define viral Ag-Ab binding. While conventional descriptors show broad similarities across datasets, deeper analyses reveal key sequence-space interactions, secondary structure preferences, and manifold-derived latent features that distinguish viral complexes. These insights highlight the limitations of GPPI-trained predictive models and the need for specialized computational frameworks. VASCO serves as a critical resource for advancing viral immunology, improving predictive modeling, and guiding immunogen design to elicit protective antibody responses. By bridging sequence and structural immunological datasets, VASCO should enable better docking, affinity prediction, and antiviral therapeutic development, all of which are key to pandemic preparedness and emerging pathogen response.
Affiliation(s)
- Kenny Wong
  - Department of Chemical Engineering, Northeastern University, Boston, MA
- Emma Stevens
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA
- Srirupa Chakraborty
  - Department of Chemical Engineering, Northeastern University, Boston, MA
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA
  - Department of Physics, Northeastern University, Boston, MA
3. Pereira GP, Gouzien C, Souza PCT, Martin J. Challenges in predicting PROTAC-mediated protein-protein interfaces with AlphaFold reveal a general limitation on small interfaces. Bioinformatics Advances 2025; 5:vbaf056. [PMID: 40144455 PMCID: PMC11938821 DOI: 10.1093/bioadv/vbaf056] [Received: 09/20/2024] [Revised: 03/06/2025] [Accepted: 03/12/2025] [Indexed: 03/28/2025]
Abstract
Motivation: Proteolysis Targeting Chimeras (PROTACs) are heterobifunctional molecules composed of ligands that bind a target protein and an E3-ligase complex, connected by a linker, and that induce proximity-based target protein degradation. PROTACs are promising alternatives to conventional drugs against cancer. Predicting PROTAC-mediated complexes is often the first step of in silico PROTAC design pipelines. We previously noted that AlphaFold2 (AF2) fails to predict PROTAC-mediated complexes. Results: Here, we investigate the potential causes of this limitation. We consider a set of 326 protein heterodimers orthogonal to the AF2 training set, and evaluate AF2 models with a focus on interface size and the presence of an interface ligand. Our results show that AF2-multimer predictions are sensitive to the size of the interface to be predicted, even in the absence of ligands, with the majority of models being incorrect for the smallest interfaces. We also benchmark both AF2 and AF3 on a set of 28 PROTAC-mediated dimers and show that AF3 does not significantly improve upon the accuracy of AF2. The low accuracy of AF2 on complexes with small interfaces has strong implications for computational PROTAC design pipelines, since PROTACs typically stabilize small interfaces, and more generally for any prediction task that involves small interfaces. Availability and implementation: All the models analyzed in this article are available in the Zenodo archive https://zenodo.org/records/14810843.
Affiliation(s)
- Gilberto P Pereira
  - Laboratoire de Biologie et Modelisation de la Cellule, Ecole Normale Superieure de Lyon, CNRS, UMR 5239, Universite Claude Bernard Lyon 1, Inserm, U1293, Lyon F-69364, France
  - Centre Blaise Pascal de Simulation et de Modelisation Numerique, Ecole Normale Superieure de Lyon, Lyon 69364, France
- Corentin Gouzien
  - Laboratoire d'Océanographie Microbienne, UMR 7621, CNRS-SU, Observatoire Océanologique de Banyuls, Banyuls-sur-Mer F-66650, France
- Paulo C T Souza
  - Laboratoire de Biologie et Modelisation de la Cellule, Ecole Normale Superieure de Lyon, CNRS, UMR 5239, Universite Claude Bernard Lyon 1, Inserm, U1293, Lyon F-69364, France
  - Centre Blaise Pascal de Simulation et de Modelisation Numerique, Ecole Normale Superieure de Lyon, Lyon 69364, France
- Juliette Martin
  - Laboratoire de Biologie et Modelisation de la Cellule, Ecole Normale Superieure de Lyon, CNRS, UMR 5239, Universite Claude Bernard Lyon 1, Inserm, U1293, Lyon F-69364, France
4. Grassmann G, Di Rienzo L, Ruocco G, Miotto M, Milanetti E. Compact Assessment of Molecular Surface Complementarities Enhances Neural Network-Aided Prediction of Key Binding Residues. J Chem Inf Model 2025; 65:2695-2709. [PMID: 39982412 PMCID: PMC11898074 DOI: 10.1021/acs.jcim.4c02286] [Received: 12/06/2024] [Revised: 02/09/2025] [Accepted: 02/13/2025] [Indexed: 02/22/2025]
Abstract
Predicting interactions between proteins is fundamental for understanding the mechanisms underlying cellular processes, since protein-protein complexes are crucial in physiological conditions but also in many diseases, for example by seeding aggregate formation. Despite the many advancements made so far, the performance of docking protocols depends heavily on their ability to identify binding regions. This makes low-cost, computationally efficient methods for identifying such regions important. We present a novel integrated protocol based mainly on compact modeling of protein surface patches via sets of orthogonal polynomials to identify regions of high shape/electrostatic complementarity. By incorporating both hydrophilic and hydrophobic contributions, we define new binding matrices, which serve as effective inputs for training a neural network. In this work, we propose a new Neural Network (NN)-based architecture, Core Interacting Residues Network (CIRNet), which achieves an Area Under the Receiver Operating Characteristic Curve (ROC AUC) of approximately 0.87 in identifying pairs of core interacting residues on a balanced data set. In a blind search for core interacting residues, CIRNet distinguishes them from random decoys with an ROC AUC of 0.72. We test this protocol to enhance docking algorithms by filtering the proposed poses, addressing one of the still open problems in computational biology. Notably, when applied to the top ten models from three widely used docking servers, CIRNet improves docking outcomes, significantly reducing the average RMSD between the selected poses and the native state. Compared to another state-of-the-art tool for rescaling docking poses, CIRNet more efficiently identified the worst poses generated by the three docking servers under consideration and achieved superior rescaling performance in two cases.
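For readers unfamiliar with the ROC AUC figures quoted in this abstract, the metric can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that definition only; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from CIRNet or its data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interacting residue pair, 0 = decoy pair.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.87 and 0.72 should be read.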
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, P.Le A. Moro 5, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
  - Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
  - Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
5. Chaves EF, Sartori J, Santos WM, Cruz CHB, Mhrous EN, Nacimento-Filho M, Ferraz MVF, Lins RD. Estimating Absolute Protein-Protein Binding Free Energies by a Super Learner Model. J Chem Inf Model 2025; 65:2602-2609. [PMID: 39973292 PMCID: PMC11898044 DOI: 10.1021/acs.jcim.4c01641] [Received: 09/09/2024] [Revised: 02/05/2025] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
Protein-protein binding is central to most biochemical processes in all living beings. It underlies mechanisms ranging from cell interactions to metabolic control, and it is equally important in ex vivo biotechnology, such as the development of therapeutic monoclonal antibodies, the engineering of enzymes for industrial biocatalysis, the development of biosensors for disease detection, and the assembly of artificial protein complexes for drug screening. Predicting the strength of protein-protein association therefore helps in understanding these molecular mechanisms and ultimately in controlling them. We devised a machine learning ensemble model that uses Rosetta-based quantities to predict binding free energies of protein-protein complexes with accuracy rivaling both computationally demanding methods and currently available ML/DL tools. The method was implemented as a Python pipeline named PBEE, which stands for Protein Binding Energy Estimator, allowing rapid calculation of the absolute binding free energies of protein complexes from their PDB coordinates.
Affiliation(s)
- Elton J. F. Chaves
  - Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Recife 50670-465, Brazil
- João Sartori
  - Laboratory for Applied Genomics and Bio-Innovations, Oswaldo Cruz Foundation, Rio de Janeiro 21040-900, Brazil
- Whendel M. Santos
  - Department of Fundamental Chemistry, Federal University of Pernambuco, Recife 50670-901, Brazil
- Carlos H. B. Cruz
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, U.K.
- Emmanuel N. Mhrous
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08544, United States
- Roberto D. Lins
  - Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Recife 50670-465, Brazil
  - Department of Fundamental Chemistry, Federal University of Pernambuco, Recife 50670-901, Brazil
6. Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges such as high costs, long development times, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite this potential, challenges such as data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
7. Papadopoulos AM, Axenopoulos A, Iatrou A, Stamatopoulos K, Alvarez F, Daras P. ParaSurf: a surface-based deep learning approach for paratope-antigen interaction prediction. Bioinformatics 2025; 41:btaf062. [PMID: 39921885 PMCID: PMC11855283 DOI: 10.1093/bioinformatics/btaf062] [Received: 11/04/2024] [Revised: 01/14/2025] [Accepted: 02/03/2025] [Indexed: 02/10/2025]
Abstract
MOTIVATION: Identifying antibody binding sites is crucial for developing vaccines and therapeutic antibodies, processes that are time-consuming and costly. Accurate prediction of the paratope's binding site can speed up development by improving our understanding of antibody-antigen interactions. RESULTS: We present ParaSurf, a deep learning model that significantly enhances paratope prediction by incorporating both surface geometric and non-geometric factors. Trained and tested on three prominent antibody-antigen benchmarks, ParaSurf achieves state-of-the-art results across nearly all metrics. Unlike models restricted to the variable region, ParaSurf can accurately predict binding scores across the entire Fab region of the antibody. Additionally, we conducted an extensive analysis using the largest of the three datasets employed, focusing on three key components: (i) a detailed evaluation of paratope prediction for each complementarity-determining region loop, (ii) the performance of models trained exclusively on the heavy chain, and (iii) the results of training models solely on the light chain without incorporating data from the heavy chain. AVAILABILITY AND IMPLEMENTATION: Source code for ParaSurf, along with the datasets used, preprocessing pipeline, and trained model weights, is freely available at https://github.com/aggelos-michael-papadopoulos/ParaSurf.
Affiliation(s)
- Angelos-Michael Papadopoulos
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
  - Universidad Politécnica de Madrid, Madrid 28040, Spain
- Apostolos Axenopoulos
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
  - Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Anastasia Iatrou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Kostas Stamatopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Petros Daras
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
8. Shirali A, Stebliankin V, Karki U, Shi J, Chapagain P, Narasimhan G. A comprehensive survey of scoring functions for protein docking models. BMC Bioinformatics 2025; 26:25. [PMID: 39844036 PMCID: PMC11755896 DOI: 10.1186/s12859-024-05991-4] [Received: 07/04/2024] [Accepted: 11/18/2024] [Indexed: 01/24/2025]
Abstract
BACKGROUND: While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes. RESULTS: In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications. CONCLUSIONS: We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.
Affiliation(s)
- Azam Shirali
  - Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Vitalii Stebliankin
  - Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Ukesh Karki
  - Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Jimeng Shi
  - Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Prem Chapagain
  - Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
  - Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
- Giri Narasimhan
  - Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
  - Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
9. Gowthaman R, Park M, Yin R, Guest JD, Pierce BG. AlphaFold and Docking Approaches for Antibody-Antigen and Other Targets: Insights From CAPRI Rounds 47-55. Proteins 2025. [PMID: 39831331 DOI: 10.1002/prot.26801] [Received: 09/04/2024] [Revised: 12/26/2024] [Accepted: 01/10/2025] [Indexed: 01/22/2025]
Abstract
Accurate modeling of the structures of protein-protein complexes and other biomolecular interactions represents a longstanding and important challenge for computational biology. The Critical Assessment of PRedicted Interactions (CAPRI) experiment has served for over two decades as a key means to assess and compare current approaches and methods through blind predictive scenarios, highlighting useful strategies and new developments. Here we describe the performance of our laboratory's team in recent CAPRI rounds, which included submissions for 10 modeling targets. Our team utilized a range of docking and modeling approaches, including ZDOCK, Rosetta, and ZRANK, to model, refine, and score protein-protein and protein-DNA complexes. For recent targets we utilized adaptations of AlphaFold to generate models, leading to near-native models for an antibody-peptide target and a highly accurate (but low-ranked) model for an antibody-MHC complex. These results underscore the utility of AlphaFold-based protocols for predictive protein complex modeling, including for immune recognition, and highlight considerations regarding the use of AlphaFold confidence metrics in model selection.
Affiliation(s)
- Ragul Gowthaman
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Minjae Park
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Rui Yin
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Johnathan D Guest
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Brian G Pierce
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
10. Harmalkar A, Lyskov S, Gray JJ. Reliable protein-protein docking with AlphaFold, Rosetta, and replica-exchange. bioRxiv 2025:2023.07.28.551063. [PMID: 37546760 PMCID: PMC10402144 DOI: 10.1101/2023.07.28.551063] [Indexed: 08/08/2023]
Abstract
Despite the recent breakthrough of AlphaFold (AF) in the field of protein sequence-to-structure prediction, modeling protein interfaces and predicting protein complex structures remains challenging, especially when there is a significant conformational change in one or both binding partners. Prior studies have demonstrated that AF-multimer (AFm) can predict accurate protein complexes in only up to 43% of cases [1]. In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm to better sample conformational changes. Using a curated collection of 254 available protein targets with both unbound and bound structures, we first demonstrate that AlphaFold confidence measures (pLDDT) can be repurposed for estimating protein flexibility and docking accuracy for multimers. We incorporate these metrics within our ReplicaDock 2.0 protocol [2] to complete a robust in-silico pipeline for accurate protein complex structure prediction. AlphaRED (AlphaFold-initiated Replica Exchange Docking) successfully docks failed AF predictions including 97 failure cases in Docking Benchmark Set 5.5. AlphaRED generates CAPRI acceptable-quality or better predictions for 63% of benchmark targets. Further, on a subset of antigen-antibody targets, which is challenging for AFm (20% success rate), AlphaRED demonstrates a success rate of 43%. This new strategy demonstrates the success possible by integrating deep-learning based architectures trained on evolutionary information with physics-based enhanced sampling. The pipeline is available at github.com/Graylab/AlphaRED.
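As background to the use of pLDDT as a flexibility estimate: AlphaFold writes each residue's pLDDT into the B-factor column of its PDB output, so low-confidence residues can be listed with a few lines of parsing. The sketch below illustrates only that general idea and is not the AlphaRED pipeline; the function names, the cutoff of 70, and the three-residue example are all hypothetical.

```python
def low_confidence_residues(pdb_text, cutoff=70.0):
    """Return (chain, residue number, pLDDT) for C-alpha atoms below `cutoff`.

    Assumes an AlphaFold-style PDB file in which the B-factor field
    (columns 61-66) holds the per-residue pLDDT. The cutoff is illustrative.
    """
    flagged = []
    for line in pdb_text.splitlines():
        # Only standard ATOM records; the atom name sits in columns 13-16.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resseq = int(line[22:26])
        plddt = float(line[60:66])
        if plddt < cutoff:
            flagged.append((chain, resseq, plddt))
    return flagged


def make_ca_line(serial, resname, chain, resseq, plddt):
    """Build a fixed-column ATOM record for a C-alpha atom (toy coordinates)."""
    return (f"ATOM  {serial:>5} {'CA':<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Hypothetical three-residue model with decreasing confidence.
example_pdb = "\n".join([
    make_ca_line(1, "MET", "A", 1, 92.5),
    make_ca_line(2, "GLY", "A", 2, 65.0),
    make_ca_line(3, "SER", "A", 3, 48.2),
])
flagged = low_confidence_residues(example_pdb)
```

Reading only C-alpha atoms gives one value per residue, which is sufficient here because the pLDDT is the same for every atom of a residue in AlphaFold output.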
Affiliation(s)
- Ameya Harmalkar
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Sergey Lyskov
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J Gray
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
  - Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD 21218, USA
11. Olechnovič K, Banciul R, Dapkūnas J, Venclovas Č. FTDMP: A Framework for Protein-Protein, Protein-DNA, and Protein-RNA Docking and Scoring. Proteins 2025. [PMID: 39748638 DOI: 10.1002/prot.26792] [Received: 08/31/2024] [Revised: 11/27/2024] [Accepted: 12/18/2024] [Indexed: 01/04/2025]
Abstract
FTDMP is a software framework for biomolecular docking and scoring. It can perform docking of subunits containing one or more protein, DNA, or RNA chains, followed by subsequent scoring of the resulting models. FTDMP can also be used for the ranking of user-provided models of biomolecular complexes, generated by any structure prediction method. FTDMP evaluates models according to the consensus-based method VoroIF-jury, which combines individual scores derived from the Voronoi tessellation of biomolecular structures. In addition to the default scoring mode, FTDMP can easily adopt additional scores; thus, it may be used as a tool to assess newly developed scoring functions. FTDMP was evaluated during blind testing in recent CAPRI experiments and using protein-protein, protein-DNA, and protein-RNA docking benchmarks. It proved to be a useful tool for different research tasks, related to modeling biomolecular interactions. The software, cleaned docking benchmarks, and benchmarking results are available at https://bioinformatics.lt/software/ftdmp/.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Université Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| | - Rita Banciul
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
|
12
|
Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2025; 93:209-216. [PMID: 37870486 PMCID: PMC11623435 DOI: 10.1002/prot.26615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are limited to the analysis of a single type of interaction, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.
Affiliation(s)
- Carlos H. M. Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| | - David B. Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| |
|
13
|
Samanta R, Harmalkar A, Prathima P, Gray JJ. Advancing Membrane-Associated Protein Docking with Improved Sampling and Scoring in Rosetta. J Chem Theory Comput 2024; 20:10740-10749. [PMID: 39574325 DOI: 10.1021/acs.jctc.4c00927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
The oligomerization of protein macromolecules on cell membranes plays a fundamental role in regulating cellular function. From modulating signal transduction to directing immune response, membrane proteins (MPs) play a crucial role in biological processes and are often the target of pharmaceutical drugs. Despite their biological relevance, the challenges in experimental determination have hampered the structural availability of membrane proteins and their complexes. Computational docking provides a promising alternative to model membrane protein complex structures. Here, we present Rosetta-MPDock, a flexible transmembrane (TM) protein docking protocol that captures binding-induced conformational changes. Rosetta-MPDock samples large conformational ensembles of flexible monomers and docks them within an implicit membrane environment. We benchmarked this method on 29 TM-protein complexes of variable backbone flexibility. These complexes are classified based on the root-mean-square deviation between the unbound and bound states (RMSD_UB) as rigid (RMSD_UB < 1.2 Å), moderately flexible (RMSD_UB ∈ [1.2, 2.2] Å), and flexible targets (RMSD_UB > 2.2 Å). In a local docking scenario, i.e. with membrane protein partners starting ≈10 Å apart embedded in the membrane in their unbound conformations, Rosetta-MPDock successfully predicts the correct interface (success defined as achieving 3 near-native structures in the 5 top-ranked models) for 67% of the moderately flexible targets and 60% of the highly flexible targets, a substantial improvement over existing membrane protein docking methods. Further, by integrating AlphaFold2-multimer for structure determination and using Rosetta-MPDock for docking and refinement, we demonstrate improved success rates over the benchmark targets from 64% to 73%. Rosetta-MPDock advances the capabilities for membrane protein complex structure prediction and modeling to tackle key biological questions and elucidate functional mechanisms in the membrane environment. The benchmark set and the code are available for public use at github.com/Graylab/MPDock.
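The flexibility classes and the success criterion quoted in this abstract can be written down as a short sketch. This is only an illustration of the stated cutoffs, not code from the MPDock repository: the function names are hypothetical, and reading "3 near-native structures in the 5 top-ranked models" as "at least 3" is an assumption.

```python
from typing import Sequence


def classify_flexibility(rmsd_ub: float) -> str:
    """Bin a target by unbound-to-bound RMSD (in Å) using the abstract's cutoffs."""
    if rmsd_ub < 1.2:
        return "rigid"
    if rmsd_ub <= 2.2:  # closed interval [1.2, 2.2] as stated in the abstract
        return "moderately flexible"
    return "flexible"


def is_docking_success(ranked_near_native: Sequence[bool]) -> bool:
    """Assumed reading: at least 3 near-native models among the 5 top-ranked."""
    return sum(ranked_near_native[:5]) >= 3


if __name__ == "__main__":
    print(classify_flexibility(1.7))
    print(is_docking_success([True, False, True, True, False, True]))
```

The input to `is_docking_success` is assumed to be ordered by score, best model first, so that only the first five flags count.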
Affiliation(s)
- Rituparna Samanta
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Priyamvada Prathima
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
| |
|
14
|
Han J, Zhang S, Guan M, Li Q, Gao X, Liu J. GeoNet enables the accurate prediction of protein-ligand binding sites through interpretable geometric deep learning. Structure 2024; 32:2435-2448.e5. [PMID: 39488202 DOI: 10.1016/j.str.2024.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024]
Abstract
The identification of protein binding residues is essential for understanding their functions in vivo. However, it remains a computational challenge to accurately identify binding sites due to the lack of known residue binding patterns. Local residue spatial distribution and its interactive biophysical environment both determine binding patterns. Previous methods could not capture both kinds of information simultaneously, resulting in unsatisfactory performance. Here, we present GeoNet, an interpretable geometric deep learning model for predicting DNA, RNA, and protein binding sites by learning the latent residue binding patterns. GeoNet achieves this by introducing a coordinate-free geometric representation to characterize local residue distributions and generating an eigenspace to depict local interactive biophysical environments. Evaluation shows that GeoNet is superior to other leading predictors and that its learned representations are strongly interpretable. We present three test cases in which interaction interfaces were successfully identified with GeoNet.
Affiliation(s)
- Jiyun Han
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Shizhuo Zhang
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Mingming Guan
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Qiuyu Li
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.
| | - Juntao Liu
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China.
| |
|
15
|
Liu H, Chen P, Zhai X, Huo KG, Zhou S, Han L, Fan G. PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery. Sci Data 2024; 11:1316. [PMID: 39627219 PMCID: PMC11615212 DOI: 10.1038/s41597-024-03997-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 10/11/2024] [Indexed: 12/06/2024] Open
Abstract
Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict changes in PPB affinity upon mutation, but studies predicting the PPB affinity itself have been scarce. The major reason is the paucity of open-source datasets with PPB affinity data. To address this gap, the current study introduces a large, comprehensive PPB affinity (PPB-Affinity) dataset. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, receptor protein chain, ligand protein chain, etc. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, and we believe it will significantly advance drug discovery by streamlining the screening of potential large-molecule drugs. We also developed a deep-learning benchmark model with this dataset to predict the PPB affinity, providing a foundational comparison for the research community.
Affiliation(s)
- Huaqing Liu
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China
| | - Peiyi Chen
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China
| | - Xiaochen Zhai
- Cyagen Biosciences (Suzhou) Inc., Guangzhou, 215000, China
| | - Ku-Geng Huo
- Cyagen Biosciences (Guangzhou) Inc., Guangzhou, 510700, China
| | - Shuxian Zhou
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China
| | - Lanqing Han
- Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Pearl River Delta, Guangzhou, 510700, China.
- Cyagen Biomodels (Guangzhou) Co., Ltd, Guangzhou, 510700, China.
| | - Guoxin Fan
- Department of Pain Medicine, Shenzhen Nanshan People's Hospital, Shenzhen University Medical School, Shenzhen, 518056, China.
| |
|
16
|
Zheng F, Jiang X, Wen Y, Yang Y, Li M. Systematic investigation of machine learning on limited data: A study on predicting protein-protein binding strength. Comput Struct Biotechnol J 2024; 23:460-472. [PMID: 38235359 PMCID: PMC10792694 DOI: 10.1016/j.csbj.2023.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/14/2023] [Accepted: 12/16/2023] [Indexed: 01/19/2024] Open
Abstract
The application of machine learning techniques in biological research, especially when dealing with limited data availability, poses significant challenges. In this study, we leveraged advancements in method development for predicting protein-protein binding strength to conduct a systematic investigation into the application of machine learning on limited data. The binding strength, quantitatively measured as binding affinity, is vital for understanding the processes of recognition, association, and dysfunction that occur within protein complexes. By incorporating transfer learning, integrating domain knowledge, and employing both deep learning and traditional machine learning algorithms, we mitigated the impact of data limitations and made significant advancements in predicting protein-protein binding affinity. In particular, we developed over 20 models, ultimately selecting three representative best-performing ones that belong to distinct categories. The first model is structure-based, consisting of a random forest regression and thirteen handcrafted features. The second model is sequence-based, employing an architecture that combines transferred embedding features with a multilayer perceptron. Finally, we created an ensemble model by averaging the predictions of the two aforementioned models. The comparison with other predictors on three independent datasets confirms the significant improvements achieved by our models in predicting protein-protein binding affinity. The programs for running these three models are available at https://github.com/minghuilab/BindPPI.
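The ensemble described in this abstract is a plain average of two models' predictions. A minimal sketch of that step follows; it is not taken from the BindPPI repository, the function name is hypothetical, and the affinity values in the usage lines are made-up placeholders rather than results from the paper.

```python
from typing import List, Sequence


def ensemble_affinity(structure_pred: Sequence[float],
                      sequence_pred: Sequence[float]) -> List[float]:
    """Average per-complex predictions from a structure-based and a sequence-based model."""
    if len(structure_pred) != len(sequence_pred):
        raise ValueError("both models must score the same complexes")
    return [(a + b) / 2 for a, b in zip(structure_pred, sequence_pred)]


if __name__ == "__main__":
    # Placeholder binding affinities (kcal/mol) for two hypothetical complexes.
    print(ensemble_affinity([-10.0, -8.0], [-12.0, -9.0]))
```

Unweighted averaging treats both models as equally reliable; the abstract does not mention weights, so none are assumed here.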
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
| | - Xin Jiang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
| | - Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
| | - Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
| | - Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, Jiangsu Province 215123, China
| |
|
17
|
McFee M, Kim J, Kim PM. EuDockScore: Euclidean graph neural networks for scoring protein-protein interfaces. Bioinformatics 2024; 40:btae636. [PMID: 39441796 PMCID: PMC11543620 DOI: 10.1093/bioinformatics/btae636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 10/16/2024] [Accepted: 10/21/2024] [Indexed: 10/25/2024] Open
Abstract
MOTIVATION Protein-protein interactions are essential for a variety of biological phenomena including mediating biochemical reactions, cell signaling, and the immune response. Proteins seek to form interfaces which reduce overall system energy. Although determination of single polypeptide chain protein structures has been revolutionized by deep learning techniques, complex prediction has still not been perfected. Additionally, experimentally determining structures is extremely resource- and time-intensive. An alternative is the technique of computational docking, which takes the solved individual structures of proteins to produce candidate interfaces (decoys). Decoys are then scored using mathematical functions that assess the quality of the system, known as scoring functions. Beyond docking, scoring functions are a critical component of assessing structures produced by many protein generative models. Scoring models are also used as a final filtering step in many generative deep learning models, including those that generate antibody binders and those which perform docking. RESULTS In this work, we present improved scoring functions for protein-protein interactions which utilize cutting-edge Euclidean graph neural network architectures to assess protein-protein interfaces. These Euclidean docking score models are known as EuDockScore and EuDockScore-Ab, with the latter being specific to antibody-antigen docks. Finally, we provide EuDockScore-AFM, a model trained on antibody-antigen outputs from AlphaFold-Multimer (AFM), which proves useful in reranking large numbers of AFM outputs. AVAILABILITY AND IMPLEMENTATION The code for these models is available at https://gitlab.com/mcfeemat/eudockscore.
Affiliation(s)
- Matthew McFee
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jisun Kim
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Philip M Kim
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Computer Science, The University of Toronto, Toronto, ON M5S 2E4, Canada
| |
|
18
|
Bhadra-Lobo S, Derevyanko G, Lamoureux G. Dock2D: Synthetic Data for the Molecular Recognition Problem. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2580-2586. [PMID: 38814763 DOI: 10.1109/tcbb.2024.3407477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New classes of learning-based algorithms are actively being developed, and are typically trained end-to-end on protein complex structures extracted from the Protein Data Bank. These training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. We present Dock2D-IP and Dock2D-IF, two "toy" datasets that can be used to select algorithms predicting protein-protein interactions-or any other type of molecular interactions. Using two-dimensional shapes as input, each example from Dock2D-IP ("interaction pose") describes the interaction pose of two shapes known to interact and each example from Dock2D-IF ("interaction fact") describes whether two shapes form a stable complex or not, regardless of how they bind. We propose a number of baseline solutions to the problem and show that the same underlying energy function can be learned either by solving the interaction pose task (formulated as an energy-minimization "docking" problem) or the fact-of-interaction task (formulated as a binding free energy estimation problem).
|
19
|
Kumar A K, Rathore RS. Categorization of hotspots into three types - weak, moderate and strong to distinguish protein-protein versus protein-peptide interactions. J Biomol Struct Dyn 2024; 42:9348-9360. [PMID: 37649387 DOI: 10.1080/07391102.2023.2252077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 08/18/2023] [Indexed: 09/01/2023]
Abstract
Protein-protein and protein-peptide interactions (PPI and PPepI) belong to a similar category of interactions, yet seemingly subtle differences exist among them. To characterize differences between protein-protein (PP) and protein-peptide (PPep) interactions, we have focussed on two important classes of residues: hotspot and anchor residues. Using implicit solvation-based free energy calculations, a very large-scale alanine scanning has been performed on benchmark datasets, consisting of over 5700 interface residues. The differences between the two categories are more pronounced when the data are divided into three distinct types, namely weak hotspots (having binding free energy loss upon Ala mutation, ΔΔG, ∼2-10 kcal/mol), moderate hotspots (ΔΔG, ∼10-20 kcal/mol) and strong hotspots (ΔΔG ≥ ∼20 kcal/mol). The analysis suggests that for PPI, weak hotspots are predominantly populated by polar and hydrophobic residues. The distribution shifts towards charged and polar residues for moderate hotspots, and charged residues (principally Arg) are overwhelmingly present among strong hotspots. On the other hand, in the PPepI dataset, the distribution shifts from predominantly hydrophobic and polar (in the weak type) to an almost equal preference for polar, hydrophobic and charged residues (in the moderate type), and finally Arg and Trp predominate in the strong type. The preferred anchor residues in both categories are Arg, Tyr and Leu, which possess bulky side chains and strike a delicate balance between side chain flexibility and rigidity. The present knowledge should aid in effective design of biologics, by augmentation or disruption of PPIs with peptides or peptidomimetics. Communicated by Ramaswamy H. Sarma.
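The three hotspot types in this abstract are defined by approximate ΔΔG ranges, which can be sketched as a simple binning function. The function name is hypothetical, and two choices below are assumptions rather than statements from the paper: residues under about 2 kcal/mol are labelled "non-hotspot", and each boundary value is assigned to the higher bin.

```python
def hotspot_type(ddg_kcal_per_mol: float) -> str:
    """Bin an alanine-scanning ddG (kcal/mol) into the abstract's approximate ranges."""
    if ddg_kcal_per_mol >= 20:
        return "strong"
    if ddg_kcal_per_mol >= 10:
        return "moderate"
    if ddg_kcal_per_mol >= 2:
        return "weak"
    return "non-hotspot"  # assumed label; the abstract does not name this class


if __name__ == "__main__":
    for ddg in (1.0, 5.5, 12.0, 27.3):
        print(ddg, hotspot_type(ddg))
```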
Affiliation(s)
- Kiran Kumar A
- Department of Bioinformatics, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, India
| | - R S Rathore
- Department of Bioinformatics, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, India
| |
|
20
|
Zhao W, Xu G, Wang L, Cui Z, Zhang T, Yang J. Intra-Inter Graph Representation Learning for Protein-Protein Binding Sites Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1685-1696. [PMID: 38896523 DOI: 10.1109/tcbb.2024.3416341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Graph neural networks have drawn increasing attention and achieved remarkable progress recently due to their potential applications for a large amount of irregular data. It is a natural way to represent protein as a graph. In this work, we focus on protein-protein binding sites prediction between the ligand and receptor proteins. Previous work just simply adopts graph convolution to learn residue representations of ligand and receptor proteins, then concatenates them and feeds the concatenated representation into a fully connected layer to make predictions, losing much of the information contained in complexes and failing to obtain an optimal prediction. In this paper, we present Intra-Inter Graph Representation Learning for protein-protein binding sites prediction (IIGRL). Specifically, for intra-graph learning, we maximize the mutual information between local node representation and global graph summary to encourage node representation to embody the global information of protein graph. Then we explore fusing two separate ligand and receptor graphs as a whole graph and learning affinities between their residues/nodes to propagate information to each other, which could effectively capture inter-protein information and further enhance the discrimination of residue pairs. Extensive experiments on multiple benchmarks demonstrate that the proposed IIGRL model outperforms state-of-the-art methods.
|
21
|
Giulini M, Schneider C, Cutting D, Desai N, Deane CM, Bonvin AMJJ. Towards the accurate modelling of antibody-antigen complexes from sequence using machine learning and information-driven docking. Bioinformatics 2024; 40:btae583. [PMID: 39348157 PMCID: PMC11483107 DOI: 10.1093/bioinformatics/btae583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 07/31/2024] [Accepted: 09/27/2024] [Indexed: 10/01/2024] Open
Abstract
MOTIVATION Antibody-antigen complex modelling is an important step in computational workflows for therapeutic antibody design. While experimentally determined structures of both antibody and the cognate antigen are often not available, recent advances in machine learning-driven protein modelling have enabled accurate prediction of both antibody and antigen structures. Here, we analyse the ability of protein-protein docking tools to use machine learning-generated input structures for information-driven docking. RESULTS In an information-driven scenario, we find that HADDOCK can generate accurate models of antibody-antigen complexes using an ensemble of antibody structures generated by machine learning tools and AlphaFold2-predicted antigen structures. Targeted docking using knowledge of the complementarity-determining regions on the antibody and some information about the targeted epitope allows the generation of high-quality models of the complex with reduced sampling, resulting in a computationally cheap protocol that outperforms the ZDOCK baseline. AVAILABILITY AND IMPLEMENTATION The source code of HADDOCK3 is freely available at github.com/haddocking/haddock3. The code to generate and analyse the data is available at github.com/haddocking/ai-antibodies. The full runs, including docking models from all modules of a workflow, have been deposited in our lab collection (data.sbgrid.org/labs/32/1139) at the SBGRID data repository.
Affiliation(s)
- Marco Giulini
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht CH 3584, The Netherlands
| | | | | | | | | | - Alexandre M J J Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht CH 3584, The Netherlands
| |
|
22
|
Alam R, Mahbub S, Bayzid MS. Pair-EGRET: enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models. Bioinformatics 2024; 40:btae588. [PMID: 39360982 PMCID: PMC11495673 DOI: 10.1093/bioinformatics/btae588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 09/03/2024] [Accepted: 10/01/2024] [Indexed: 10/05/2024] Open
Abstract
MOTIVATION Proteins are responsible for most biological functions, many of which require the interaction of more than one protein molecule. However, accurately predicting protein-protein interaction (PPI) sites (the interfacial residues of a protein that interact with other protein molecules) remains a challenge. The growing demand and cost associated with the reliable identification of PPI sites using conventional experimental methods call for computational tools for automated prediction and understanding of PPIs. RESULTS We present Pair-EGRET, an edge-aggregated graph attention network that leverages the features extracted from pretrained transformer-like models to accurately predict PPI sites. Pair-EGRET works on a k-nearest neighbor graph, representing the 3D structure of a protein, and utilizes the cross-attention mechanism for accurate identification of interfacial residues of a pair of proteins. Through an extensive evaluation study using a diverse array of experimental data, evaluation metrics, and case studies on representative protein sequences, we demonstrate that Pair-EGRET can achieve remarkable performance in predicting PPI sites. Moreover, Pair-EGRET can provide interpretable insights from the learned cross-attention matrix. AVAILABILITY AND IMPLEMENTATION Pair-EGRET is freely available in open source form at the GitHub Repository https://github.com/1705004/Pair-EGRET.
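The abstract states that Pair-EGRET operates on a k-nearest neighbor graph representing a protein's 3D structure. As a rough illustration of what such a graph is, the sketch below builds one from a list of per-residue coordinates. It is not the Pair-EGRET implementation: the function name, the use of one point per residue, and the value of k are all assumptions, and the coordinates in the usage lines are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn_graph(coords: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """Map each residue index to the indices of its k nearest residues in 3D."""
    neighbors: Dict[int, List[int]] = {}
    for i, ci in enumerate(coords):
        # Euclidean distance to every other residue, sorted nearest first.
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


if __name__ == "__main__":
    # Four hypothetical residue positions along a line (Å).
    toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
    print(knn_graph(toy, k=2))
```

This brute-force version compares every pair of residues, which is fine for a single chain; a spatial index would be the usual choice for larger inputs.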
Affiliation(s)
- Ramisa Alam
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| |
|
23
|
Chu LS, Sarma S, Gray JJ. Unified Sampling and Ranking for Protein Docking with DFMDock. bioRxiv 2024:2024.09.27.615401. [PMID: 39386449 PMCID: PMC11463455 DOI: 10.1101/2024.09.27.615401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Diffusion models have shown promise in addressing the protein docking problem. Traditionally, these models are used solely for sampling docked poses, with a separate confidence model for ranking. We introduce DFMDock (Denoising Force Matching Dock), a diffusion model that unifies sampling and ranking within a single framework. DFMDock features two output heads: one for predicting forces and the other for predicting energies. The forces are trained using a denoising force matching objective, while the energy gradients are trained to align with the forces. This design enables our model to sample using the predicted forces and rank poses using the predicted energies, thereby eliminating the need for an additional confidence model. Our approach outperforms the previous diffusion model for protein docking, DiffDock-PP, with a sampling success rate of 44% compared to its 8%, and a Top-1 ranking success rate of 16% compared to 0% on the Docking Benchmark 5.5 test set. In successful decoy cases, the DFMDock Energy forms a binding funnel similar to the physics-based Rosetta Energy, suggesting that DFMDock can capture the underlying energy landscape.
Affiliation(s)
- Lee-Shin Chu
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sudeep Sarma
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
24
|
Wang C, Wang J, Song W, Luo G, Jiang T. EpiScan: accurate high-throughput mapping of antibody-specific epitopes using sequence information. NPJ Syst Biol Appl 2024; 10:101. [PMID: 39251627 PMCID: PMC11383971 DOI: 10.1038/s41540-024-00432-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 08/27/2024] [Indexed: 09/11/2024] Open
Abstract
The identification of antibody-specific epitopes on virus proteins is crucial for vaccine development and drug design. Nonetheless, traditional wet-lab approaches for the identification of epitopes are both costly and labor-intensive, underscoring the need for the development of efficient and cost-effective computational tools. Here, EpiScan, an attention-based deep learning framework for predicting antibody-specific epitopes, is presented. EpiScan adopts a multi-input and single-output strategy by designing independent blocks for different parts of antibodies, including variable heavy chain (VH), variable light chain (VL), complementary determining regions (CDRs), and framework regions (FRs). The block predictions are weighted and integrated for the prediction of potential epitopes. Using multiple experimental data samples, we show that EpiScan, which only uses antibody sequence information, can accurately map epitopes on specific antigen structures. The antibody-specific epitopes on the receptor binding domain (RBD) of SARS coronavirus 2 (SARS-CoV-2) were located by EpiScan, and the potentially valuable vaccine epitope was identified. EpiScan can expedite the epitope mapping process for high-throughput antibody sequencing data, supporting vaccine design and drug development. Availability: For the convenience of related wet-experimental researchers, the source code and web server of EpiScan are publicly available at https://github.com/gzBiomedical/EpiScan .
Affiliation(s)
- Chuan Wang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Guangzhou National Laboratory, Guangzhou, China
- Wenjun Song
  - Guangzhou National Laboratory, Guangzhou, China
  - Institute of Integration of Traditional and Western Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Guanzheng Luo
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Taijiao Jiang
  - Guangzhou National Laboratory, Guangzhou, China
  - State Key Laboratory of Respiratory Disease, The Key Laboratory of Advanced Interdisciplinary Studies Center, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
25
|
Li L, Li H, Su T, Ming D. Quantitative Characterization of the Impact of Protein-Protein Interactions on Ligand-Protein Binding: A Multi-Chain Dynamics Perturbation Analysis Method. Int J Mol Sci 2024; 25:9172. [PMID: 39273122 PMCID: PMC11394879 DOI: 10.3390/ijms25179172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 08/14/2024] [Accepted: 08/22/2024] [Indexed: 09/15/2024] Open
Abstract
Many protein-protein interactions (PPIs) affect the ways in which small molecules bind to their constituent proteins, which can impact drug efficacy and regulatory mechanisms. While recent advances have improved our ability to independently predict both PPIs and ligand-protein interactions (LPIs), a comprehensive understanding of how PPIs affect LPIs is still lacking. Here, we examined 63 pairs of ligand-protein complexes in a benchmark dataset for protein-protein docking studies and quantified six typical effects of PPIs on LPIs. A multi-chain dynamics perturbation analysis method, called mcDPA, was developed to model these effects and used to predict small-molecule binding regions in protein-protein complexes. Our results illustrated that the mcDPA can capture the impact of PPI on LPI to varying degrees, with six similar changes in its predicted ligand-binding region. The calculations showed that 52% of the examined complexes had prediction accuracy at or above 50%, and 55% of the predictions had a recall of not less than 50%. When applied to 33 FDA-approved protein-protein-complex-targeting drugs, these numbers improved to 60% and 57% for the same accuracy and recall rates, respectively. The method developed in this study may help to design drug-target interactions in complex environments, such as in the case of protein-protein interactions.
Affiliation(s)
- Lu Li
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing 211816, China
- Hao Li
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing 211816, China
- Ting Su
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing 211816, China
- Dengming Ming
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, 30 South Puzhu Road, Jiangbei New District, Nanjing 211816, China
26
|
Carroll M, Rosenbaum E, Viswanathan R. Computational Methods to Predict Conformational B-Cell Epitopes. Biomolecules 2024; 14:983. [PMID: 39199371 PMCID: PMC11352882 DOI: 10.3390/biom14080983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 08/04/2024] [Accepted: 08/08/2024] [Indexed: 09/01/2024] Open
Abstract
Accurate computational prediction of B-cell epitopes can greatly enhance biomedical research and rapidly advance efforts to develop therapeutics, monoclonal antibodies, vaccines, and immunodiagnostic reagents. Previous research efforts have primarily focused on the development of computational methods to predict linear epitopes rather than conformational epitopes; however, the latter is much more biologically predominant. Several conformational B-cell epitope prediction methods have recently been published, but their predictive performances are weak. Here, we present a review of the latest computational methods and assess their performances on a diverse test set of 29 non-redundant unbound antigen structures. Our results demonstrate that ISPIPab performs better than most methods and compares favorably with other recent antigen-specific methods. Finally, we suggest new strategies and opportunities to improve computational predictions of conformational B-cell epitopes.
Affiliation(s)
- R. Viswanathan
  - Department of Chemistry and Biochemistry, Yeshiva College, Yeshiva University, New York, NY 10033, USA; (M.C.); (E.R.)
27
|
Biswas G, Mukherjee D, Basu S. Combining Complementarity and Binding Energetics in the Assessment of Protein Interactions: EnCPdock-A Practical Manual. J Comput Biol 2024; 31:769-781. [PMID: 38885081 DOI: 10.1089/cmb.2024.0554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024] Open
Abstract
The combined effect of shape and electrostatic complementarities (Sc, EC) at the interface of the interacting protein partners (PPI) serves as the physical basis for such associations and is a strong determinant of their binding energetics. EnCPdock (https://www.scinetmol.in/EnCPdock/) presents a comprehensive web platform for the direct conjoint comparative analyses of complementarity and binding energetics in PPIs. It elegantly interlinks the dual nature of local (Sc) and nonlocal complementarity (EC) in PPIs using the complementarity plot. It further derives an AI-based ΔG_binding with a prediction accuracy comparable to the state of the art. This book chapter presents a practical manual to conceptualize and implement EnCPdock with its various features and functionalities, collectively having the potential to serve as a valuable protein engineering tool in the design of novel protein interfaces.
Affiliation(s)
- Gargi Biswas
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Sankar Basu
  - Department of Microbiology, Asutosh College, University of Calcutta, Kolkata, India
28
|
Krapp LF, Meireles FA, Abriata LA, Devillard J, Vacle S, Marcaida MJ, Dal Peraro M. Context-aware geometric deep learning for protein sequence design. Nat Commun 2024; 15:6273. [PMID: 39054322 PMCID: PMC11272779 DOI: 10.1038/s41467-024-50571-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 07/15/2024] [Indexed: 07/27/2024] Open
Abstract
Protein design and engineering are evolving at an unprecedented pace leveraging the advances in deep learning. Current models nonetheless cannot natively consider non-protein entities within the design process. Here, we introduce a deep learning approach based solely on a geometric transformer of atomic coordinates and element names that predicts protein sequences from backbone scaffolds aware of the restraints imposed by diverse molecular environments. To validate the method, we show that it can produce highly thermostable, catalytically active enzymes with high success rates. This concept is anticipated to improve the versatility of protein design pipelines for crafting desired functions.
Affiliation(s)
- Lucien F Krapp
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Fernando A Meireles
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Jean Devillard
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Sarah Vacle
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maria J Marcaida
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
29
|
Samanta R, Harmalkar A, Prathima P, Gray JJ. Advancing membrane-associated protein docking with improved sampling and scoring in Rosetta. bioRxiv 2024:2024.07.09.602802. [PMID: 39026849 PMCID: PMC11257521 DOI: 10.1101/2024.07.09.602802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
The oligomerization of protein macromolecules on cell membranes plays a fundamental role in regulating cellular function. From modulating signal transduction to directing immune response, membrane proteins (MPs) play a crucial role in biological processes and are often the target of many pharmaceutical drugs. Despite their biological relevance, the challenges in experimental determination have hampered the structural availability of membrane proteins and their complexes. Computational docking provides a promising alternative to model membrane protein complex structures. Here, we present Rosetta-MPDock, a flexible transmembrane (TM) protein docking protocol that captures binding-induced conformational changes. Rosetta-MPDock samples large conformational ensembles of flexible monomers and docks them within an implicit membrane environment. We benchmarked this method on 29 TM-protein complexes of variable backbone flexibility. These complexes are classified based on the root-mean-square deviation between the unbound and bound states (RMSD_UB) as: rigid (RMSD_UB < 1.2 Å), moderately-flexible (RMSD_UB ∈ [1.2, 2.2) Å), and flexible targets (RMSD_UB > 2.2 Å). In a local docking scenario, i.e. with membrane protein partners starting ≈10 Å apart embedded in the membrane in their unbound conformations, Rosetta-MPDock successfully predicts the correct interface (success defined as achieving 3 near-native structures in the 5 top-ranked models) for 67% of the moderately flexible targets and 60% of the highly flexible targets, a substantial improvement from the existing membrane protein docking methods. Further, by integrating AlphaFold2-multimer for structure determination and using Rosetta-MPDock for docking and refinement, we demonstrate improved success rates over the benchmark targets from 64% to 73%.
Rosetta-MPDock advances the capabilities for membrane protein complex structure prediction and modeling to tackle key biological questions and elucidate functional mechanisms in the membrane environment. The benchmark set and the code are available for public use at github.com/Graylab/MPDock.
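The flexibility bins and the success criterion quoted in this abstract can be written out as a minimal Python sketch. This is an illustration only and is not taken from the Rosetta-MPDock code: the function names (`flexibility_class`, `is_docking_success`, `success_rate`) are hypothetical, the input is assumed to be a ranked list of booleans marking which models are near-native, and a target with an unbound-to-bound RMSD of exactly 2.2 Å (which the abstract leaves unassigned) is placed in the flexible bin by assumption.

```python
from typing import Iterable, Sequence


def flexibility_class(rmsd_ub: float) -> str:
    """Bin a target by its unbound-to-bound RMSD (in angstroms)."""
    if rmsd_ub < 1.2:
        return "rigid"
    if rmsd_ub < 2.2:
        return "moderately-flexible"
    # The abstract states "> 2.2" for flexible targets; putting exactly 2.2
    # in this bin is an assumption made for this sketch.
    return "flexible"


def is_docking_success(near_native_flags: Sequence[bool],
                       top_n: int = 5, min_hits: int = 3) -> bool:
    """True if at least `min_hits` of the `top_n` best-ranked models are near-native.

    `near_native_flags` is assumed to be ordered from best-ranked to worst-ranked.
    """
    return sum(1 for flag in near_native_flags[:top_n] if flag) >= min_hits


def success_rate(targets: Iterable[Sequence[bool]]) -> float:
    """Fraction of targets that count as a docking success."""
    outcomes = [is_docking_success(flags) for flags in targets]
    if not outcomes:
        raise ValueError("no targets given")
    return sum(outcomes) / len(outcomes)
```

Under these assumptions, a benchmark figure such as "67% of the moderately flexible targets" would correspond to calling `success_rate` on the ranked near-native flags of the targets whose `flexibility_class` is `"moderately-flexible"`.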
Affiliation(s)
- Rituparna Samanta
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
  - Current affiliation: University of South Florida, Tampa, FL, USA
- Ameya Harmalkar
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
  - Current affiliation: Generate Biomedicines Inc., Cambridge, MA, USA
- Priyamvada Prathima
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
  - Current affiliation: Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
30
|
Pegoraro M, Dominé C, Rodolà E, Veličković P, Deac A. Geometric epitope and paratope prediction. Bioinformatics 2024; 40:btae405. [PMID: 38984742 PMCID: PMC11245313 DOI: 10.1093/bioinformatics/btae405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/14/2024] [Accepted: 07/09/2024] [Indexed: 07/11/2024] Open
Abstract
MOTIVATION: Identifying the binding sites of antibodies is essential for developing vaccines and synthetic antibodies. In this article, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. RESULTS: Specifically, we compare different geometric deep learning methods applied to proteins' inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that different geometrical representation information is useful for different tasks. Surface-based models are more efficient in predicting the binding of the epitope, while graph models are better in paratope prediction, both achieving significant performance improvements. Moreover, we analyze the impact of structural changes in antibodies and antigens resulting from conformational rearrangements or reconstruction errors. Through this investigation, we showcase the robustness of geometric deep learning methods and spectral geometric descriptors to such perturbations. AVAILABILITY AND IMPLEMENTATION: The Python code for the models, together with the data and the processing pipeline, is open-source and available at https://github.com/Marco-Peg/GEP.
Affiliation(s)
- Marco Pegoraro
  - Department of Computer Science, Sapienza University of Rome, 00185, Italy
- Clémentine Dominé
  - Gatsby Computational Neuroscience Unit, University College London, W1T 4JG, United Kingdom
- Emanuele Rodolà
  - Department of Computer Science, Sapienza University of Rome, 00185, Italy
- Andreea Deac
  - Département d’informatique et de recherche opérationnelle, Université de Montréal, QC H2S 3H1, Canada
31
|
Kousaka S, Ishikawa T. Quantum Chemistry-Based Protein-Protein Docking without Empirical Parameters. J Chem Theory Comput 2024; 20:5164-5175. [PMID: 38845143 DOI: 10.1021/acs.jctc.4c00531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
This study developed a novel protein-protein docking approach based on quantum chemistry. To judge the appropriateness of complex structures, we introduced two criterion values, EV1 and EV2, computed using the fragment molecular orbital method without any empirical parameters. These criterion values enable us to search complex structures in which patterns of the electrostatic potential of the two proteins are optimally aligned at their interface. The performance of our method was validated using 53 complexes in a benchmark set provided for protein-protein docking. When employing bound state structures, docking success rates reached 64% for EV1 and 76% for EV2. On the other hand, when employing unbound state structures, docking success rates reached 13% for EV1 and 17% for EV2.
Affiliation(s)
- Sumire Kousaka
  - Department of Chemistry, Biotechnology, and Chemical Engineering, Graduate School of Science and Engineering, Kagoshima University, 1-21-40 Korimoto, Kagoshima 890-0065, Japan
- Takeshi Ishikawa
  - Department of Chemistry, Biotechnology, and Chemical Engineering, Graduate School of Science and Engineering, Kagoshima University, 1-21-40 Korimoto, Kagoshima 890-0065, Japan
32
|
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
33
|
Joubbi S, Micheli A, Milazzo P, Maccari G, Ciano G, Cardamone D, Medini D. Antibody design using deep learning: from sequence and structure design to affinity maturation. Brief Bioinform 2024; 25:bbae307. [PMID: 38960409 PMCID: PMC11221890 DOI: 10.1093/bib/bbae307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/20/2024] [Accepted: 06/12/2024] [Indexed: 07/05/2024] Open
Abstract
Deep learning has achieved impressive results in various fields such as computer vision and natural language processing, making it a powerful tool in biology. Its applications now encompass cellular image classification, genomic studies and drug discovery. While drug development traditionally focused deep learning applications on small molecules, recent innovations have incorporated it in the discovery and development of biological molecules, particularly antibodies. Researchers have devised novel techniques to streamline antibody development, combining in vitro and in silico methods. In particular, computational power expedites lead candidate generation, scaling and potential antibody development against complex antigens. This survey highlights significant advancements in protein design and optimization, specifically focusing on antibodies. This includes various aspects such as design, folding, antibody-antigen interactions docking and affinity maturation.
Affiliation(s)
- Sara Joubbi
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Alessio Micheli
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
- Paolo Milazzo
  - Department of Computer Science, University of Pisa, Largo B. Pontecorvo, 3, 56127, Pisa, Italy
- Giuseppe Maccari
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Giorgio Ciano
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Dario Cardamone
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
- Duccio Medini
  - Data Science for Health (DaScH) Lab, Fondazione Toscana Life Sciences, Via Fiorentina, 1, 53100, Siena, Italy
34
|
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
35
|
Graef J, Ehrt C, Reim T, Rarey M. Database-Driven Identification of Structurally Similar Protein-Protein Interfaces. J Chem Inf Model 2024; 64:3332-3349. [PMID: 38470439 PMCID: PMC11040719 DOI: 10.1021/acs.jcim.3c01462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/26/2024] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
Analyzing the similarity of protein interfaces in protein-protein interactions gives new insights into protein function and assists in discovering new drugs. Usually, tools that assess the similarity focus on the interactions between two protein interfaces, while sometimes we only have one predicted interface. Herein, we present PiMine, a database-driven protein interface similarity search. It compares interface residues of one or two interacting chains by calculating and searching tetrahedral geometric patterns of α-carbon atoms and calculating physicochemical and shape-based similarity. On a dedicated, tailor-made dataset, we show that PiMine outperforms commonly used comparison tools in terms of early enrichment when considering interfaces of sequentially and structurally unrelated proteins. In an application example, we demonstrate its usability for protein interaction partner prediction by comparing predicted interfaces to known protein-protein interfaces.
Affiliation(s)
- Joel Graef
  - Universität Hamburg, ZBH—Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761 Hamburg, Germany
- Christiane Ehrt
  - Universität Hamburg, ZBH—Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761 Hamburg, Germany
- Thorben Reim
  - Universität Hamburg, ZBH—Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761 Hamburg, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH—Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761 Hamburg, Germany
36
|
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
  - KUIS AI Center, Koç University, Istanbul 34450, Turkey
  - Computer Engineering, Koç University, Istanbul 34450, Turkey
- Ozlem Keskin
  - Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
- Attila Gursoy
  - Computer Engineering, Koç University, Istanbul 34450, Turkey
37
|
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
38
|
Stary-Weinzinger A. In silico models of the macromolecular NaV1.5-KIR2.1 complex. Front Physiol 2024; 15:1362964. [PMID: 38468705 PMCID: PMC10925717 DOI: 10.3389/fphys.2024.1362964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 02/07/2024] [Indexed: 03/13/2024] Open
Abstract
In cardiac cells, the expression of the cardiac voltage-gated Na+ channel (NaV1.5) is reciprocally regulated with the inward rectifying K+ channel (KIR2.1). These channels can form macromolecular complexes that pre-assemble early during forward trafficking (transport to the cell membrane). In this study, we present in silico 3D models of NaV1.5-KIR2.1, generated by rigid-body protein-protein docking programs and deep learning-based AlphaFold-Multimer software. Modeling revealed that the two channels could physically interact with each other along the entire transmembrane region. Structural mapping of disease-associated mutations revealed a hotspot at this interface with several trafficking-deficient variants in close proximity. Thus, examining the role of disease-causing variants is important not only in isolated channels but also in the context of macromolecular complexes. These findings may contribute to a better understanding of the life-threatening cardiovascular diseases underlying KIR2.1 and NaV1.5 malfunctions.
Affiliation(s)
- Anna Stary-Weinzinger
  - Division of Pharmacology and Toxicology, Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria

39
Chu L, Ruffolo JA, Harmalkar A, Gray JJ. Flexible protein-protein docking with a multitrack iterative transformer. Protein Sci 2024; 33:e4862. [PMID: 38148272 PMCID: PMC10804679 DOI: 10.1002/pro.4862] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/09/2023] [Revised: 11/17/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and reranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, for example, structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multitrack iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments, GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the Database of Interacting Protein Structures (DIPS) test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered contamination of close homologs in the training set. After decontaminating the training set, the success rate is 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference speed of under 1 s on a single GPU, enabling its application in large-scale structure screening. 
Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Affiliation(s)
- Lee‐Shin Chu
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey A. Ruffolo
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Ameya Harmalkar
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA

40
Zhao N, Han B, Zhao C, Xu J, Gong X. ABAG-docking benchmark: a non-redundant structure benchmark dataset for antibody-antigen computational docking. Brief Bioinform 2024; 25:bbae048. [PMID: 38385879 PMCID: PMC10883643 DOI: 10.1093/bib/bbae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 01/05/2024] [Accepted: 01/15/2024] [Indexed: 02/23/2024] Open
Abstract
Accurate prediction of antibody-antigen complex structures is pivotal in drug discovery, vaccine design and disease treatment and can facilitate the development of more effective therapies and diagnostics. In this work, we first review the antibody-antigen docking (ABAG-docking) datasets. Then, we present the creation and characterization of a comprehensive benchmark dataset of antibody-antigen complexes. We categorize the dataset based on docking difficulty, interface properties and structural characteristics, to provide a diverse set of cases for rigorous evaluation. Compared with Docking Benchmark 5.5, we have added 112 cases, including 14 single-domain antibody (sdAb) cases and 98 monoclonal antibody (mAb) cases, and also increased the proportion of Difficult cases. Our dataset contains diverse cases, including human/humanized antibodies, sdAbs, rodent antibodies and other types, opening the door to better algorithm development. Furthermore, we provide details on the process of building the benchmark dataset and introduce a pipeline for periodic updates to keep it up to date. We also utilize multiple complex prediction methods including ZDOCK, ClusPro, HDOCK and AlphaFold-Multimer for testing and analyzing this dataset. This benchmark serves as a valuable resource for evaluating and advancing docking computational methods in the analysis of antibody-antigen interaction, enabling researchers to develop more accurate and effective tools for predicting and designing antibody-antigen complexes. The non-redundant ABAG-docking structure benchmark dataset is available at https://github.com/Zhaonan99/Antibody-antigen-complex-structure-benchmark-dataset.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, Beijing, China
- Bingqing Han
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, Beijing, China
- Cuicui Zhao
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, Beijing, China
- Jinbo Xu
  - MoleculeMind Ltd., Beijing, China
- Xinqi Gong
  - Institute for Mathematical Sciences, School of Mathematics, Renmin University of China, Beijing, China
  - Beijing Academy of Artificial Intelligence, Beijing, China

41
Giulini M, Honorato RV, Rivera JL, Bonvin AMJJ. ARCTIC-3D: automatic retrieval and clustering of interfaces in complexes from 3D structural information. Commun Biol 2024; 7:49. [PMID: 38184711 PMCID: PMC10771469 DOI: 10.1038/s42003-023-05718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/18/2023] [Indexed: 01/08/2024] Open
Abstract
The formation of a stable complex between proteins lies at the core of a wide variety of biological processes and has been the focus of countless experiments. The huge amount of information contained in the protein structural interactome in the Protein Data Bank can now be used to characterise and classify the existing biological interfaces. We here introduce ARCTIC-3D, a fast and user-friendly data mining and clustering software to retrieve data and rationalise the interface information associated with the protein input data. We demonstrate its use by various examples ranging from showing the increased interaction complexity of eukaryotic proteins, 20% of which on average have more than 3 different interfaces compared to only 10% for prokaryotes, to associating different functions to different interfaces. In the context of modelling biomolecular assemblies, we introduce the concept of "recognition entropy", related to the number of possible interfaces of the components of a protein-protein complex, which we demonstrate to correlate with the modelling difficulty in classical docking approaches. The identified interface clusters can also be used to generate various combinations of interface-specific restraints for integrative modelling. The ARCTIC-3D software is freely available at github.com/haddocking/arctic3d and can be accessed as a web-service at wenmr.science.uu.nl/arctic3d.
Affiliation(s)
- Marco Giulini
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Rodrigo V Honorato
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Jesús L Rivera
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Alexandre M J J Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

42
Xu X, Bonvin AMJJ. DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model. Bioinformatics Advances 2024; 4:vbad191. [PMID: 38213822 PMCID: PMC10782804 DOI: 10.1093/bioadv/vbad191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/19/2023] [Indexed: 01/13/2024]
Abstract
Motivation: Protein-protein interactions (PPIs) play critical roles in numerous cellular processes. By modelling the 3D structures of the corresponding protein complexes, valuable insights can be obtained, providing, e.g., starting points for drug and protein design. One challenge in the modelling process, however, is the identification of near-native models from the large pool of generated models. To this end, we have previously developed DeepRank-GNN, a graph neural network that integrates structural and sequence information to enable effective pattern learning at PPI interfaces. Its main features are related to the Position Specific Scoring Matrices (PSSMs), which are computationally expensive to generate, significantly limiting the algorithm's usability. Results: We introduce here DeepRank-GNN-esm, which includes as additional features protein language model embeddings from the ESM-2 model. We show that the ESM-2 embeddings can actually replace the PSSM features at no cost in performance, or even with better performance, on two PPI-related tasks: scoring docking poses and detecting crystal artifacts. This new DeepRank version thus bypasses the need to generate PSSMs, greatly improving the usability of the software and opening new application opportunities for systems for which PSSM profiles cannot be obtained or are irrelevant (e.g. antibody-antigen complexes). Availability and implementation: DeepRank-GNN-esm is freely available from https://github.com/DeepRank/DeepRank-GNN-esm.
Affiliation(s)
- Xiaotong Xu
  - Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands
- Alexandre M J J Bonvin
  - Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands

43
Zhang S, Han J, Liu J. Protein-protein and protein-nucleic acid binding site prediction via interpretable hierarchical geometric deep learning. Gigascience 2024; 13:giae080. [PMID: 39484977 PMCID: PMC11528319 DOI: 10.1093/gigascience/giae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/29/2024] [Accepted: 09/25/2024] [Indexed: 11/03/2024] Open
Abstract
Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease diagnosis and drug design. However, accurate predictions by computational approaches remain highly challenging due to the limited knowledge of residue binding patterns. The binding pattern of a residue should be characterized by the spatial distribution of its neighboring residues combined with their physicochemical information interaction, which previous methods have not achieved. Here, we design GraphRBF, a hierarchical geometric deep learning model to learn residue binding patterns from big data. To achieve this, GraphRBF describes physicochemical information interactions by designing an enhanced graph neural network and characterizes residue spatial distributions by introducing a prioritized radial basis function neural network. After training and testing, GraphRBF shows great improvements over existing state-of-the-art methods and strong interpretability of its learned representations. Applied to the SARS-CoV-2 omicron spike protein, GraphRBF successfully identifies known epitopes of the protein. Moreover, it predicts multiple potential binding regions for new nanobodies or even new drugs with strong evidence. A user-friendly online server for GraphRBF is freely available at http://liulab.top/GraphRBF/server.
Affiliation(s)
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China

44
Kuder KJ. Docking Foundations: From Rigid to Flexible Docking. Methods Mol Biol 2024; 2780:3-14. [PMID: 38987460 DOI: 10.1007/978-1-0716-3985-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Despite the development of methods for the experimental determination of protein structures, the dissonance between the number of known sequences and their solved structures is still enormous. This is particularly evident in protein-protein complexes. To fill this gap, diverse technologies have been developed to study protein-protein interactions (PPIs) in a cellular context including a range of biological and computational methods. The latter derive from techniques originally published and applied almost half a century ago and are based on interdisciplinary knowledge from the nexus of the fields of biology, chemistry, and physics about protein sequences, structures, and their folding. Protein-protein docking, the main protagonist of this chapter, is routinely treated as an integral part of protein research. Herein, we describe the basic foundations of the whole process in general terms, but step by step from protein representations through docking methods and evaluation of complexes to their final validation.
Affiliation(s)
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland

45
Kiani YS, Jabeen I. Challenges of Protein-Protein Docking of the Membrane Proteins. Methods Mol Biol 2024; 2780:203-255. [PMID: 38987471 DOI: 10.1007/978-1-0716-3985-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Despite recent advances in the determination of high-resolution membrane protein (MP) structures, the structural and functional characterization of MPs remains extremely challenging, mainly because of their hydrophobic nature, low abundance, and poor expression, and because of the purification and crystallization difficulties associated with them. The major hurdles for MP structure determination lie in the expression, purification, and crystallization procedures. Although there have been significant advances in the experimental determination of MP structures, only a limited number of MP structures (less than approximately 1% of all) are available in the Protein Data Bank (PDB). The structures of a large number of MPs therefore remain unresolved, leaving much structural and functional information related to MPs unexplored. As a result, recent developments in drug discovery and significant biological interest have led to the development of several novel, low-cost, and time-efficient computational methods that overcome the limitations of experimental approaches, supplement experiments, and provide alternatives for the characterization of MPs. The fine-tuning and optimization of these computational approaches remains an ongoing endeavor. Computational methods offer a potential way to elucidate structural features and augment currently available MP information. However, computational modeling can be extremely challenging for MPs, mainly due to insufficient knowledge of (or gaps in) atomic structures of MPs. Despite the availability of numerous in silico methods for 3D structure determination, the applicability of these methods to MPs remains relatively low since not all methods are well suited or adequate for MPs. However, sophisticated methods for MP structure prediction are constantly being developed and updated to integrate the modifications required for MPs.
Currently, different computational methods for (1) MP structure prediction, (2) stability analysis of MPs through molecular dynamics simulations, (3) modeling of MP complexes through docking, (4) prediction of interactions between MPs, and (5) interactions of MPs with their soluble partners are extensively used. Towards this end, MP docking is widely used. Notably, MP docking methods, although few in number, might show greater potential for filling the knowledge gap. In this chapter, MP docking methods and associated challenges are reviewed with a view to improving their applicability, accuracy, and ability to model macromolecular complexes.
Affiliation(s)
- Yusra Sajid Kiani
  - School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Ishrat Jabeen
  - School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan

46
Jarończyk M. Software for Predicting Binding Free Energy of Protein-Protein Complexes and Their Mutants. Methods Mol Biol 2024; 2780:139-147. [PMID: 38987468 DOI: 10.1007/978-1-0716-3985-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein binding affinity prediction is important for understanding complex biochemical pathways and uncovering protein interaction networks. Quantitative estimation of the binding affinity changes caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. The binding free energies of protein-protein complexes can be predicted using several computational tools. This chapter is a summary of software developed for the prediction of binding free energies for protein-protein complexes and their mutants.

47
Yin R, Pierce BG. Evaluation of AlphaFold antibody-antigen modeling with implications for improving predictive accuracy. Protein Sci 2024; 33:e4865. [PMID: 38073135 PMCID: PMC10751731 DOI: 10.1002/pro.4865] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 07/28/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 12/26/2023]
Abstract
High resolution antibody-antigen structures provide critical insights into immune recognition and can inform therapeutic design. The challenges of experimental structural determination and the diversity of the immune repertoire underscore the necessity of accurate computational tools for modeling antibody-antigen complexes. Initial benchmarking showed that despite overall success in modeling protein-protein complexes, AlphaFold and AlphaFold-Multimer have limited success in modeling antibody-antigen interactions. In this study, we performed a thorough analysis of AlphaFold's antibody-antigen modeling performance on 427 nonredundant antibody-antigen complex structures, identifying useful confidence metrics for predicting model quality, and features of complexes associated with improved modeling success. Notably, we found that the latest version of AlphaFold improves near-native modeling success to over 30%, versus approximately 20% for a previous version, while increased AlphaFold sampling gives approximately 50% success. With this improved success, AlphaFold can generate accurate antibody-antigen models in many cases, while additional training or other optimization may further improve performance.
Affiliation(s)
- Rui Yin
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Brian G. Pierce
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA

48
Meng Q, Guo F, Wang E, Tang J. ComDock: A novel approach for protein-protein docking with an efficient fusing strategy. Comput Biol Med 2023; 167:107660. [PMID: 37944303 DOI: 10.1016/j.compbiomed.2023.107660] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2023] [Revised: 10/08/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
Protein-protein interaction plays an important role in studying the mechanism of protein functions from the structural perspective. Molecular docking is a powerful approach to detect protein-protein complexes using computational tools, given the high cost and time demands of traditional experimental methods. Among existing technologies, the template-based method utilizes the structural information of known homologous 3D complexes as available and reliable templates to achieve high accuracy and low computational complexity. However, the performance of the template-based method depends on the quality and quantity of templates. When templates are insufficient or absent, the ab initio docking method is necessary and largely enriches the docking conformations. Therefore, it is a feasible strategy to fuse the effectiveness of the template-based model and the universality of the ab initio model to improve docking performance. In this study, we construct a new, diverse, comprehensive template library derived from PDB, containing 77,685 complexes. We propose a template-based method (named TemDock), which retrieves the evolutionary relationship between the target sequence and samples in the template library and transfers similar structural information. Then, the target structure is built by superposing on the homologous template complex with TM-align. Moreover, we develop a consensus-based method (named ComDock) to integrate our TemDock and an existing ab initio method (ZDOCK). On 105 targets with templates from Benchmark 5.0, TemDock and ComDock achieve success rates of 68.57% and 71.43% in the top 10 conformations, respectively. Compared with HDOCK, ComDock obtains better I-RMSD of hit configurations on 9 targets and more hit models in the top 100 conformations.
As an efficient method for protein-protein docking, ComDock is expected to aid the study of protein-protein recognition and reveal the various biological pathways that are critical for drug discovery. The final results are stored at https://github.com/guofei-tju/mqz_ComDock_docking.
Affiliation(s)
- Qiaozhen Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Ercheng Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
  - Zhejiang Laboratory, Hangzhou, Zhejiang, China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology of Chinese Academy of Sciences, Shenzhen, China

49
Onyango OH, Mwenda CM, Gitau G, Muoma J, Okoth P. In-silico analysis of potent Mosquirix vaccine adjuvant leads. J Genet Eng Biotechnol 2023; 21:155. [PMID: 38032502 PMCID: PMC10689608 DOI: 10.1186/s43141-023-00590-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/06/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND: The World Health Organization recommends the malaria vaccine Mosquirix as a malaria prevention strategy. However, Mosquirix has failed to reduce the global burden of malaria because of its inefficacy. The Mosquirix vaccine's modest effectiveness against malaria, 36% among children aged 5 to 17 months who need at least four doses, fails to aid malaria eradication. Therefore, highly effective and efficacious malaria vaccines are required. The well-characterized P. falciparum circumsporozoite surface protein can be used to discover adjuvants that can increase the efficacy of Mosquirix. Therefore, the study sought to undertake an in-silico discovery of Plasmodium falciparum circumsporozoite surface protein inhibitors with pharmacological properties on Mosquirix using hierarchical virtual screening and molecular dynamics simulation. RESULTS: Monoclonal antibody L9, an anti-Plasmodium falciparum circumsporozoite surface protein molecule, was used to identify Plasmodium falciparum circumsporozoite surface protein inhibitors with pharmacological properties on Mosquirix during a virtual screening process in ZINCPHARMER that yielded 23 hits. After drug-likeness and absorption, distribution, metabolism, excretion, and toxicity property analysis in the SwissADME web server, only 9 of the 23 hits satisfied the requirements. The 9 compounds were docked with Plasmodium falciparum circumsporozoite surface protein using the PyRx software to understand their interactions. ZINC25374360 (-8.1 kcal/mol), ZINC40144754 (-8.3 kcal/mol), and ZINC71996727 (-8.9 kcal/mol) bound strongly to Plasmodium falciparum circumsporozoite surface protein with binding affinities of less than -8.0 kcal/mol. The stability of these molecularly docked Plasmodium falciparum circumsporozoite surface protein-inhibitor complexes was assessed through molecular dynamics simulation using GROMACS 2022.
ZINC25374360 and ZINC71996727 formed stable complexes with Plasmodium falciparum circumsporozoite surface protein. They were subjected to in vitro validation for their inhibitory potential. The IC50 values ranging between 250 and 350 ng/ml suggest inhibition of parasite development. CONCLUSION: Therefore, the two Plasmodium falciparum circumsporozoite surface protein inhibitors can be used as vaccine adjuvants to increase the efficacy of the existing Mosquirix vaccine. Nevertheless, additional in vivo tests, structural optimization studies, and homogenization analysis are essential to determine the anti-plasmodial action of these adjuvants in humans.
Affiliation(s)
- Okello Harrison Onyango
  - Department of Biological Sciences (Molecular Biology, Computational Biology, and Bioinformatics Section), School of Natural and Applied Sciences, Masinde Muliro University of Science and Technology, P.O. BOX 190-50100, Kakamega, Kenya
- Cynthia Mugo Mwenda
  - Department of Biological Sciences, School of Pure and Applied Sciences, Meru University of Science and Technology, P.O. BOX 972-60200, Meru, Kenya
- Grace Gitau
  - Department of Biochemistry and Biotechnology, School of Biological and Life Sciences, The Technical University of Kenya, P.O. BOX 52428-00200, Nairobi, Kenya
- John Muoma
  - Department of Biological Sciences (Molecular Biology, Computational Biology, and Bioinformatics Section), School of Natural and Applied Sciences, Masinde Muliro University of Science and Technology, P.O. BOX 190-50100, Kakamega, Kenya
- Patrick Okoth
  - Department of Biological Sciences (Molecular Biology, Computational Biology, and Bioinformatics Section), School of Natural and Applied Sciences, Masinde Muliro University of Science and Technology, P.O. BOX 190-50100, Kakamega, Kenya

50
Tsishyn M, Pucci F, Rooman M. Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Brief Bioinform 2023; 25:bbad491. [PMID: 38197311 PMCID: PMC10777193 DOI: 10.1093/bib/bbad491] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2023] [Revised: 10/02/2023] [Accepted: 12/05/2023] [Indexed: 01/11/2024] Open
Abstract
Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.
Affiliation(s)
- Matsvei Tsishyn
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium