1
El-Assaad AM, Hamieh T. SARS-CoV-2: Prediction of critical ionic amino acid mutations. Comput Biol Med 2024; 178:108688. [PMID: 38870723 DOI: 10.1016/j.compbiomed.2024.108688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 05/26/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has been studied thoroughly, and several variants have been identified across the world with their corresponding mutations. Studies and vaccine development focus on the genetic mutations of the S protein because of its vital role in allowing the virus to attach to and fuse with the membrane of a host cell. In this perspective, we study the effects of all ionic amino acid mutations of the SARS-CoV-2 viral spike protein S1 when bound to antibody CC12.1 within the SARS-CoV-2:CC12.1 complex model. Binding free energy calculations between SARS-CoV-2 and antibody CC12.1 are based on the Analysis of Electrostatic Similarities of Proteins (AESOP) framework, where the electrostatic potentials are calculated using the Adaptive Poisson-Boltzmann Solver (APBS). The atomic radii and charges that feed into the APBS calculations are calculated using the PDB2PQR software. Our results are the first to propose, in silico, potentially life-threatening mutations of SARS-CoV-2 beyond the present mutations found in the five common variants worldwide. We find that each of the mutations K378A, R408A, K424A, R454A, R457A, K458A, and K462A plays a significant role in the binding to antibody CC12.1, since they are turned into strong inhibitors on both chains of the S1 protein, whereas the mutations D405A, D420A, and D427A play important roles in this binding, as they are turned into mild inhibitors on both chains of the S1 protein.
Affiliation(s)
- Atlal M El-Assaad
- Department of Electrical Engineering & Computer Science, University of Toledo (UT), Toledo OH 43606, USA; Department of Computer Science, Lebanese International University (LIU), Bekaa, Lebanon.
- Tayssir Hamieh
- Faculty of Science and Engineering, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands; Laboratory of Materials, Catalysis, Environment and Analytical Methods (MCEMA), Faculty of Sciences, Lebanese University, Hadath, Lebanon.
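The mutation labels in the abstract of this entry (for example K378A) follow the usual convention of wild-type residue, sequence position, substituted residue. A minimal Python sketch of how such labels might be parsed and grouped by the charge of the replaced side chain is given here. The names `parse_mutation` and `charge_removed` and the charge table are illustrative assumptions, not part of AESOP, APBS, or PDB2PQR, and histidine is deliberately left out of the table.

```python
import re

# Assumed formal side-chain charges near neutral pH (illustrative only;
# histidine is omitted because its charge depends strongly on environment).
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}

# One-letter wild type, integer position, one-letter substitution, e.g. "K378A".
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'K378A' into (wild_type, position, mutant)."""
    match = _MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


def charge_removed(label):
    """Formal charge lost when the wild-type residue is replaced."""
    wild, _, mutant = parse_mutation(label)
    return SIDE_CHAIN_CHARGE.get(wild, 0) - SIDE_CHAIN_CHARGE.get(mutant, 0)


# Mutation lists copied from the abstract.
strong = ["K378A", "R408A", "K424A", "R454A", "R457A", "K458A", "K462A"]
mild = ["D405A", "D420A", "D427A"]

for label in strong + mild:
    print(label, parse_mutation(label), charge_removed(label))
```

Under these assumed charges, every mutation the abstract reports as a strong inhibitor removes a positive side chain (Lys or Arg), and every mild one removes a negative side chain (Asp).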
2
Li P, Liu ZP. MuToN Quantifies Binding Affinity Changes upon Protein Mutations by Geometric Deep Learning. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024:e2402918. [PMID: 38995072 DOI: 10.1002/advs.202402918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/04/2024] [Indexed: 07/13/2024]
Abstract
Assessing changes in protein-protein binding affinity due to mutations helps in understanding a wide range of crucial biological processes within cells. Despite significant efforts to create accurate computational models, predicting how mutations affect affinity remains challenging due to the complexity of the biological mechanisms involved. In the present work, a geometric deep learning framework called MuToN is introduced for quantifying protein binding affinity change upon residue mutations. The method, designed with geometric attention networks, is mechanism-aware. It captures changes in the protein binding interfaces of mutated complexes and assesses the allosteric effects of amino acids. Experimental results highlight MuToN's superiority compared to existing methods. Additionally, MuToN's flexibility and effectiveness are illustrated by its precise predictions of binding affinity changes between SARS-CoV-2 variants and the ACE2 complex.
Affiliation(s)
- Pengpai Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
3
Zhou Y, Myung Y, Rodrigues CM, Ascher D. DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning. Nucleic Acids Res 2024; 52:W207-W214. [PMID: 38783112 PMCID: PMC11223791 DOI: 10.1093/nar/gkae412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024] [Open Access]
Abstract
Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. However, current predictive tools often struggle to balance efficiency and precision in predicting the effects of mutations on these complex interactions. To address this, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust Siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network operated on the protein interaction interface. We used residue-specific embeddings from ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a robust Pearson correlation of up to 0.75 (root mean squared error: 1.33 kcal/mol) in our evaluations, outperforming most existing methods. Importantly, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. DDMut-PPI offers a significant advancement in the field and will serve as a valuable tool for researchers probing the complexities of protein interactions. DDMut-PPI is freely available as a web server and an application programming interface at https://biosig.lab.uq.edu.au/ddmut_ppi.
Affiliation(s)
- Yunzhuo Zhou
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- YooChan Myung
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Carlos H M Rodrigues
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- David B Ascher
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
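The abstract of this entry reports model quality as a Pearson correlation and a root mean squared error (in kcal/mol) between predicted and experimental changes in binding free energy. As a rough illustration of how those two figures are computed, the following Python sketch uses only the standard library. The function names and the sample values are hypothetical and are not taken from DDMut-PPI or its evaluation data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, experimental):
    """Root mean squared error, in the same units as the inputs."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predicted vs. experimental values in kcal/mol (made up).
predicted = [0.5, 1.2, -0.3, 2.0]
experimental = [0.7, 1.0, -0.1, 2.4]

print(f"Pearson r = {pearson_r(predicted, experimental):.3f}")
print(f"RMSE      = {rmse(predicted, experimental):.3f} kcal/mol")
```

The two metrics answer different questions: the correlation measures whether predictions rank mutations in the right order, while the RMSE measures how far the predicted magnitudes are from the measured ones, so a method can score well on one and poorly on the other.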
4
Song L, Zuo X, Li M. Concept and Development of Algebraic Topological Framework Nucleic Acids. Chempluschem 2024; 89:e202300760. [PMID: 38529703 DOI: 10.1002/cplu.202300760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/06/2024] [Accepted: 03/25/2024] [Indexed: 03/27/2024]
Abstract
Nucleic acids are considered promising materials for developing exquisite nanostructures from one to three dimensions. The advances of DNA nanotechnology facilitate ingenious design of DNA nanostructures with diverse shapes and sizes. In particular, the algebraic topological framework nucleic acids (ATFNAs) are functional DNA nanostructures that engineer guest molecules (e.g., nucleic acids, proteins, small molecules, and nanoparticles) stoichiometrically and spatially. The intrinsic precise properties and tailorable functionalities of ATFNAs hold great promise for biological applications, such as cell recognition and immunotherapy. This Perspective highlights the concept and development of precisely assembled ATFNAs, and outlines the new frontiers and opportunities for exploiting the structural advantages of ATFNAs for biological applications.
Affiliation(s)
- Lu Song
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Xiaolei Zuo
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
- Min Li
- Institute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China
5
Yu G, Zhao Q, Bi X, Wang J. DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure. Bioinformatics 2024; 40:i418-i427. [PMID: 38940145 PMCID: PMC11211828 DOI: 10.1093/bioinformatics/btae232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] [Open Access]
Abstract
MOTIVATION: Mutations are the crucial driving force for biological evolution as they can disrupt protein stability and protein-protein interactions which have notable impacts on protein structure, function, and expression. However, existing computational methods for protein mutation effects prediction are generally limited to single point mutations with global dependencies, and do not systematically take into account the local and global synergistic epistasis inherent in multiple point mutations.
RESULTS: To this end, we propose a novel spatial and sequential message passing neural network, named DDAffinity, to predict the changes in binding affinity caused by multiple point mutations based on protein 3D structures. Specifically, instead of being on the whole protein, we perform message passing on the k-nearest neighbor residue graphs to extract pocket features of the protein 3D structures. Furthermore, to learn global topological features, a two-step additive Gaussian noising strategy during training is applied to blur out local details of protein geometry. We evaluate DDAffinity on benchmark datasets and external validation datasets. Overall, the predictive performance of DDAffinity is significantly improved compared with state-of-the-art baselines on multiple point mutations, including end-to-end and pre-training based methods. The ablation studies indicate the reasonable design of all components of DDAffinity. In addition, applications in nonredundant blind testing, predicting mutation effects of SARS-CoV-2 RBD variants, and optimizing human antibody against SARS-CoV-2 illustrate the effectiveness of DDAffinity.
AVAILABILITY AND IMPLEMENTATION: DDAffinity is available at https://github.com/ak422/DDAffinity.
Affiliation(s)
- Guanglei Yu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Qichang Zhao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Xuehua Bi
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
6
Chen H, Revennaugh B, Fu H, Ivanov AA. AVERON notebook to discover actionable cancer vulnerabilities enabled by neomorph protein-protein interactions. iScience 2024; 27:110035. [PMID: 38883827 PMCID: PMC11179073 DOI: 10.1016/j.isci.2024.110035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 04/30/2024] [Accepted: 05/16/2024] [Indexed: 06/18/2024] [Open Access]
Abstract
Genomic alterations, such as missense mutations, often lead to the activation of oncogenic pathways and cell transformation by rewiring protein-protein interaction (PPI) networks. Understanding how mutant-directed neomorph PPIs (neoPPIs) drive cancer is vital to developing new personalized clinical strategies. However, the experimental interrogation of neoPPI functions in patients with cancer is highly challenging. To address this challenge, we developed a computational platform, termed AVERON for discovering actionable vulnerabilities enabled by rewired oncogenic networks. AVERON enables rapid systematic profiling of the clinical significance of neomorph PPIs across different cancer types, informing molecular mechanisms of neoPPI-driven tumorigenesis, and revealing therapeutically actionable neoPPI-regulated genes. We demonstrated the application of the AVERON platform by evaluating the biological functions and clinical significance of 130 neomorph interactions, experimentally determined for oncogenic BRAFV600E. The AVERON application to broad sets of mutant-directed PPIs may inform new testable biological models and clinical strategies in cancer.
Affiliation(s)
- Hongyue Chen
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Brian Revennaugh
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Haian Fu
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Winship Cancer Institute, Emory University, Atlanta, GA, USA
- Department of Hematology, Medical Oncology Emory University, Atlanta, GA, USA
- Andrey A Ivanov
- Department of Pharmacology and Chemical Biology, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Emory Chemical Biology Discovery Center, Emory University School of Medicine, Emory University, Atlanta, GA, USA
- Winship Cancer Institute, Emory University, Atlanta, GA, USA
7
Biswas G, Mukherjee D, Basu S. Combining Complementarity and Binding Energetics in the Assessment of Protein Interactions: EnCPdock-A Practical Manual. J Comput Biol 2024. [PMID: 38885081 DOI: 10.1089/cmb.2024.0554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024] [Open Access]
Abstract
The combined effect of shape and electrostatic complementarities (Sc, EC) at the interface of the interacting protein partners (PPI) serves as the physical basis for such associations and is a strong determinant of their binding energetics. EnCPdock (https://www.scinetmol.in/EnCPdock/) presents a comprehensive web platform for the direct conjoint comparative analyses of complementarity and binding energetics in PPIs. It elegantly interlinks the dual nature of local (Sc) and nonlocal complementarity (EC) in PPIs using the complementarity plot. It further derives an AI-based ΔGbinding with a prediction accuracy comparable to the state of the art. This book chapter presents a practical manual to conceptualize and implement EnCPdock with its various features and functionalities, collectively having the potential to serve as a valuable protein engineering tool in the design of novel protein interfaces.
Affiliation(s)
- Gargi Biswas
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Sankar Basu
- Department of Microbiology, Asutosh College, University of Calcutta, Kolkata, India
8
Wee J, Wei GW. Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation. arXiv 2024: arXiv:2406.03979v1. [PMID: 38883239 PMCID: PMC11177964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]
Abstract
AlphaFold 3 (AF3), the latest version of protein structure prediction software, goes beyond its predecessors by predicting protein-protein complexes. It could revolutionize drug discovery and protein engineering, marking a major step towards comprehensive, automated protein structure prediction. However, independent validation of AF3's predictions is necessary. Evaluated using the SKEMPI 2.0 database, which involves 317 protein-protein complexes and 8338 mutations, AF3 complex structures give rise to a very good Pearson correlation coefficient of 0.86 for predicting protein-protein binding free energy changes upon mutation, slightly less than the 0.88 achieved earlier with the Protein Data Bank (PDB) structures. Nonetheless, AF3 complex structures led to an 8.6% increase in the prediction RMSE compared to original PDB complex structures. Additionally, some of AF3's complex structures have large errors, which were not captured in its ipTM performance metric. Finally, it is found that AF3's complex structures are not reliable for intrinsically flexible regions or domains.
Affiliation(s)
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
9
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and resulting protein stability changes are important causes for many human genetic diseases. However, the accurate prediction of stability changes due to mutations remains a challenging problem. To address this problem, we have developed an unbiased effective model: PMSPcnn that is based on a convolutional neural network. We have included an anti-symmetry property to build a balanced training dataset, which improves the prediction, in particular for stabilizing mutations. Persistent homology, which is an effective approach for characterizing protein structures, is used to obtain topological features. Additionally, a regression stratification cross-validation scheme has been proposed to improve the prediction for mutations with extreme ΔΔG. For three test datasets: Ssym, p53, and myoglobin, PMSPcnn achieves a better performance than currently existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for the prediction of protein stability changes caused by single point mutations.
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fubin Chang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
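The abstract of this entry mentions using an anti-symmetry property to balance the training set. The underlying idea is that if the mutation A to B changes stability by ΔΔG, the reverse mutation B to A changes it by -ΔΔG, so each measured mutation can contribute a mirrored record. The sketch below illustrates that augmentation step only. The `Mutation` record, the function name, and the sample values are hypothetical, and it ignores the fact that a real structure-based pipeline would also need a structure of the mutant protein for the reverse record.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    wild: str       # one-letter wild-type residue
    position: int   # residue number
    mutant: str     # one-letter substituted residue
    ddg: float      # stability change in kcal/mol


def add_reverse_mutations(records: List[Mutation]) -> List[Mutation]:
    """Return the records plus a mirrored copy of each with the sign flipped."""
    augmented = []
    for rec in records:
        augmented.append(rec)
        # Anti-symmetry: swapping wild type and mutant negates the change.
        augmented.append(Mutation(rec.mutant, rec.position, rec.wild, -rec.ddg))
    return augmented


# Hypothetical measurements (values are made up for illustration).
measured = [Mutation("A", 10, "G", 1.5), Mutation("L", 42, "P", 3.0)]

for rec in add_reverse_mutations(measured):
    print(rec)
```

After this step the set contains as many records with a positive ΔΔG as with a negative one, which is the balance the abstract credits with improving predictions for stabilizing mutations.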
10
Jin R, Ye Q, Wang J, Cao Z, Jiang D, Wang T, Kang Y, Xu W, Hsieh CY, Hou T. AttABseq: an attention-based deep learning prediction method for antigen-antibody binding affinity changes based on protein sequences. Brief Bioinform 2024; 25:bbae304. [PMID: 38960407 PMCID: PMC11221889 DOI: 10.1093/bib/bbae304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 04/15/2024] [Accepted: 06/11/2024] [Indexed: 07/05/2024] [Open Access]
Abstract
The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, for the predictions of the antigen-antibody binding affinity changes connected with antibody mutations. AttABseq is a highly efficient and generic attention-based model by utilizing diverse antigen-antibody complex sequences as the input to predict the binding affinity changes of residue mutations. The assessment on the three benchmark datasets illustrates that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.
Affiliation(s)
- Ruofan Jin
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- College of Life Science, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Qing Ye
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Jike Wang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Zheng Cao
- College of Computer Science and Technology, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Dejun Jiang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Tianyue Wang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Wanting Xu
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Science, Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Yuhangtang Road 866, Hangzhou 310058, Zhejiang, China
11
Tang J, Hu R, Liu Y, Liu J, Wang G, Lv J, Cheng L, He T, Liu Y, Shao PL, Zhang B. Deciphering ACE2-RBD binding affinity through peptide scanning: A molecular dynamics simulation approach. Comput Biol Med 2024; 173:108325. [PMID: 38513389 DOI: 10.1016/j.compbiomed.2024.108325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 03/15/2024] [Accepted: 03/15/2024] [Indexed: 03/23/2024]
Abstract
Rapid discovery of target information for protein-protein interactions (PPIs) is significant in drug design, diagnostics, vaccine development, antibody therapy, etc. Peptide microarray is an ideal tool for revealing epitope information of PPIs. In this work, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) spike receptor-binding domain (RBD) and the host cell receptor angiotensin-converting enzyme 2 (ACE2) were introduced as a model to study the epitope information of RBD-specific binding to ACE2 via a combination of theoretical calculations and experimental validation. Through dock and molecular dynamics simulations, it was found that among the 22 peptide fragments that consist of RBD, #14 (YNYLYRLFRKSNLKP) has the highest binding strength. Subsequently, the experiments of peptide microarray constructed based on plasmonic materials chip also confirmed the theoretical calculation data. Compared to other methods, such as phage display technology and surface plasmon resonance (SPR), this method is rapid and cost-effective, providing insights into the investigation of pathogen invasion processes and the timely development of peptide drugs and other fields.
Affiliation(s)
- Jiahu Tang
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China; Key Laboratory of Molecular Target & Clinical Pharmacology and the State Key Laboratory of Respiratory Disease, School of Pharmaceutical Sciences & the Fifth Affiliated Hospital, Guangzhou Medical University, Guangzhou, 511436, China
- Ruibin Hu
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China; Xianghu Laboratory, Hangzhou, 311231, China
- Yiyi Liu
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Jingchao Liu
- Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences, Tianjin, 300384, China
- Guanghui Wang
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Jiahui Lv
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Li Cheng
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Tingzhen He
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Ying Liu
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.
- Pan-Lin Shao
- Key Laboratory of Molecular Target & Clinical Pharmacology and the State Key Laboratory of Respiratory Disease, School of Pharmaceutical Sciences & the Fifth Affiliated Hospital, Guangzhou Medical University, Guangzhou, 511436, China.
- Bo Zhang
- Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.
12
Wu Z, Wang C, Li C, Xu N, Cao X, Chen S, Shi Y, He Y, Zhang P, Ji J. Integrated Computational Pipeline for the High-Throughput Discovery of Cell Adhesion Peptides. J Phys Chem Lett 2024; 15:3748-3756. [PMID: 38551401 DOI: 10.1021/acs.jpclett.4c00393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Cell adhesion peptides (CAPs) often play a critical role in tissue engineering research. However, the discovery of novel CAPs for diverse applications remains a challenging and time-intensive process. This study presents an efficient computational pipeline integrating sequence embeddings, binding predictors, and molecular dynamics simulations to expedite the discovery of new CAPs. A Pro2vec model, trained on vast CAP data sets, was built to identify RGD-similar tripeptide candidates. These candidates were further evaluated for their binding affinity with integrin receptors using the Mutabind2 machine learning model. Additionally, molecular dynamics simulations were applied to model receptor-peptide interactions and calculate their binding free energies, providing a quantitative assessment of the binding strength for further screening. The resulting peptide demonstrated performance comparable to that of RGD in endothelial cell adhesion and spreading experimental assays, validating the efficacy of the integrated computational pipeline.
Affiliation(s)
- Zhiyu Wu
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- Cong Wang
- MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China
- Chen Li
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- Nan Xu
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- Xiaoyong Cao
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- Shengfu Chen
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Yao Shi
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Key Laboratory of Biomass Chemical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310058, China
- Yi He
- College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
- Institute of Zhejiang University-Quzhou, Quzhou 324000, China
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Peng Zhang
- MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Transvascular Implantation Devices, Qidi Road 456, Hangzhou 310058, China
- Jian Ji
- MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Transvascular Implantation Devices, Qidi Road 456, Hangzhou 310058, China
13
Kim DN, McNaughton AD, Kumar N. Leveraging Artificial Intelligence to Expedite Antibody Design and Enhance Antibody-Antigen Interactions. Bioengineering (Basel) 2024; 11:185. [PMID: 38391671 PMCID: PMC10886287 DOI: 10.3390/bioengineering11020185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 01/30/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024] Open
Abstract
This perspective sheds light on the transformative impact of recent computational advancements in the field of protein therapeutics, with a particular focus on the design and development of antibodies. Cutting-edge computational methods have revolutionized our understanding of protein-protein interactions (PPIs), enhancing the efficacy of protein therapeutics in preclinical and clinical settings. Central to these advancements is the application of machine learning and deep learning, which offers unprecedented insights into the intricate mechanisms of PPIs and facilitates precise control over protein functions. Despite these advancements, the complex structural nuances of antibodies pose ongoing challenges in their design and optimization. Our review provides a comprehensive exploration of the latest deep learning approaches, including language models and diffusion techniques, and their role in surmounting these challenges. We also present a critical analysis of these methods, offering insights to drive further progress in this rapidly evolving field. The paper includes practical recommendations for the application of these computational techniques, supplemented with independent benchmark studies. These studies focus on key performance metrics such as accuracy and the ease of program execution, providing a valuable resource for researchers engaged in antibody design and development. Through this detailed perspective, we aim to contribute to the advancement of antibody design, equipping researchers with the tools and knowledge to navigate the complexities of this field.
Affiliation(s)
- Doo Nam Kim
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Andrew D McNaughton
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Neeraj Kumar
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
14
Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med 2024; 169:107918. [PMID: 38194782 PMCID: PMC10922365 DOI: 10.1016/j.compbiomed.2024.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.
Affiliation(s)
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.
15
Yi C, Taylor ML, Ziebarth J, Wang Y. Predictive Models and Impact of Interfacial Contacts and Amino Acids on Protein-Protein Binding Affinity. ACS Omega 2024; 9:3454-3468. [PMID: 38284090 PMCID: PMC10809705 DOI: 10.1021/acsomega.3c06996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/11/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
Protein-protein interactions (PPIs) play a central role in nearly all cellular processes. The strength of the binding in a PPI is characterized by the binding affinity (BA) and is a key factor in controlling protein-protein complex formation and defining the structure-function relationship. Despite advancements in understanding protein-protein binding, much remains unknown about the interfacial region and its association with BA. New models are needed to predict BA with improved accuracy for therapeutic design. Here, we use machine learning approaches to examine how well different types of interfacial contacts can be used to predict experimentally determined BA and to reveal the impact of the specific amino acids at the binding interface on BA. We create a series of multivariate linear regression models incorporating different contact features at both residue and atomic levels and examine how different methods of identifying and characterizing these properties impact the performance of these models. Particularly, we introduce a new and simple approach to predict BA based on the quantities of specific amino acids at the protein-protein interface. We found that the numbers of specific amino acids at the protein-protein interface were correlated with BA. We show that the interfacial numbers of amino acids can be used to produce models with consistently good performance across different data sets, indicating the importance of the identities of interfacial amino acids in underlying BA. When trained on a diverse set of complexes from two benchmark data sets, the best performing BA model was generated with an explicit linear equation involving six amino acids. Tyrosine, in particular, was identified as the key amino acid in controlling BA, as it had the strongest correlation with BA and was consistently identified as the most important amino acid in feature importance studies. Glycine and serine were identified as the next two most important amino acids in predicting BA. 
The results from this study further our understanding of PPIs and can be used to make improved predictions of BA, giving them implications for drug design and screening in the pharmaceutical industry.
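The abstract's idea of predicting BA from counts of specific interface amino acids can be illustrated with a minimal sketch. It assumes the interface residues have already been identified; the function names, the three residues used, the weights, and the intercept are all hypothetical placeholders, not the fitted equation reported in the study.

```python
from collections import Counter


def interface_counts(interface_residues, residues_of_interest):
    """Count how often each residue of interest occurs at the interface."""
    counts = Counter(interface_residues)
    return {aa: counts.get(aa, 0) for aa in residues_of_interest}


def linear_score(counts, weights, intercept=0.0):
    """Evaluate intercept + sum_i weight_i * count_i (a plain linear model)."""
    return intercept + sum(weights[aa] * counts.get(aa, 0) for aa in weights)


# Hypothetical interface composition (three-letter residue codes).
interface = ["TYR", "GLY", "SER", "TYR", "LEU", "ASP", "GLY", "TYR"]

# Placeholder weights and intercept for illustration only; these are NOT
# coefficients from the paper.
weights = {"TYR": -0.5, "GLY": -0.2, "SER": -0.1}

counts = interface_counts(interface, weights)
score = linear_score(counts, weights, intercept=-6.0)
print(counts, round(score, 3))
```

In practice the weights would be obtained by fitting a multivariate linear regression to experimentally determined affinities, which this sketch does not attempt.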
Affiliation(s)
- Carey Huang Yi
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Mitchell Lee Taylor
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Jesse Ziebarth
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
- Yongmei Wang
- Department of Chemistry, The University of Memphis, Memphis, Tennessee 38152, United States
16
Shen L, Sun X, Chen Z, Guo Y, Shen Z, Song Y, Xin W, Ding H, Ma X, Xu W, Zhou W, Che J, Tan L, Chen L, Chen S, Dong X, Fang L, Zhu F. ADCdb: the database of antibody-drug conjugates. Nucleic Acids Res 2024; 52:D1097-D1109. [PMID: 37831118 PMCID: PMC10768060 DOI: 10.1093/nar/gkad831] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/14/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023] Open
Abstract
Antibody-drug conjugates (ADCs) are a class of innovative biopharmaceutical drugs which, via their antibody (mAb) component, deliver and release their potent warhead (a.k.a. payload) at the disease site, thereby simultaneously improving the efficacy of the delivered therapy and reducing its off-target toxicity. To design ADCs of promising efficacy, it is crucial to have the pharma-information and biological activity data for each ADC. However, no such database has been constructed yet. In this study, a database named ADCdb was therefore developed to provide ADC information (especially pharma-information and biological activities) from multiple perspectives. In particular, a total of 6572 ADCs (359 approved by the FDA or in the clinical trial pipeline, 501 in preclinical testing, 819 with in vivo testing data, 1868 with cell line/target testing data, 3025 without in vivo/cell line/target testing data) were collected and provided together with their explicit pharma-information. Moreover, a total of 9171 literature-reported activities were discovered, which were identified from diverse clinical trial pipelines, model organisms, patient/cell-derived xenograft models, etc. Given the significance of ADCs and their relevant data, this new database is expected to attract broad interest from diverse research fields of current biopharmaceutical drug discovery. ADCdb is now publicly accessible at: https://idrblab.org/adcdb/.
Affiliation(s)
- Liteng Shen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhen Chen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yu Guo
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Zheyuan Shen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yi Song
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wenxiu Xin
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Haiying Ding
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Xinyue Ma
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Weiben Xu
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Wanying Zhou
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Jinxin Che
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Lili Tan
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Liangsheng Chen
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- Siqi Chen
- School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Xiaowu Dong
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Luo Fang
- Department of Pharmacy, Zhejiang Cancer Hospital, Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Hangzhou 310005, China
- Postgraduate Training Base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), Hangzhou 310022, China
- College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- School of Pharmaceutical Science, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
17
Rana MM, Nguyen DD. Geometric Graph Learning to Predict Changes in Binding Free Energy and Protein Thermodynamic Stability upon Mutation. J Phys Chem Lett 2023; 14:10870-10879. [PMID: 38032742 DOI: 10.1021/acs.jpclett.3c02679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
Accurate prediction of binding free energy changes upon mutations is vital for optimizing drugs, designing proteins, understanding genetic diseases, and cost-effective virtual screening. While machine learning methods show promise in this domain, achieving accuracy and generalization across diverse data sets remains a challenge. This study introduces Geometric Graph Learning for Protein-Protein Interactions (GGL-PPI), a novel approach integrating geometric graph representation and machine learning to forecast mutation-induced binding free energy changes. GGL-PPI leverages atom-level graph coloring and multiscale weighted colored geometric subgraphs to capture structural features of biomolecules, demonstrating superior performance on three standard data sets, namely, AB-Bind, SKEMPI 1.0, and SKEMPI 2.0 data sets. The model's efficacy extends to predicting protein thermodynamic stability in a blind test set, providing unbiased predictions for both direct and reverse mutations and showcasing notable generalization. GGL-PPI's precision in predicting changes in binding free energy and stability due to mutations enhances our comprehension of protein complexes, offering valuable insights for drug design endeavors.
Affiliation(s)
- Md Masud Rana
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
- Duc Duy Nguyen
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
18
Tsishyn M, Pucci F, Rooman M. Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Brief Bioinform 2023; 25:bbad491. [PMID: 38197311 PMCID: PMC10777193 DOI: 10.1093/bib/bbad491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/02/2023] [Accepted: 12/05/2023] [Indexed: 01/11/2024] Open
Abstract
Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.
Affiliation(s)
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
19
Shen C, Luo J, Xia K. Molecular geometric deep learning. Cell Reports Methods 2023; 3:100621. [PMID: 37875121 PMCID: PMC10694498 DOI: 10.1016/j.crmeth.2023.100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 06/16/2023] [Accepted: 09/28/2023] [Indexed: 10/26/2023]
Abstract
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs for representing molecular topology at the atomic level and totally ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning model to predict the properties of molecules that aims to comprehensively consider the information of covalent and non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve a better performance than state-of-the-art (SOTA) methods. Extensive tests have demonstrated the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China; School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China.
- Kelin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
20
Long Y, Donald BR. Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology. bioRxiv 2023:2023.11.16.567384. [PMID: 38014181 PMCID: PMC10680814 DOI: 10.1101/2023.11.16.567384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Accurate binding affinity prediction is crucial to structure-based drug design. Recent work used computational topology to obtain an effective representation of protein-ligand interactions. Although persistent homology encodes geometric features, previous works on binding affinity prediction using persistent homology employed uninterpretable machine learning models and failed to explain the underlying geometric and topological features that drive accurate binding affinity prediction. In this work, we propose a novel, interpretable algorithm for protein-ligand binding affinity prediction. Our algorithm achieves interpretability through an effective embedding of distances across bipartite matchings of the protein and ligand atoms into real-valued functions by summing Gaussians centered at features constructed by persistent homology. We name these functions internuclear persistent contours (IPCs). Next, we introduce persistence fingerprints, a vector with 10 components that sketches the distances of different bipartite matchings between protein and ligand atoms, refined from IPCs. Let the number of protein atoms in the protein-ligand complex be $n$, the number of ligand atoms be $m$, and $\omega \approx 2.4$ be the matrix multiplication exponent. We show that for any $0 < \varepsilon < 1$, after an $\mathcal{O}(mn \log(mn))$ preprocessing procedure, we can compute an $\varepsilon$-accurate approximation to the persistence fingerprint in $\mathcal{O}(m \log^{6\omega}(m/\varepsilon))$ time, independent of protein size. This is an improvement in time complexity by a factor of $\mathcal{O}((m+n)^3)$ over any previous binding affinity prediction that uses persistent homology. We show that the representational power of persistence fingerprint generalizes to protein-ligand binding datasets beyond the training dataset. Then, we introduce PATH, Predicting Affinity Through Homology, an interpretable, small ensemble of shallow regression trees for binding affinity prediction from persistence fingerprints. We show that despite using 1,400-fold fewer features, PATH has comparable performance to a previous state-of-the-art binding affinity prediction algorithm that uses persistent homology features. Moreover, PATH has the advantage of being interpretable. Finally, we visualize the features captured by persistence fingerprint for variant HIV-1 protease complexes and show that persistence fingerprint captures binding-relevant structural mutations. The source code for PATH is released open-source as part of the OSPREY protein design software package.
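The phrase "summing Gaussians centered at features" can be made concrete with a generic sketch. It builds a real-valued function $f(x) = \sum_i \exp(-(x - c_i)^2 / (2\sigma^2))$ from a list of feature values and samples it on a grid. This is only an illustration of the general construction: the feature values, the bandwidth `sigma`, and the sampling step are assumptions, and the sketch does not reproduce how the paper derives features from persistent homology or refines IPCs into persistence fingerprints.

```python
import math


def gaussian_sum(centers, sigma):
    """Return f(x) = sum_i exp(-(x - c_i)^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    def f(x):
        return sum(math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers)

    return f


def sample(f, start, stop, n_points):
    """Evaluate f on n_points evenly spaced values in [start, stop]."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (stop - start) / (n_points - 1)
    return [f(start + i * step) for i in range(n_points)]


# Hypothetical feature values (for example, distances in angstroms).
contour = gaussian_sum(centers=[2.0, 3.5, 3.6], sigma=0.25)
vector = sample(contour, start=0.0, stop=6.0, n_points=7)
print([round(v, 4) for v in vector])
```

Because each feature contributes a smooth bump at a known location, the resulting function can be read directly: a peak near a given distance indicates features concentrated there.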
21
Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. arXiv 2023:arXiv:2310.18760v2. [PMID: 37961732 PMCID: PMC10635294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.
Affiliation(s)
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
22
Lin X, Gao Y, Lei F. An application of topological data analysis in predicting sumoylation sites. PeerJ 2023; 11:e16204. [PMID: 37846308 PMCID: PMC10576966 DOI: 10.7717/peerj.16204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Accepted: 09/08/2023] [Indexed: 10/18/2023] Open
Abstract
Sumoylation is a reversible post-translational modification that regulates certain significant biochemical functions in proteins. The protein alterations caused by sumoylation are associated with the incidence of some human diseases. Therefore, identifying the sites of sumoylation in proteins may provide a direction for mechanistic research and drug development. Here, we propose a new computational approach for identifying sumoylation sites using an encoding method based on topological data analysis. The features of our model captured the key physical and biological properties of proteins at multiple scales. In 10-fold cross-validation, our model achieved a sensitivity (Sn) of 96.45%, an accuracy (Acc) of 94.65%, a Matthews correlation coefficient (MCC) of 0.8946, and an area under the curve (AUC) of 0.99. Using only topological features, the proposed predictor achieves the best MCC and AUC among the published methods compared. Our results suggest that topological information is an additional parameter that can assist in the prediction of sumoylation sites and provide a novel perspective for further research in protein sumoylation.
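For readers less familiar with the reported metrics, the sketch below computes Sn, Acc, and MCC from binary confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not taken from the study; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, accuracy, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Sensitivity (recall): fraction of true sites that were predicted as sites.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined here as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Acc": accuracy, "MCC": mcc}


# Hypothetical counts for illustration only.
example = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print({name: round(value, 4) for name, value in example.items()})
```

MCC is often preferred over accuracy on imbalanced data sets because it accounts for all four cells of the confusion matrix.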
Affiliation(s)
- Xiaoxi Lin
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Yaru Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
- Fengchun Lei
- School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning, China
23
Yue Y, Li S, Wang L, Liu H, Tong HHY, He S. MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions. Brief Bioinform 2023; 24:bbad310. [PMID: 37651610 PMCID: PMC10516393 DOI: 10.1093/bib/bbad310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 07/12/2023] [Accepted: 08/04/2023] [Indexed: 09/02/2023] Open
Abstract
The accurate prediction of the effect of amino acid mutations for protein-protein interactions (PPI $\Delta \Delta G$) is a crucial task in protein engineering, as it provides insight into the relevant biological processes underpinning protein binding and provides a basis for further drug discovery. In this study, we propose MpbPPI, a novel multi-task pre-training-based geometric equivariance-preserving framework to predict PPI $\Delta \Delta G$. Pre-training on a strictly screened pre-training dataset is employed to address the scarcity of protein-protein complex structures annotated with PPI $\Delta \Delta G$ values. MpbPPI employs a multi-task pre-training technique, forcing the framework to learn comprehensive backbone and side chain geometric regulations of protein-protein complexes at different scales. After pre-training, MpbPPI can generate high-quality representations capturing the effective geometric characteristics of labeled protein-protein complexes for downstream $\Delta \Delta G$ predictions. MpbPPI serves as a scalable framework supporting different sources of mutant-type (MT) protein-protein complexes for flexible application. Experimental results on four benchmark datasets demonstrate that MpbPPI is a state-of-the-art framework for PPI $\Delta \Delta G$ predictions. The data and source code are available at https://github.com/arantir123/MpbPPI.
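As background for the quantity being predicted, the sketch below shows how an experimental $\Delta \Delta G$ label can be derived from dissociation constants, using $\Delta G = RT \ln K_d$ (with $K_d$ in mol/L relative to a 1 M standard state). It assumes the common convention $\Delta \Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}}$, under which positive values mean weaker binding; the paper's own sign convention may differ. Function names and the example $K_d$ values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg(kd_wild_type, kd_mutant, temperature_k=298.15):
    """ddG = dG(mutant) - dG(wild type); positive means binding got weaker."""
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild_type, temperature_k))


# Hypothetical example: a mutation that weakens binding tenfold (1 nM to 10 nM).
change = ddg(kd_wild_type=1e-9, kd_mutant=1e-8)
print(round(change, 3))
```

Under this convention, swapping the wild type and mutant flips the sign of the result, which is the antisymmetry that reverse-mutation tests check for.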
Affiliation(s)
- Yang Yue
  - School of Computer Science from the University of Birmingham, UK
- Shu Li
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Lingling Wang
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Huanxiang Liu
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Henry H Y Tong
  - Centre for Artificial Intelligence Driven Drug Discovery at Macao Polytechnic University
- Shan He
  - School of Computer Science, the University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK

24
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Frontiers in Bioinformatics 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, Lund, Sweden

25
Chen J, Woldring DR, Huang F, Huang X, Wei GW. Topological deep learning based deep mutational scanning. Comput Biol Med 2023; 164:107258. [PMID: 37506452 PMCID: PMC10528359 DOI: 10.1016/j.compbiomed.2023.107258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 06/28/2023] [Accepted: 07/08/2023] [Indexed: 07/30/2023]
Abstract
High-throughput deep mutational scanning (DMS) experiments have significantly impacted protein engineering, drug discovery, immunology, cancer biology, and evolutionary biology by enabling the systematic understanding of protein functions. However, the mutational space associated with proteins is astronomically large, making it overwhelming for current experimental capabilities. Therefore, alternative methods for DMS are imperative. We propose a topological deep learning (TDL) paradigm to facilitate in silico DMS. We utilize a new topological data analysis (TDA) technique based on the persistent spectral theory, also known as persistent Laplacian, to capture both topological invariants and the homotopic shape evolution of data. To validate our TDL-DMS model, we use SARS-CoV-2 datasets and show excellent accuracy and reliability for binding interface mutations. This finding is significant for SARS-CoV-2 variant forecasting and designing effective antibodies and vaccines. Our proposed model is expected to have a significant impact on drug discovery, vaccine design, precision medicine, and protein engineering.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Daniel R Woldring
  - Department of Chemical Engineering, Michigan State University, East Lansing, MI 48824, USA
- Faqing Huang
  - Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Xuefei Huang
  - Department of Chemistry, Michigan State University, MI 48824, USA
  - Department of Biomedical Engineering, Michigan State University, East Lansing, MI 48824, USA
  - The Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA

26
Guarra F, Colombo G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J Chem Theory Comput 2023; 19:5315-5333. [PMID: 37527403 PMCID: PMC10448727 DOI: 10.1021/acs.jctc.3c00513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Indexed: 08/03/2023]
Abstract
The design of new biomolecules able to harness immune mechanisms for the treatment of diseases is a prime challenge for computational and simulative approaches. For instance, in recent years, antibodies have emerged as an important class of therapeutics against a spectrum of pathologies. In cancer, immune-inspired approaches are witnessing a surge thanks to a better understanding of tumor-associated antigens and the mechanisms of their engagement or evasion from the human immune system. Here, we provide a summary of the main state-of-the-art computational approaches that are used to design antibodies and antigens, and in parallel, we review key methodologies for epitope identification for both B- and T-cell mediated responses. A special focus is devoted to the description of structure- and physics-based models, privileged over purely sequence-based approaches. We discuss the implications of novel methods in engineering biomolecules with tailored immunological properties for possible therapeutic uses. Finally, we highlight the extraordinary challenges and opportunities presented by the possible integration of structure- and physics-based methods with emerging Artificial Intelligence technologies for the prediction and design of novel antigens, epitopes, and antibodies.
Affiliation(s)
- Federica Guarra
  - Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, Via Taramelli 12, 27100 Pavia, Italy

27
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany

28
Wei X, Chen J, Wei GW. Persistent topological Laplacian analysis of SARS-CoV-2 variants. Journal of Computational Biophysics and Chemistry 2023; 22:569-587. [PMID: 37829318 PMCID: PMC10569362 DOI: 10.1142/s2737416523500278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations, including its inability to handle heterogeneous information, such as multiple types of geometric objects; being qualitative rather than quantitative, e.g., counting a 5-member ring the same as a 6-member ring, and a failure to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as persistent Laplacian and persistent sheaf Laplacian, were proposed to overcome the limitations of persistent homology. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the RBD mutation-induced structural changes of RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured in the changes of spectra of the PTLs among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the binding of RBD and ACE2-induced structural changes of various SARS-CoV-2 variants. Finally, we explore the impacts of computationally generated RBD structures on a topological deep learning paradigm and predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
Affiliation(s)
- Xiaoqi Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA

29
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. arXiv 2023; arXiv:2307.14587v1. [PMID: 37547662 PMCID: PMC10402185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824, MI, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824, MI, USA

30
Upadhyay A, Ekenna C. A New Tool to Study the Binding Behavior of Intrinsically Disordered Proteins. Int J Mol Sci 2023; 24:11785. [PMID: 37511544 PMCID: PMC10380747 DOI: 10.3390/ijms241411785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 07/07/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023] Open
Abstract
Understanding the binding behavior and conformational dynamics of intrinsically disordered proteins (IDPs) is crucial for unraveling their regulatory roles in biological processes. However, their lack of stable 3D structures poses challenges for analysis. To address this, we propose an algorithm that explores IDP binding behavior with protein complexes by extracting topological and geometric features from the protein surface model. Our algorithm identifies a geometrically favorable binding pose for the IDP and plans a feasible trajectory to evaluate its transition to the docking position. We focus on IDPs from Homo sapiens and Mus musculus, investigating their interaction with the Plasmodium falciparum (PF) pathogen associated with malaria-related deaths. We compare our algorithm with the HawkDock and HDOCK docking tools on quantitative (computation time) and qualitative (binding affinity) measures. Our results indicate that our method outperforms the compared methods in computation performance and binding affinity in experimental conformations.
Affiliation(s)
- Aakriti Upadhyay
  - Department of Computer Science, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA
- Chinwe Ekenna
  - Department of Computer Science, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA

31
Zhu Y, Xiong H, Liu S, Wu D, Zhang X, Shi X, Qu J, Chen L, Liu Z, Peng B, Zhang D. Combining MOE Bioinformatics Analysis and In Vitro Pseudovirus Neutralization Assays to Predict the Neutralizing Ability of CV30 Monoclonal Antibody on SARS-CoV-2 Variants. Viruses 2023; 15:1565. [PMID: 37515251 PMCID: PMC10386485 DOI: 10.3390/v15071565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/13/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
Combining bioinformatics and in vitro cytology assays, a predictive method was established to quickly evaluate the protective effect of immunity acquired through SARS-CoV-2 infection against variants. Bioinformatics software was first used to predict the changes in the affinity of variant antigens to the CV30 monoclonal antibody by integrating bioinformatics and cytology assays. Then, the ability of the antibody to neutralize the variant antigen was further verified, and the ability of the CV30 to neutralize the new variant strain was predicted through pseudovirus neutralization experiments. The current study has demonstrated that when the Molecular Operating Environment (MOE) predicts |ΔBFE| ≤ 3.0003, it suggests that the CV30 monoclonal antibody exhibits some affinity toward the variant strain and can potentially neutralize it. However, if |ΔBFE| ≥ 4.1539, the CV30 monoclonal antibody does not display any affinity for the variant strain and cannot neutralize it. In contrast, if 3.0003 < |ΔBFE| < 4.1539, it is necessary to conduct a series of neutralization tests promptly with the CV30 monoclonal antibody and the variant pseudovirus to obtain results and supplement the existing method, which is faster than the typical procedures. This approach allows for a rapid assessment of the protective efficacy of natural immunity gained through SARS-CoV-2 infection against variants.
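The three-way decision rule in this abstract can be written down directly. The sketch below is only an illustration of the quoted thresholds; the function name and the returned labels are hypothetical, and the abstract does not state the units of ΔBFE, so none are assumed here.

```python
# Thresholds on |ΔBFE| as quoted in the abstract (units not stated there).
LOWER_THRESHOLD = 3.0003
UPPER_THRESHOLD = 4.1539


def triage_variant(delta_bfe: float) -> str:
    """Map a MOE-predicted ΔBFE value to one of the three outcomes in the abstract.

    The label strings are illustrative and not taken from the paper.
    """
    magnitude = abs(delta_bfe)
    if magnitude <= LOWER_THRESHOLD:
        # Some affinity retained; CV30 can potentially neutralize the variant.
        return "potentially neutralized"
    if magnitude >= UPPER_THRESHOLD:
        # No affinity predicted; CV30 is not expected to neutralize the variant.
        return "not neutralized"
    # Between the two thresholds the prediction is inconclusive.
    return "run pseudovirus neutralization assay"
```

Only values strictly between the two thresholds fall through to the assay branch, matching the inequalities given in the abstract.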
Affiliation(s)
- Yajuan Zhu
  - School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
- Husheng Xiong
  - School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
- Shuang Liu
  - School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
- Dawei Wu
  - School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
- Xiaomin Zhang
  - Department of Microbiology Laboratory, Shenzhen Center for Disease Control and Prevention, Shenzhen 518055, China
- Xiaolu Shi
  - Department of Microbiology Laboratory, Shenzhen Center for Disease Control and Prevention, Shenzhen 518055, China
- Jing Qu
  - Department of Microbiology Laboratory, Shenzhen Center for Disease Control and Prevention, Shenzhen 518055, China
- Long Chen
  - Department of Microbiology Laboratory, Shenzhen Center for Disease Control and Prevention, Shenzhen 518055, China
- Zheng Liu
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, Chinese University of Hong Kong, Shenzhen 518172, China
- Bo Peng
  - Department of Microbiology Laboratory, Shenzhen Center for Disease Control and Prevention, Shenzhen 518055, China
- Dingmei Zhang
  - School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China
  - NMPA Key Laboratory for Quality Monitoring and Evaluation of Vaccines and Biological Products, Guangzhou 510080, China

32
Mohseni Behbahani Y, Laine E, Carbone A. Deep Local Analysis deconstructs protein-protein interfaces and accurately estimates binding affinity changes upon mutation. Bioinformatics 2023; 39:i544-i552. [PMID: 37387162 DOI: 10.1093/bioinformatics/btad231] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The spectacular recent advances in protein and protein complex structure prediction hold promise for reconstructing interactomes at large-scale and residue resolution. Beyond determining the 3D arrangement of interacting partners, modeling approaches should be able to unravel the impact of sequence variations on the strength of the association. RESULTS In this work, we report on Deep Local Analysis, a novel and efficient deep learning framework that relies on a strikingly simple deconstruction of protein interfaces into small locally oriented residue-centered cubes and on 3D convolutions recognizing patterns within cubes. Merely based on the two cubes associated with the wild-type and the mutant residues, DLA accurately estimates the binding affinity change for the associated complexes. It achieves a Pearson correlation coefficient of 0.735 on about 400 mutations on unseen complexes. Its generalization capability on blind datasets of complexes is higher than the state-of-the-art methods. We show that taking into account the evolutionary constraints on residues contributes to predictions. We also discuss the influence of conformational variability on performance. Beyond the predictive power on the effects of mutations, DLA is a general framework for transferring the knowledge gained from the available non-redundant set of complex protein structures to various tasks. For instance, given a single partially masked cube, it recovers the identity and physicochemical class of the central residue. Given an ensemble of cubes representing an interface, it predicts the function of the complex. AVAILABILITY AND IMPLEMENTATION Source code and models are available at http://gitlab.lcqb.upmc.fr/DLA/DLA.git.
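The headline metric in this abstract is a Pearson correlation coefficient between predicted and measured binding affinity changes. As a reminder of what that number measures, here is a minimal standard-library sketch; the function name is hypothetical and this is not the authors' evaluation code.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_p * var_m)
```

A value near 1 means predictions rise and fall with the measurements; it does not by itself indicate that the predicted magnitudes are correct.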
Affiliation(s)
- Yasser Mohseni Behbahani
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France
- Elodie Laine
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France
- Alessandra Carbone
  - Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Sorbonne Université, CNRS, IBPS, Paris 75005, France

33
Pandey P, Ghimire S, Wu B, Alexov E. On the linkage of thermodynamics and pathogenicity. Curr Opin Struct Biol 2023; 80:102572. [PMID: 36965249 PMCID: PMC10239362 DOI: 10.1016/j.sbi.2023.102572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 03/27/2023]
Abstract
This review outlines the effect of disease-causing mutations on proteins' thermodynamics. Two major thermodynamic quantities that are essential for structural integrity, the folding and binding free energy changes caused by missense mutations, are considered. It is emphasized that disease effects in complex diseases may originate from several mutations over several genes, while monogenic diseases are caused by mutations in a single gene. Nevertheless, in both cases it is shown that pathogenic mutations cause larger perturbations of the above-mentioned thermodynamic quantities than benign mutations do. Recent works demonstrating the effect of pathogenic mutations on these thermodynamic quantities, as well as on structural dynamics and allosteric pathways, are reviewed.
Affiliation(s)
- Preeti Pandey
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Sanjeev Ghimire
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Bohua Wu
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA

34
Shen L, Feng H, Qiu Y, Wei GW. SVSBI: sequence-based virtual screening of biomolecular interactions. Commun Biol 2023; 6:536. [PMID: 37202415 PMCID: PMC10195826 DOI: 10.1038/s42003-023-04866-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 04/24/2023] [Indexed: 05/20/2023] Open
Abstract
Virtual screening (VS) is a critical technique in understanding biomolecular interactions, particularly in drug design and discovery. However, the accuracy of current VS models heavily relies on three-dimensional (3D) structures obtained through molecular docking, which is often unreliable due to its low accuracy. To address this issue, we introduce sequence-based virtual screening (SVS) as another generation of VS models that utilize advanced natural language processing (NLP) algorithms and optimized deep K-embedding strategies to encode biomolecular interactions without relying on 3D structure-based docking. We demonstrate that SVS outperforms state-of-the-art methods on four regression datasets involving protein-ligand binding, protein-protein binding, protein-nucleic acid binding, and ligand inhibition of protein-protein interactions, and on five classification datasets for protein-protein interactions in five biological species. SVS has the potential to transform current practices in drug discovery and protein engineering.
Affiliation(s)
- Li Shen
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Hongsong Feng
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA

35
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024] Open
Abstract
Therapeutic protein, represented by antibodies, is of increasing interest in human medicine. However, clinical translation of therapeutic protein is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of the developability with widely used methods, like display technologies and library screening approaches, is a time and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared the differences between therapeutic protein and small molecules in developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic protein in several developability issues.
Affiliation(s)
- Zhidong Chen
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
  - Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
  - School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
  - Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China

36
Muniyappan S, Rayan AXA, Varrieth GT. DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. Mathematical Biosciences and Engineering 2023; 20:9530-9571. [PMID: 37161255 DOI: 10.3934/mbe.2023419] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
MOTIVATION In vitro experiment-based drug-target interaction (DTI) exploration demands more human, financial and data resources. In silico approaches have been recommended for predicting DTIs to reduce time and cost. During the drug development process, one can analyze the therapeutic effect of the drug for a particular disease by identifying how the drug binds to the target for treating that disease. Hence, DTI plays a major role in drug discovery. Many computational methods have been developed for DTI prediction. However, the existing methods have limitations in terms of capturing the interactions via multiple semantics between drug and target nodes in a heterogeneous biological network (HBN). METHODS In this paper, we propose a DTiGNN framework for identifying unknown drug-target pairs. The DTiGNN first calculates the similarity between the drug and target from multiple perspectives. Then, the features of drugs and targets from each perspective are learned separately by using a novel method termed an information entropy-based random walk. Next, all of the learned features from different perspectives are integrated into a single drug and target similarity network by using a multi-view convolutional neural network. Using the integrated similarity networks, drug interactions, drug-disease associations, protein interactions and protein-disease association, the HBN is constructed. Next, a novel embedding algorithm called a meta-graph guided graph neural network is used to learn the embedding of drugs and targets. Then, a convolutional neural network is employed to infer new DTIs after balancing the sample using oversampling techniques. RESULTS The DTiGNN is applied to various datasets, and the result shows better performance in terms of the area under receiver operating characteristic curve (AUC) and area under precision-recall curve (AUPR), with scores of 0.98 and 0.99, respectively. There are 23,739 newly predicted DTI pairs in total.
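The AUC reported in this abstract has a simple probabilistic reading: it is the chance that a randomly chosen true drug-target pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes it by direct pairwise comparison; the function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels holds 1 for a true interaction and 0 otherwise; ties count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because AUC depends only on the ranking of scores, it can stay high on heavily imbalanced data, which is one reason AUPR is usually reported alongside it.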
Affiliation(s)
- Saranya Muniyappan
  - Computer Science and Engineering, CEG Campus, Anna University, Tamil Nadu, India

37
Zeng L, Lu Y, Yan W, Yang Y. A Protein Co-Conservation Network Model Characterizes Mutation Effects on SARS-CoV-2 Spike Protein. Int J Mol Sci 2023; 24:3255. [PMID: 36834664 PMCID: PMC9960056 DOI: 10.3390/ijms24043255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 02/03/2023] [Accepted: 02/04/2023] [Indexed: 02/10/2023] Open
Abstract
The emergence of numerous variants of SARS-CoV-2 has presented challenges to the global efforts to control the COVID-19 pandemic. Most of the mutations are in the SARS-CoV-2 viral envelope spike protein, which is responsible for virus attachment to the host and is the main target for host antibodies. It is critically important to study the biological effects of the mutations to understand how mutations alter viral functions. Here, we propose a protein co-conservation weighted network (PCCN) model, based only on the protein sequence, to characterize the mutation sites by topological features and to investigate the mutation effects on the spike protein from a network view. First, we found that the mutation sites on the spike protein had significantly larger centrality than the non-mutation sites. Second, the stability changes and binding free energy changes at the mutation sites were significantly positively correlated with their neighbors' degree and the shortest path length, respectively. The results indicate that our PCCN model provides new insights into mutations on spike proteins and reflects the mutation effects on protein function alterations.
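Two of the topological features named in this abstract, a node's degree and the degree of its neighbors, are easy to illustrate on a toy weighted graph. The sketch below is not the PCCN model: the node labels, edge weights and helper names are all hypothetical, and the weights are stored but not used by these two unweighted measures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected weighted graph as a nested adjacency dictionary."""
    graph: Graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight
    return graph


def degree(graph: Graph, node: str) -> int:
    """Number of neighbors of a node."""
    return len(graph[node])


def mean_neighbor_degree(graph: Graph, node: str) -> float:
    """Average degree over the neighbors of a node (0.0 for an isolated node)."""
    neighbors = graph[node]
    if not neighbors:
        return 0.0
    return sum(len(graph[n]) for n in neighbors) / len(neighbors)


# Toy network: nodes stand in for residue positions, weights for co-conservation scores.
toy_edges = [("A", "B", 0.9), ("A", "C", 0.4), ("B", "C", 0.7), ("C", "D", 0.2)]
toy_graph = build_graph(toy_edges)
```

In this toy graph node C has the highest degree, and node D, whose only neighbor is C, has the highest mean neighbor degree.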
Affiliation(s)
- Lianjie Zeng
  - School of Computer Science & Technology, Soochow University, Suzhou 215000, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Yitan Lu
  - Department of Bioinformatics, School of Biology and Basic Medical Sciences, Medical College of Soochow University, Suzhou 215123, China
- Wenying Yan
  - Department of Bioinformatics, School of Biology and Basic Medical Sciences, Medical College of Soochow University, Suzhou 215123, China
  - Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Suzhou 215123, China
  - Correspondence: (W.Y.); (Y.Y.)
- Yang Yang
  - School of Computer Science & Technology, Soochow University, Suzhou 215000, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
  - Correspondence: (W.Y.); (Y.Y.)

38
Qiu Y, Wei GW. Persistent spectral theory-guided protein engineering. NATURE COMPUTATIONAL SCIENCE 2023; 3:149-163. [PMID: 37637776 PMCID: PMC10456983 DOI: 10.1038/s43588-022-00394-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 08/09/2022] [Accepted: 12/22/2022] [Indexed: 08/29/2023]
Abstract
While protein engineering, which iteratively optimizes protein fitness by screening the gigantic mutational space, is constrained by experimental capacity, various machine learning models have substantially expedited protein engineering. Three-dimensional protein structures promise further advantages, but their intricate geometric complexity hinders their applications in deep mutational screening. Persistent homology, an established algebraic topology tool for protein structural complexity reduction, fails to capture the homotopic shape evolution during the filtration of given data. This work introduces a Topology-offered protein Fitness (TopFit) framework to complement protein sequence and structure embeddings. Equipped with an ensemble regression strategy, TopFit integrates the persistent spectral theory, a new topological Laplacian, and two auxiliary sequence embeddings to capture mutation-induced topological invariants, shape evolution, and sequence disparity in the protein fitness landscape. The performance of TopFit is assessed on 34 benchmark datasets with 128,634 variants, involving a vast variety of protein structure acquisition modalities and training set size variations.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
|
39
|
Khan A, Cowen-Rivers AI, Grosnit A, Deik DGX, Robert PA, Greiff V, Smorodina E, Rawat P, Akbar R, Dreczkowski K, Tutunov R, Bou-Ammar D, Wang J, Storkey A, Bou-Ammar H. Toward real-world automated antibody design with combinatorial Bayesian optimization. CELL REPORTS METHODS 2023; 3:100374. [PMID: 36814835 PMCID: PMC9939385 DOI: 10.1016/j.crmeth.2022.100374] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 04/29/2022] [Revised: 10/08/2022] [Accepted: 12/07/2022] [Indexed: 06/14/2023]
Abstract
Antibodies are multimeric proteins capable of highly specific molecular recognition. The complementarity determining region 3 of the antibody variable heavy chain (CDRH3) often dominates antigen-binding specificity. Hence, it is a priority to design optimal antigen-specific CDRH3 to develop therapeutic antibodies. The combinatorial structure of CDRH3 sequences makes it impossible to query binding-affinity oracles exhaustively. Moreover, antibodies are expected to have high target specificity and developability. Here, we present AntBO, a combinatorial Bayesian optimization framework utilizing a CDRH3 trust region for an in silico design of antibodies with favorable developability scores. The in silico experiments on 159 antigens demonstrate that AntBO is a step toward practically viable in vitro antibody design. In under 200 calls to the oracle, AntBO suggests antibodies outperforming the best binding sequence from 6.9 million experimentally obtained CDRH3s. Additionally, AntBO finds very-high-affinity CDRH3 in only 38 protein designs while requiring no domain knowledge.
Affiliation(s)
- Asif Khan
  - School of Informatics, University of Edinburgh, Edinburgh EH8 9YL, UK
- Philippe A. Robert
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0315, Norway
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0315, Norway
- Eva Smorodina
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0315, Norway
- Puneet Rawat
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0315, Norway
- Rahmad Akbar
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo 0315, Norway
- Dany Bou-Ammar
  - American University of Beirut Medical Centre, Beirut 11-0236, Lebanon
- Jun Wang
  - Huawei Noah’s Ark Lab, London N1C 4AG, UK
  - University College London, London WC1E 6BT, UK
- Amos Storkey
  - School of Informatics, University of Edinburgh, Edinburgh EH8 9YL, UK
- Haitham Bou-Ammar
  - Huawei Noah’s Ark Lab, London N1C 4AG, UK
  - University College London, London WC1E 6BT, UK
|
40
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
|
41
|
Vandaele R, Mukherjee P, Selby HM, Shah RP, Gevaert O. Topological data analysis of thoracic radiographic images shows improved radiomics-based lung tumor histology prediction. PATTERNS (NEW YORK, N.Y.) 2023; 4:100657. [PMID: 36699734 PMCID: PMC9868648 DOI: 10.1016/j.patter.2022.100657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Revised: 07/15/2022] [Accepted: 11/15/2022] [Indexed: 12/15/2022]
Abstract
Topological data analysis provides tools to capture wide-scale structural shape information in data. Its main method, persistent homology, has found successful applications to various machine-learning problems. Despite its recent gain in popularity, much of its potential for medical image analysis remains undiscovered. We explore the prominent learning problems on thoracic radiographic images of lung tumors for which persistent homology improves radiomic-based learning. It turns out that our topological features capture complementary information important for benign versus malignant and adenocarcinoma versus squamous cell carcinoma tumor prediction, while contributing less consistently to small cell versus non-small cell prediction, an interesting result in its own right. Furthermore, while radiomic features are better for predicting malignancy scores assigned by expert radiologists through visual inspection, we find that topological features are better for predicting more accurate histology assessed through long-term radiology review, biopsy, surgical resection, progression, or response.
Affiliation(s)
- Robin Vandaele
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
  - Data Mining and Modeling for Biomedicine, VIB Inflammation Research Center, 9052 Ghent, Belgium
  - IDLab, Department of Electronics and Information Systems, Ghent University, Gent, Belgium
- Pritam Mukherjee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Heather Marie Selby
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rajesh Pravin Shah
  - Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
  - Department of Radiology, Stanford University, Stanford, CA, USA
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
|
42
|
Luo X, Tong F, Zhao W, Zheng X, Li J, Li J, Zhao D. BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure. MAbs 2023; 15:2285904. [PMID: 38010801 DOI: 10.1080/19420862.2023.2285904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 11/16/2023] [Indexed: 11/29/2023] Open
Abstract
Prior research has generated a vast number of antibody sequences, which has allowed the pre-training of language models on amino acid sequences to improve the efficiency of antibody screening and optimization. However, compared to those for proteins, there are fewer pre-trained language models available for antibody sequences. Additionally, existing pre-trained models rely solely on embedding representations using amino acids or k-mers, which do not explicitly take into account the role of secondary structure features. Here, we present a new pre-trained model called BERT2DAb. This model incorporates secondary structure information based on self-attention to learn representations of antibody sequences. Our model achieves state-of-the-art performance on three downstream tasks, including two antigen-antibody binding classification tasks (precision: 85.15%/94.86%; recall: 87.41%/86.15%) and one antigen-antibody complex mutation binding free energy prediction task (Pearson correlation coefficient: 0.77). Moreover, we propose a novel method to analyze the relationship between attention weights and contact states of pairs of subsequences in tertiary structures. This enhances the interpretability of BERT2DAb. Overall, our model demonstrates strong potential for improving antibody screening and design through downstream applications.
Affiliation(s)
- Xiaowei Luo
  - Information Center, Academy of Military Medical Sciences, Beijing, China
- Fan Tong
  - Information Center, Academy of Military Medical Sciences, Beijing, China
- Wenbin Zhao
  - Information Center, Academy of Military Medical Sciences, Beijing, China
- Xiangwen Zheng
  - Information Center, Academy of Military Medical Sciences, Beijing, China
- Jiangyu Li
  - Information Center, Academy of Military Medical Sciences, Beijing, China
- Jing Li
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China
- Dongsheng Zhao
  - Information Center, Academy of Military Medical Sciences, Beijing, China
|
43
|
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. FRONTIERS IN BIOINFORMATICS 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Affiliation(s)
- Zhongliang Guo
  - Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Rui Yamaguchi
  - Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
  - Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
  - Correspondence: Rui Yamaguchi
|
44
|
Chen J, Qiu Y, Wang R, Wei GW. Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants. Comput Biol Med 2022; 151:106262. [PMID: 36379191 PMCID: PMC10754203 DOI: 10.1016/j.compbiomed.2022.106262] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/30/2022] [Indexed: 11/15/2022]
Abstract
Due to its high transmissibility, Omicron BA.1 ousted the Delta variant to become a dominating variant in late 2021 and was replaced by the more transmissible Omicron BA.2 in March 2022. An important question is which new variants will dominate in the future. Topology-based deep learning models have had tremendous success in forecasting emerging variants in the past. However, topology is insensitive to homotopic shape evolution in virus-human protein-protein binding, which is crucial to viral evolution and transmission. This challenge is tackled with the persistent Laplacian, which is able to capture both the topological change and the homotopic shape evolution of data. Persistent Laplacian-based deep learning models are developed to systematically evaluate variant infectivity. Our comparative analysis of Alpha, Beta, Gamma, Delta, Lambda, Mu, and Omicron BA.1, BA.1.1, BA.2, BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 unveils that Omicron BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 are more contagious than BA.2. In particular, BA.4 and BA.5 are about 36% more infectious than BA.2 and are projected to become new dominant variants by natural selection. Moreover, the proposed models outperform the state-of-the-art methods on three major benchmark datasets for mutation-induced protein-protein binding free energy changes. Our key projection about BA.4 and BA.5's dominance made on May 1, 2022 (see arXiv:2205.00532) became a reality in late June 2022.
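For orientation, the claim that a Laplacian can see both topological change and shape change can be read off the standard combinatorial (Hodge) Laplacian on a simplicial complex. This is the textbook, non-persistent definition, given here only as a hedged illustration; the persistent variant used in the paper additionally relates boundary operators at two filtration steps and is not reproduced here. The symbols $\partial_q$ (boundary operator on $q$-chains), $\partial_q^{*}$ (its adjoint) and $\beta_q$ (the $q$-th Betti number) are standard notation, not taken from the abstract.

```latex
\Delta_q \;=\; \partial_{q+1}\,\partial_{q+1}^{*} \;+\; \partial_q^{*}\,\partial_q ,
\qquad
\beta_q \;=\; \dim \ker \Delta_q .
```

The multiplicity of the zero eigenvalue recovers the Betti number, which is the information homology already provides, while the nonzero eigenvalues can vary with the geometry of the complex even when the Betti numbers stay fixed.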
Affiliation(s)
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
|
45
|
Liu J, Xia KL, Wu J, Yau SST, Wei GW. Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022; 38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]
Abstract
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglements of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, and molecular assembly to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses topological structures, properties, and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.
Affiliation(s)
- Jian Liu
  - School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 P. R. China
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Ke-Lin Xia
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 639798 Singapore
- Jie Wu
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Stephen Shing-Toung Yau
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Guo-Wei Wei
  - Department of Mathematics & Department of Biochemistry and Molecular Biology & Department of Electrical and Computer Engineering, Michigan State University, Wells Hall 619 Red Cedar Road, East Lansing, MI 48824-1027 USA
|
46
|
Qiu Y, Wei GW. CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution. J Chem Inf Model 2022; 62:4629-4641. [PMID: 36154171 DOI: 10.1021/acs.jcim.2c01046] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Directed evolution, a revolutionary biotechnology in protein engineering, optimizes protein fitness by searching an astronomical mutational space via expensive experiments. The cluster learning-assisted directed evolution (CLADE) efficiently explores the mutational space via a combination of unsupervised hierarchical clustering and supervised learning. However, the initial-stage sampling in CLADE treats all clusters equally despite many clusters containing a large portion of non-functional mutations. Recent statistical and deep learning tools enable evolutionary density modeling to assess protein fitness in an unsupervised manner. In this work, we construct an ensemble of multiple evolutionary scores to guide the initial sampling in CLADE. The resulting evolutionary score-enhanced CLADE, called CLADE 2.0, efficiently selects a training set within a small informative space using the evolution-driven clustering sampling. CLADE 2.0 is validated using two benchmark libraries, each having 160,000 sequences from four-site mutational combinations. Extensive computational experiments and comparisons with existing cutting-edge methods indicate that CLADE 2.0 is a new state-of-the-art tool for machine learning-assisted directed evolution.
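The library size quoted in the abstract follows directly from combinatorics: four mutated sites, each drawn from the 20 standard amino acids, give 20^4 = 160,000 sequences. A minimal sketch of that count is shown below; the function names are hypothetical and the enumeration is only an illustration of the search space, not part of CLADE itself.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(n_sites, alphabet=AMINO_ACIDS):
    """Number of distinct variants when n_sites positions are fully randomized."""
    return len(alphabet) ** n_sites


def enumerate_variants(n_sites, alphabet=AMINO_ACIDS):
    """Lazily yield every combination of residues at the mutated sites."""
    for combo in product(alphabet, repeat=n_sites):
        yield "".join(combo)
```

`library_size(4)` evaluates to 160000, which matches the benchmark library size stated above and shows why exhaustive experimental screening becomes impractical as the number of sites grows.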
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
|
47
|
Deep learning methods for molecular representation and property prediction. Drug Discov Today 2022; 27:103373. [PMID: 36167282 DOI: 10.1016/j.drudis.2022.103373] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2022] [Revised: 08/22/2022] [Accepted: 09/21/2022] [Indexed: 01/11/2023]
Abstract
With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL models, such as ensemble learning and transfer learning, and analyze the interpretability methods for these models. We also highlight the challenges and opportunities of DL methods for molecular representation and property prediction.
|
48
|
Wei GW. Topological AI forecasting of future dominating viral variants. ARXIV 2022:arXiv:2209.03229v1. [PMID: 36118666 PMCID: PMC9479042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
The understanding of the mechanisms of SARS-CoV-2 evolution and transmission is one of the greatest challenges of our time. By integrating artificial intelligence (AI), viral genomes isolated from patients, tens of thousands of mutational data, biophysics, bioinformatics, and algebraic topology, SARS-CoV-2 evolution was revealed to be governed by infectivity-based natural selection. Two key mutation sites, L452 and N501 on the viral spike protein receptor-binding domain (RBD), were predicted in summer 2020, long before they occurred in the prevailing variants Alpha, Beta, Gamma, Delta, Kappa, Theta, Lambda, Mu, and Omicron. Recent studies identified a new mechanism of natural selection: antibody resistance. AI-based forecasting of Omicron's infectivity, vaccine breakthrough, and antibody resistance was later nearly perfectly confirmed by experiments. The replacement of dominant BA.1 by BA.2 in late March was predicted in early February. On May 1, 2022, persistent Laplacian-based AI projected Omicron BA.4 and BA.5 to become the new dominating COVID-19 variants. This prediction became reality in late June. Topological AI models offer accurate prediction of mutational impacts on the efficacy of monoclonal antibodies (mAbs).
Affiliation(s)
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
|
49
|
Woodard J, Iqbal S, Mashaghi A. Circuit topology predicts pathogenicity of missense mutations. Proteins 2022; 90:1634-1644. [PMID: 35394672 PMCID: PMC9543832 DOI: 10.1002/prot.26342] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2021] [Revised: 03/07/2022] [Accepted: 03/30/2022] [Indexed: 12/05/2022]
Abstract
The contact topology of a protein determines important aspects of the folding process. The topological measure of contact order has been shown to be predictive of the rate of folding. Circuit topology is emerging as another fundamental descriptor of biomolecular structure, with predicted effects on the folding rate. We analyze the residue‐based circuit topological environments of 21 K mutations labeled as pathogenic or benign. Multiple statistical lines of reasoning support the conclusion that the number of contacts in two specific circuit topological arrangements, namely inverse parallel and cross relations, with contacts involving the mutated residue, has discriminatory value in determining the pathogenicity of human variants. We investigate how results vary with residue type and according to whether the gene is essential. We further explore the relationship to a number of structural features and find that circuit topology provides nonredundant information on protein structures and pathogenicity of mutations. Results may have implications for the polymer physics of protein folding and suggest that “local” topological information, including residue‐based circuit topology and residue contact order, could be useful in improving state‐of‐the‐art machine learning algorithms for pathogenicity prediction.
Affiliation(s)
- Jaie Woodard
  - Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Sumaiya Iqbal
  - Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Alireza Mashaghi
  - Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands
  - Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Leiden, The Netherlands
|
50
|
Liu X, Feng H, Wu J, Xia K. Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation. J Chem Inf Model 2022; 62:3961-3969. [PMID: 36040839 DOI: 10.1021/acs.jcim.2c00580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding protein-protein interactions holds the key to understanding biological functions and diseases and to the development of therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPIs. However, a key issue for all AI-based PPI models is efficient molecular representation and featurization. Here, we propose, for the first time, a Hom-complex-based PPI representation and Hom-complex-based machine learning models for the prediction of PPI binding affinity changes upon mutation. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of a protein-protein complex by using different graphs G1, which reveal G1-related inner connections within G. Further, for a specific graph G1, a series of nested Hom complexes is generated to give a multiscale characterization of the PPIs. Its persistent homology and persistent Euler characteristic are used as molecular descriptors and further combined with a machine learning model, in particular, gradient boosting tree (GBT). We systematically test our model on the two most commonly used data sets, that is, SKEMPI and AB-Bind. It has been found that our model outperforms all the existing models as far as we know, which demonstrates the great potential of our model for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.
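To make the idea of an Euler characteristic tracked over a nested family of complexes concrete, here is a minimal sketch. It is an assumption-laden stand-in: it uses distance-threshold clique complexes on a point set (a Vietoris-Rips style filtration truncated at dimension 2) rather than the Hom complexes of the paper, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def euler_characteristic(points, radius):
    """Euler characteristic V - E + T of the clique complex (up to triangles)
    whose edges join points at distance <= radius."""
    n = len(points)
    edges = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= radius
    }
    # A triangle is present when all three of its edges are present.
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    )
    return n - len(edges) + triangles


def euler_curve(points, radii):
    """Euler characteristic at each filtration value; usable as a feature vector."""
    return [euler_characteristic(points, r) for r in radii]
```

For the corners of a unit square, `euler_curve([(0, 0), (1, 0), (1, 1), (0, 1)], [0.5, 1.0, 1.5])` returns `[4, 0, 2]`: four isolated points, then a 4-cycle, then the full 2-skeleton on four vertices. The last value reflects the truncation at triangles; including tetrahedra would change it. A vector like this could be passed to a gradient boosting regressor as one descriptor among many.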
Affiliation(s)
- Xiang Liu
  - Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
- Huitao Feng
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
  - Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China, 400054
- Jie Wu
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing, China, 101408
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
|