1
|
Rimal P, Panday SK, Xu W, Peng Y, Alexov E. SAAMBE-MEM: a sequence-based method for predicting binding free energy change upon mutation in membrane protein-protein complexes. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae544. [PMID: 39240325 PMCID: PMC11407696 DOI: 10.1093/bioinformatics/btae544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/26/2024] [Revised: 08/04/2024] [Accepted: 09/04/2024] [Indexed: 09/07/2024]
Abstract
MOTIVATION Mutations in protein-protein interactions can affect the corresponding complexes, impacting function and potentially leading to disease. Given the abundance of membrane proteins, it is crucial to assess the impact of mutations on the binding affinity of these proteins. Although several methods exist to predict the binding free energy change due to mutations in protein-protein complexes, most require structural information of the protein complex and are primarily trained on the SKEMPI database, which is composed mainly of soluble proteins. RESULTS A novel sequence-based method (SAAMBE-MEM) for predicting binding free energy changes (ΔΔG) in membrane protein-protein complexes due to mutations has been developed. This method utilized the MPAD database, which contains binding affinities for wild-type and mutant membrane protein complexes. A machine learning model was developed to predict ΔΔG by leveraging features such as amino acid indices and position-specific scoring matrices (PSSM). Through extensive dataset curation and feature extraction, SAAMBE-MEM was trained and validated using the XGBoost regression algorithm. The optimal feature set, including PSSM-related features, achieved a Pearson correlation coefficient of 0.64, outperforming existing methods trained on the SKEMPI database. Furthermore, it was demonstrated that SAAMBE-MEM performs much better when utilizing evolution-based features in contrast to physicochemical features. AVAILABILITY AND IMPLEMENTATION The method is accessible via a web server and standalone code at http://compbio.clemson.edu/SAAMBE-MEM/. The cleaned MPAD database is available at the website.
Collapse
Affiliation(s)
- Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| | - Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| | - Wang Xu
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
| | - Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
| |
Collapse
|
2
|
Nikam R, Jemimah S, Gromiha MM. DeepPPAPredMut: deep ensemble method for predicting the binding affinity change in protein-protein complexes upon mutation. Bioinformatics 2024; 40:btae309. [PMID: 38718170 PMCID: PMC11112046 DOI: 10.1093/bioinformatics/btae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 04/08/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024] Open
Abstract
MOTIVATION Protein-protein interactions underpin many cellular processes and their disruption due to mutations can lead to diseases. With the evolution of protein structure prediction methods like AlphaFold2 and the availability of extensive experimental affinity data, there is a pressing need for updated computational tools that can efficiently predict changes in binding affinity caused by mutations in protein-protein complexes. RESULTS We developed a deep ensemble model that leverages protein sequences, predicted structure-based features, and protein functional classes to accurately predict the change in binding affinity due to mutations. The model achieved a correlation of 0.97 and a mean absolute error (MAE) of 0.35 kcal/mol on the training dataset, and maintained robust performance on the test set with a correlation of 0.72 and a MAE of 0.83 kcal/mol. Further validation using Leave-One-Out Complex (LOOC) cross-validation exhibited a correlation of 0.83 and a MAE of 0.51 kcal/mol, indicating consistent performance. AVAILABILITY AND IMPLEMENTATION https://web.iitm.ac.in/bioinfo2/DeepPPAPredMut/index.html.
Collapse
Affiliation(s)
- Rahul Nikam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - Sherlyn Jemimah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Biomedical Engineering, Khalifa University, P.O. Box: 127788 , Abu Dhabi, United Arab Emirates
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Computer Science, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan
| |
Collapse
|
3
|
Ridha F, Gromiha MM. MPA-Pred: A machine learning approach for predicting the binding affinity of membrane protein-protein complexes. Proteins 2024; 92:499-508. [PMID: 37949651 DOI: 10.1002/prot.26633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 10/05/2023] [Accepted: 10/25/2023] [Indexed: 11/12/2023]
Abstract
Membrane protein-protein interactions are essential for several functions including cell signaling, ion transport, and enzymatic activity. These interactions are mainly dictated by their binding affinities. Although several methods are available for predicting the binding affinity of protein-protein complexes, there exists no specific method for membrane protein-protein complexes. In this work, we collected the experimental binding affinity data for a set of 114 membrane protein-protein complexes and derived several structure and sequence-based features. Our analysis on the relationship between binding affinity and the features revealed that the important factors mainly depend on the type of membrane protein and the functional class of the protein. Specifically, aromatic and charged residues at the interface, and aromatic-aromatic and electrostatic interactions are found to be important to understand the binding affinity. Further, we developed a method, MPA-Pred, for predicting the binding affinity of membrane protein-protein complexes using a machine learning approach. It showed an average correlation and mean absolute error of 0.83 and 0.91 kcal/mol, respectively, using the jack-knife test on a set of 114 complexes. We have also developed a web server and it is available at https://web.iitm.ac.in/bioinfo2/MPA-Pred/. This method can be used for predicting the affinity of membrane protein-protein complexes at a large scale and aid to improve drug design strategies.
Collapse
Affiliation(s)
- Fathima Ridha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Department of Computer Science, National University of Singapore, Singapore, Singapore
| |
Collapse
|
4
|
Sharma D, Rawat P, Greiff V, Janakiraman V, Gromiha MM. Predicting the immune escape of SARS-CoV-2 neutralizing antibodies upon mutation. Biochim Biophys Acta Mol Basis Dis 2024; 1870:166959. [PMID: 37967796 DOI: 10.1016/j.bbadis.2023.166959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/17/2023]
Abstract
COVID-19 has resulted in millions of deaths and severe impact on economies worldwide. Moreover, the emergence of SARS-CoV-2 variants presented significant challenges in controlling the pandemic, particularly their potential to avoid the immune system and evade vaccine immunity. This has led to a growing need for research to predict how mutations in SARS-CoV-2 reduces the ability of antibodies to neutralize the virus. In this study, we assembled a set of 1813 mutations from the interface of SARS-CoV-2 spike protein's receptor binding domain (RBD) and neutralizing antibody complexes and developed a machine learning model to classify high or low escape mutations using interaction energy, inter-residue contacts and predicted binding free energy change. Our approach achieved an Area under the Receiver Operating Characteristics (ROC) Curve (AUC) of 0.91 using the Random Forest classifier on the test dataset with 217 mutations. The model was further utilized to predict the escape mutations on a dataset of 29,165 mutations located at the interface of 83 RBD-neutralizing antibody complexes. A small subset of this dataset was also validated based on available experimental data. We found that top 10 % high escape mutations were dominated by charged to nonpolar mutations whereas low escape mutations were dominated by polar to nonpolar mutations. We believe that the present method will allow prioritization of high/low escape mutations in the context of neutralizing antibodies targeting SARS-CoV-2 RBD region and assist antibody design for current and emerging variants.
Collapse
Affiliation(s)
- Divya Sharma
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
| | - Puneet Rawat
- University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Victor Greiff
- University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Vani Janakiraman
- Infection Biology Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
| | - M Michael Gromiha
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan; Department of Computer Science, National University of Singapore, Singapore.
| |
Collapse
|
5
|
Krishnan SR, Roy A, Gromiha MM. Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning. Brief Bioinform 2024; 25:bbae002. [PMID: 38261341 PMCID: PMC10805179 DOI: 10.1093/bib/bbae002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Revised: 12/21/2023] [Accepted: 12/24/2023] [Indexed: 01/24/2024] Open
Abstract
Ribonucleic acids (RNAs) play important roles in cellular regulation. Consequently, dysregulation of both coding and non-coding RNAs has been implicated in several disease conditions in the human body. In this regard, a growing interest has been observed to probe into the potential of RNAs to act as drug targets in disease conditions. To accelerate this search for disease-associated novel RNA targets and their small molecular inhibitors, machine learning models for binding affinity prediction were developed specific to six RNA subtypes namely, aptamers, miRNAs, repeats, ribosomal RNAs, riboswitches and viral RNAs. We found that differences in RNA sequence composition, flexibility and polar nature of RNA-binding ligands are important for predicting the binding affinity. Our method showed an average Pearson correlation (r) of 0.83 and a mean absolute error of 0.66 upon evaluation using the jack-knife test, indicating their reliability despite the low amount of data available for several RNA subtypes. Further, the models were validated with external blind test datasets, which outperform other existing quantitative structure-activity relationship (QSAR) models. We have developed a web server to host the models, RNA-Small molecule binding Affinity Predictor, which is freely available at: https://web.iitm.ac.in/bioinfo2/RSAPred/.
Collapse
Affiliation(s)
- Sowmya R Krishnan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
| | - Arijit Roy
- TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
- Department of Computer Science, National University of Singapore, Singapore 117543
| |
Collapse
|
6
|
Jarończyk M. Software for Predicting Binding Free Energy of Protein-Protein Complexes and Their Mutants. Methods Mol Biol 2024; 2780:139-147. [PMID: 38987468 DOI: 10.1007/978-1-0716-3985-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/12/2024]
Abstract
Protein-protein binding affinity prediction is important for understanding complex biochemical pathways and to uncover protein interaction networks. Quantitative estimation of the binding affinity changes caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. The binding free energies of protein-protein complexes can be predicted using several computational tools. This chapter is a summary of software developed for the prediction of binding free energies for protein-protein complexes and their mutants.
Collapse
|
7
|
Rana MM, Nguyen DD. Geometric Graph Learning to Predict Changes in Binding Free Energy and Protein Thermodynamic Stability upon Mutation. J Phys Chem Lett 2023; 14:10870-10879. [PMID: 38032742 DOI: 10.1021/acs.jpclett.3c02679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2023]
Abstract
Accurate prediction of binding free energy changes upon mutations is vital for optimizing drugs, designing proteins, understanding genetic diseases, and cost-effective virtual screening. While machine learning methods show promise in this domain, achieving accuracy and generalization across diverse data sets remains a challenge. This study introduces Geometric Graph Learning for Protein-Protein Interactions (GGL-PPI), a novel approach integrating geometric graph representation and machine learning to forecast mutation-induced binding free energy changes. GGL-PPI leverages atom-level graph coloring and multiscale weighted colored geometric subgraphs to capture structural features of biomolecules, demonstrating superior performance on three standard data sets, namely, AB-Bind, SKEMPI 1.0, and SKEMPI 2.0 data sets. The model's efficacy extends to predicting protein thermodynamic stability in a blind test set, providing unbiased predictions for both direct and reverse mutations and showcasing notable generalization. GGL-PPI's precision in predicting changes in binding free energy and stability due to mutations enhances our comprehension of protein complexes, offering valuable insights for drug design endeavors.
Collapse
Affiliation(s)
- Md Masud Rana
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
| | - Duc Duy Nguyen
- Department of Mathematics, University of Kentucky, Lexington, Kentucky 40506, United States
| |
Collapse
|
8
|
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. FRONTIERS IN BIOINFORMATICS 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
| |
Collapse
|
9
|
Liu Z, Qian W, Cai W, Song W, Wang W, Maharjan DT, Cheng W, Chen J, Wang H, Xu D, Lin GN. Inferring the Effects of Protein Variants on Protein-Protein Interactions with Interpretable Transformer Representations. RESEARCH (WASHINGTON, D.C.) 2023; 6:0219. [PMID: 37701056 PMCID: PMC10494974 DOI: 10.34133/research.0219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 08/20/2023] [Indexed: 09/14/2023]
Abstract
Identifying pathogenetic variants and inferring their impact on protein-protein interactions sheds light on their functional consequences on diseases. Limited by the availability of experimental data on the consequences of protein interaction, most existing methods focus on building models to predict changes in protein binding affinity. Here, we introduced MIPPI, an end-to-end, interpretable transformer-based deep learning model that learns features directly from sequences by leveraging the interaction data from IMEx. MIPPI was specifically trained to determine the types of variant impact (increasing, decreasing, disrupting, and no effect) on protein-protein interactions. We demonstrate the accuracy of MIPPI and provide interpretation through the analysis of learned attention weights, which exhibit correlations with the amino acids interacting with the variant. Moreover, we showed the practicality of MIPPI in prioritizing de novo mutations associated with complex neurodevelopmental disorders and the potential to determine the pathogenic and driving mutations. Finally, we experimentally validated the functional impact of several variants identified in patients with such disorders. Overall, MIPPI emerges as a versatile, robust, and interpretable model, capable of effectively predicting mutation impacts on protein-protein interactions and facilitating the discovery of clinically actionable variants.
Collapse
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Wei Qian
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Wenxiang Cai
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Weichen Song
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
| | - Dhruba Tara Maharjan
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Wenhong Cheng
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Jue Chen
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China
| |
Collapse
|
10
|
Pandey P, Panday SK, Rimal P, Ancona N, Alexov E. Predicting the Effect of Single Mutations on Protein Stability and Binding with Respect to Types of Mutations. Int J Mol Sci 2023; 24:12073. [PMID: 37569449 PMCID: PMC10418460 DOI: 10.3390/ijms241512073] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023] Open
Abstract
The development of methods and algorithms to predict the effect of mutations on protein stability, protein-protein interaction, and protein-DNA/RNA binding is necessitated by the needs of protein engineering and for understanding the molecular mechanism of disease-causing variants. The vast majority of the leading methods require a database of experimentally measured folding and binding free energy changes for training. These databases are collections of experimental data taken from scientific investigations typically aimed at probing the role of particular residues on the above-mentioned thermodynamic characteristics, i.e., the mutations are not introduced at random and do not necessarily represent mutations originating from single nucleotide variants (SNV). Thus, the reported performance of the leading algorithms assessed on these databases or other limited cases may not be applicable for predicting the effect of SNVs seen in the human population. Indeed, we demonstrate that the SNVs and non-SNVs are not equally presented in the corresponding databases, and the distribution of the free energy changes is not the same. It is shown that the Pearson correlation coefficients (PCCs) of folding and binding free energy changes obtained in cases involving SNVs are smaller than for non-SNVs, indicating that caution should be used in applying them to reveal the effect of human SNVs. Furthermore, it is demonstrated that some methods are sensitive to the chemical nature of the mutations, resulting in PCCs that differ by a factor of four across chemically different mutations. All methods are found to underestimate the energy changes by roughly a factor of 2.
Collapse
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| | - Nicolas Ancona
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA;
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA; (P.P.); (S.K.P.); (P.R.)
| |
Collapse
|
11
|
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024] Open
Abstract
Therapeutic protein, represented by antibodies, is of increasing interest in human medicine. However, clinical translation of therapeutic protein is still largely hindered by different aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of the developability with widely used methods, like display technologies and library screening approaches, is a time and cost-intensive endeavor, and the efficiency in finding suitable solutions is still not enough to meet clinical needs. In recent years, the accelerated advancement of computational methodologies has ushered in a transformative era in the field of therapeutic protein design. Owing to their remarkable capabilities in feature extraction and modeling, the integration of cutting-edge computational strategies with conventional techniques presents a promising avenue to accelerate the progression of therapeutic protein design and optimization toward clinical implementation. Here, we compared the differences between therapeutic protein and small molecules in developability and provided an overview of the computational approaches applicable to the design or optimization of therapeutic protein in several developability issues.
Collapse
Affiliation(s)
- Zhidong Chen
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
| | - Xinpei Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
| | - Xu Chen
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
| | - Juyang Huang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
| | - Chenglin Wang
- Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
| | - Junqing Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
| | - Zhe Wang
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
| |
Collapse
|
12
|
Liu X, Feng H, Wu J, Xia K. Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation. J Chem Inf Model 2022; 62:3961-3969. [PMID: 36040839 DOI: 10.1021/acs.jcim.2c00580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Protein-protein interactions (PPIs) are involved in almost all biological processes in the cell. Understanding protein-protein interactions holds the key for the understanding of biological functions, diseases and the development of therapeutics. Recently, artificial intelligence (AI) models have demonstrated great power in PPIs. However, a key issue for all AI-based PPI models is efficient molecular representations and featurization. Here, we propose Hom-complex-based PPI representation, and Hom-complex-based machine learning models for the prediction of PPI binding affinity changes upon mutation, for the first time. In our model, various Hom complexes Hom(G1, G) can be generated for the graph representation G of protein-protein complex by using different graphs G1, which reveal G1-related inner connections within the graph representation G of protein-protein complex. Further, for a specific graph G1, a series of nested Hom complexes are generated to give a multiscale characterization of the PPIs. Its persistent homology and persistent Euler characteristic are used as molecular descriptors and further combined with the machine learning model, in particular, gradient boosting tree (GBT). We systematically test our model on the two most-commonly used data sets, that is, SKEMPI and AB-Bind. It has been found that our model outperforms all the existing models as far as we know, which demonstrates the great potential of our model for the analysis of PPIs. Our model can be used for the analysis and design of efficient antibodies for SARS-CoV-2.
Collapse
Affiliation(s)
- Xiang Liu
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China, 300071.,Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| | - Huitao Feng
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371.,Mathematical Science Research Center, Chongqing University of Technology, Chongqing, China, 400054
| | - Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing, China,101408
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences Nanyang Technological University, Singapore 637371
| |
Collapse
|
13
|
Sharma D, Rawat P, Janakiraman V, Gromiha MM. Elucidating important structural features for the binding affinity of spike - SARS-CoV-2 neutralizing antibody complexes. Proteins 2022; 90:824-834. [PMID: 34761442 PMCID: PMC8661754 DOI: 10.1002/prot.26277] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2021] [Revised: 11/04/2021] [Accepted: 11/07/2021] [Indexed: 12/23/2022]
Abstract
The coronavirus disease 2019 (COVID-19) has affected the lives of millions of people around the world. In an effort to develop therapeutic interventions and control the pandemic, scientists have isolated several neutralizing antibodies against SARS-CoV-2 from the vaccinated and convalescent individuals. These antibodies can be explored further to understand SARS-CoV-2 specific antigen-antibody interactions and biophysical parameters related to binding affinity, which can be utilized to engineer more potent antibodies for current and emerging SARS-CoV-2 variants. In the present study, we have analyzed the interface between spike protein of SARS-CoV-2 and neutralizing antibodies in terms of amino acid residue propensity, pair preference, and atomic interaction energy. We observed that Tyr residues containing contacts are highly preferred and energetically favorable at the interface of spike protein-antibody complexes. We have also developed a regression model to relate the experimental binding affinity for antibodies using structural features, which showed a correlation of 0.93. Moreover, several mutations at the spike protein-antibody interface were identified, which may lead to immune escape (epitope residues) and improved affinity (paratope residues) in current/emerging variants. Overall, the work provides insights into spike protein-antibody interactions, structural parameters related to binding affinity and mutational effects on binding affinity change, which can be helpful to develop better therapeutics against COVID-19.
Collapse
Affiliation(s)
- Divya Sharma
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of BiosciencesIndian Institute of Technology MadrasChennaiIndia
| | - Puneet Rawat
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of BiosciencesIndian Institute of Technology MadrasChennaiIndia
| | - Vani Janakiraman
- Infection Biology Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of BiosciencesIndian Institute of Technology MadrasChennaiIndia
| | - M. Michael Gromiha
- Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of BiosciencesIndian Institute of Technology MadrasChennaiIndia
| |
Collapse
|
14
|
Li G, Pahari S, Murthy AK, Liang S, Fragoza R, Yu H, Alexov E. SAAMBE-SEQ: a sequence-based method for predicting mutation effect on protein-protein binding affinity. Bioinformatics 2021; 37:992-999. [PMID: 32866236 PMCID: PMC8128451 DOI: 10.1093/bioinformatics/btaa761] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2020] [Revised: 08/17/2020] [Accepted: 08/24/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Vast majority of human genetic disorders are associated with mutations that affect protein-protein interactions by altering wild-type binding affinity. Therefore, it is extremely important to assess the effect of mutations on protein-protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches use structural information to deliver the predictions, which precludes them to be applicable on genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers are frequently dealing with assessing effect of mutations for which there is no structure available. RESULTS Here, we report a Gradient Boosting Decision Tree machine learning algorithm, the SAAMBE-SEQ, which is completely sequence-based and does not require structural information at all. SAAMBE-SEQ utilizes 80 features representing evolutionary information, sequence-based features and change of physical properties upon mutation at the mutation site. The approach is shown to achieve Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross validation in a benchmarking test against experimentally determined binding free energy change (ΔΔG). Further, a blind test (no-STRUC) is compiled collecting experimental ΔΔG upon mutation for protein complexes for which structure is not available and used to benchmark SAAMBE-SEQ resulting in PCC in the range of 0.37-0.46. The accuracy of SAAMBE-SEQ method is found to be either better or comparable to most advanced structure-based methods. SAAMBE-SEQ is very fast, available as webserver and stand-alone code, and indeed utilizes only sequence information, and thus it is applicable for genome-scale investigations to study the effect of mutations on protein-protein interactions. AVAILABILITY AND IMPLEMENTATION SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Gen Li
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Swagata Pahari
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | | | - Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Robert Fragoza
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| |
Collapse
|
15
|
Gonzalez TR, Martin KP, Barnes JE, Patel JS, Ytreberg FM. Assessment of software methods for estimating protein-protein relative binding affinities. PLoS One 2020; 15:e0240573. [PMID: 33347442 PMCID: PMC7751979 DOI: 10.1371/journal.pone.0240573] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
A growing number of computational tools have been developed to accurately and rapidly predict the impact of amino acid mutations on protein-protein relative binding affinities. Such tools have many applications, for example, designing new drugs and studying evolutionary mechanisms. In the search for accuracy, many of these methods employ expensive yet rigorous molecular dynamics simulations. By contrast, non-rigorous methods use less exhaustive statistical mechanics, allowing for more efficient calculations. However, it is unclear if such methods retain enough accuracy to replace rigorous methods in binding affinity calculations. This trade-off between accuracy and computational expense makes it difficult to determine the best method for a particular system or study. Here, eight non-rigorous computational methods were assessed using eight antibody-antigen and eight non-antibody-antigen complexes for their ability to accurately predict relative binding affinities (ΔΔG) for 654 single mutations. In addition to assessing accuracy, we analyzed the CPU cost and performance for each method using a variety of physico-chemical structural features. This allowed us to posit scenarios in which each method may be best utilized. Most methods performed worse when applied to antibody-antigen complexes compared to non-antibody-antigen complexes. Rosetta-based JayZ and EasyE methods classified mutations as destabilizing (ΔΔG < -0.5 kcal/mol) with high (83-98%) accuracy and a relatively low computational cost for non-antibody-antigen complexes. Some of the most accurate results for antibody-antigen systems came from combining molecular dynamics with FoldX with a correlation coefficient (r) of 0.46, but this was also the most computationally expensive method. Overall, our results suggest these methods can be used to quickly and accurately predict stabilizing versus destabilizing mutations but are less accurate at predicting actual binding affinities. This study highlights the need for continued development of reliable, accessible, and reproducible methods for predicting binding affinities in antibody-antigen proteins and provides a recipe for using current methods.
Collapse
Affiliation(s)
- Tawny R. Gonzalez
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho, United States of America
| | - Kyle P. Martin
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho, United States of America
- Department of Physics, University of Idaho, Moscow, Idaho, United States of America
| | - Jonathan E. Barnes
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho, United States of America
- Department of Physics, University of Idaho, Moscow, Idaho, United States of America
| | - Jagdish Suresh Patel
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho, United States of America
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
| | - F. Marty Ytreberg
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, Idaho, United States of America
- Department of Physics, University of Idaho, Moscow, Idaho, United States of America
| |
Collapse
|
16
|
Insights into changes in binding affinity caused by disease mutations in protein-protein complexes. Comput Biol Med 2020; 123:103829. [DOI: 10.1016/j.compbiomed.2020.103829] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Revised: 05/20/2020] [Accepted: 05/20/2020] [Indexed: 01/11/2023]
|
17
|
Zhang N, Chen Y, Lu H, Zhao F, Alvarez RV, Goncearenco A, Panchenko AR, Li M. MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions. iScience 2020; 23:100939. [PMID: 32169820 PMCID: PMC7068639 DOI: 10.1016/j.isci.2020.100939] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2019] [Revised: 11/21/2019] [Accepted: 02/20/2020] [Indexed: 01/17/2023] Open
Abstract
Missense mutations may affect proteostasis by destabilizing or over-stabilizing protein complexes and changing the pathway flux. Predicting the effects of stabilizing mutations on protein-protein interactions is notoriously difficult because existing experimental sets are skewed toward mutations reducing protein-protein binding affinity and many computational methods fail to correctly evaluate their effects. To address this issue, we developed a method MutaBind2, which estimates the impacts of single as well as multiple mutations on protein-protein interactions. MutaBind2 employs only seven features, and the most important of them describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complex and each monomer. This approach shows a distinct improvement especially in evaluating the effects of mutations increasing binding affinity. MutaBind2 can be used for finding disease driver mutations, designing stable protein complexes, and discovering new protein-protein interaction inhibitors.
Collapse
Affiliation(s)
- Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Feiyang Zhao
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Roberto Vera Alvarez
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Alexander Goncearenco
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China.
| |
Collapse
|