1
Nandigrami P, Fiser A. Assessing the functional impact of protein binding site definition. Protein Sci 2024; 33:e5026. [PMID: 38757384 PMCID: PMC11099757 DOI: 10.1002/pro.5026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 05/01/2024] [Accepted: 05/03/2024] [Indexed: 05/18/2024]
Abstract
Many biomedical applications, such as classification of binding specificities or bioengineering, depend on the accurate definition of protein binding interfaces. Depending on the choice of method used, substantially different sets of residues can be classified as belonging to the interface of a protein. A typical approach used to verify these definitions is to mutate residues and measure the impact of these changes on binding. Besides the lack of exhaustive data, this approach also suffers from the fundamental problem that a mutation introduces an unknown amount of alteration into an interface, which potentially alters the binding characteristics of the interface. In this study we explore the impact of alternative binding site definitions on the ability of a protein to recognize its cognate ligand using a pharmacophore approach, which does not affect the interface. The study also shows that methods for protein binding interface predictions should perform above approximately F-score = 0.7 accuracy level to capture the biological function of a protein.
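The F-score threshold quoted in this abstract can be made concrete with a small sketch. It assumes that two interface definitions are compared as sets of residue numbers and that the F-score is the usual harmonic mean of precision and recall; the function name and the residue numbers are hypothetical and are not taken from the paper.

```python
def interface_f_score(predicted, reference):
    """F-score (harmonic mean of precision and recall) between two residue sets."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0  # no shared residues: precision and recall are both zero
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical residue numbers, for illustration only.
reference_site = {12, 15, 16, 19, 45, 48, 52, 77, 80, 81}
alternative_site = {12, 15, 16, 19, 45, 48, 52, 90, 91, 93}

score = interface_f_score(alternative_site, reference_site)
print(f"F-score = {score:.2f}")
```

Here seven of ten residues are shared in each direction, so precision and recall are both 0.7 and the alternative definition sits right at the approximate threshold mentioned above.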
Affiliation(s)
- Prithviraj Nandigrami
- Departments of Systems and Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
- Andras Fiser
- Departments of Systems and Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
2
Yuan Y, Chen Q, Mao J, Li G, Pan X. DG-Affinity: predicting antigen-antibody affinity with language models from sequences. BMC Bioinformatics 2023; 24:430. [PMID: 37957563 PMCID: PMC10644518 DOI: 10.1186/s12859-023-05562-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 11/06/2023] [Indexed: 11/15/2023]
Abstract
BACKGROUND Antibody-mediated immune responses play a crucial role in the immune defense of the human body. Advances in bioengineering have driven progress in antibody-derived drugs, which show promising efficacy in cancer and autoimmune disease therapy. A critical step of this development process is obtaining the affinity between antibodies and their binding antigens. RESULTS In this study, we introduce a novel sequence-based antigen-antibody affinity prediction method, named DG-Affinity. DG-Affinity uses deep neural networks to efficiently and accurately predict the affinity between antibodies and antigens from sequences, without the need for structural information. The sequences of both the antigen and the antibody are first transformed into embedding vectors by two pre-trained language models, then these embeddings are concatenated and passed to a ConvNeXt framework with a regression task. The results demonstrate the superiority of DG-Affinity over the existing structure-based prediction methods and the sequence-based tools, achieving a Pearson's correlation of over 0.65 on an independent test dataset. CONCLUSIONS Compared to the baseline methods, DG-Affinity achieves the best performance and can advance the development of antibody design. It is freely available as an easy-to-use web server at https://www.digitalgeneai.tech/solution/affinity.
Affiliation(s)
- Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
- Jun Mao
- DigitalGene, Ltd, Shanghai, 200240, China
- Guipeng Li
- DigitalGene, Ltd, Shanghai, 200240, China
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China.
3
Vallina Estrada E, Zhang N, Wennerström H, Danielsson J, Oliveberg M. Diffusive intracellular interactions: On the role of protein net charge and functional adaptation. Curr Opin Struct Biol 2023; 81:102625. [PMID: 37331204 DOI: 10.1016/j.sbi.2023.102625] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/01/2023] [Revised: 05/16/2023] [Accepted: 05/16/2023] [Indexed: 06/20/2023]
Abstract
A striking feature of nucleic acids and lipid membranes is that they all carry net negative charge, and the same is true for the majority of intracellular proteins. It is suggested that the role of this negative charge is to assure a basal intermolecular repulsion that keeps the cytosolic content suitably 'fluid' for function. We focus in this review on the experimental, theoretical and genetic findings which serve to underpin this idea and the new questions they raise. Unlike the situation in test tubes, any functional protein-protein interaction in the cytosol is subject to competition from the densely crowded background, i.e. surrounding stickiness. At the nonspecific limit of this stickiness is the 'random' protein-protein association, maintaining profuse populations of transient and constantly interconverting complexes at physiological protein concentrations. The phenomenon is readily quantified in studies of protein rotational diffusion, showing that the more net negatively charged a protein is, the less it is retarded by clustering. It is further evident that this dynamic protein-protein interplay is under evolutionary control and finely tuned across organisms to maintain optimal physicochemical conditions for the cellular processes. The emerging picture is then that specific cellular function relies on close competition between numerous weak and strong interactions in which all parts of the protein surfaces are involved. The outstanding challenge is now to decipher the very basics of this many-body system: how the detailed patterns of charged, polar and hydrophobic side chains control not only protein-protein interactions at close and long range but also the collective properties of the cellular interior as a whole.
Affiliation(s)
- Eloy Vallina Estrada
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Nannan Zhang
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Håkan Wennerström
- Division of Physical Chemistry, Department of Chemistry, Lund University, Box 124, 22100 Lund, Sweden
- Jens Danielsson
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Mikael Oliveberg
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden.
4
Barradas-Bautista D, Almajed A, Oliva R, Kalnis P, Cavallo L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinformatics Advances 2023; 3:vbad012. [PMID: 36789292 PMCID: PMC9923443 DOI: 10.1093/bioadv/vbad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Revised: 01/20/2023] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
Motivation Protein-protein interactions drive many relevant biological events, such as infection, replication and recognition. To control or engineer such events, we need to access the molecular details of the interaction provided by experimental 3D structures. However, such experiments take time and are expensive; moreover, the current technology cannot keep up with the high discovery rate of new interactions. Computational modeling, like protein-protein docking, can help to fill this gap by generating docking poses. Protein-protein docking generally consists of two parts, sampling and scoring. The sampling is an exhaustive search of the three-dimensional space. The caveat of the sampling is that it generates a large number of incorrect poses, producing a highly unbalanced dataset. This limits the utility of the data to train machine learning classifiers. Results Using weak supervision, we developed a data augmentation method that we named hAIkal. Using hAIkal, we increased the labeled training data to train several algorithms. We trained and obtained different classifiers; the best classifier has 81% accuracy and 0.51 Matthews' correlation coefficient on the test set, surpassing the state-of-the-art scoring functions. Availability and implementation Docking models from Benchmark 5 are available at https://doi.org/10.5281/zenodo.4012018. Processed tabular data are available at https://repository.kaust.edu.sa/handle/10754/666961. A Google Colab notebook is available at https://colab.research.google.com/drive/1vbVrJcQSf6_C3jOAmZzgQbTpuJ5zC1RP?usp=sharing. Supplementary information Supplementary data are available at Bioinformatics Advances online.
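Why this abstract reports a Matthews' correlation coefficient next to accuracy can be seen from a small sketch on an unbalanced set of docking poses. The confusion-matrix counts below are hypothetical and chosen only to illustrate the point; they are not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of poses that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical unbalanced set: 10 correct poses among 1000, and a trivial
# classifier that labels every pose as incorrect.
trivial = dict(tp=0, tn=990, fp=0, fn=10)

print(f"accuracy = {accuracy(**trivial):.2f}")
print(f"MCC      = {matthews_cc(**trivial):.2f}")
```

The trivial classifier reaches 99% accuracy without finding a single correct pose, while its MCC is zero, which is why a balanced measure is informative for this kind of data.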
Affiliation(s)
- Ali Almajed
- Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Romina Oliva
- Department of Sciences and Technologies, University of Naples “Parthenope”, I-80143 Naples, Italy
- Panos Kalnis
- Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Luigi Cavallo
- Physical Sciences and Engineering Division, Kaust Catalysis Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
5
Jung Y, Geng C, Bonvin AMJJ, Xue LC, Honavar VG. MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations. Biomolecules 2023; 13:121. [PMID: 36671507 PMCID: PMC9855734 DOI: 10.3390/biom13010121] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/01/2022] [Revised: 12/22/2022] [Accepted: 12/26/2022] [Indexed: 01/11/2023]
Abstract
Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
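The averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the two scores are brought onto a common scale, so the min-max rescaling, the assumption that a lower traditional score is better, and all names and numbers below are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; an assumed normalization, not from the paper."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # all scores equal: no ranking information
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(rf_probs, sf_scores, lower_is_better=True):
    """Average a classifier probability with a rescaled traditional score."""
    scaled = min_max(sf_scores)
    if lower_is_better:
        # Flip energy-like scores so that higher always means more native-like.
        scaled = [1.0 - s for s in scaled]
    return [(p + s) / 2 for p, s in zip(rf_probs, scaled)]


# Hypothetical values for three docked conformations of one complex.
rf_probs = [0.9, 0.2, 0.6]           # RF probability of being near-native
sf_scores = [-120.0, -80.0, -100.0]  # energy-like traditional score

combined = combined_scores(rf_probs, sf_scores)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)
```

Rescaling within the set of conformations for one complex keeps the two terms comparable before averaging; other normalizations, such as z-scores or ranks, would serve the same purpose.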
Affiliation(s)
- Yong Jung
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Cunliang Geng
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Li C. Xue
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Center for Molecular and Biomolecular Informatics, Radboudumc, Greet Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
- Vasant G. Honavar
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA 16802, USA
- College of Information Sciences & Technology, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA 16823, USA
6
Mohseni Behbahani Y, Crouzet S, Laine E, Carbone A. Deep Local Analysis evaluates protein docking conformations with locally oriented cubes. Bioinformatics 2022; 38:4505-4512. [PMID: 35962985 PMCID: PMC9525006 DOI: 10.1093/bioinformatics/btac551] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 07/04/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION With the recent advances in protein 3D structure prediction, protein interactions are becoming more central than ever before. Here, we address the problem of determining how proteins interact with one another. More specifically, we investigate the possibility of discriminating near-native protein complex conformations from incorrect ones by exploiting local environments around interfacial residues. RESULTS Deep Local Analysis (DLA)-Ranker is a deep learning framework applying 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of the interfacial residues along with their neighboring atoms and the regions of the interface with different solvent accessibility. We assessed its performance on three docking benchmarks made of half a million acceptable and incorrect conformations. We show that DLA-Ranker successfully identifies near-native conformations from ensembles generated by molecular docking. It surpasses or competes with other deep learning-based scoring functions. We also showcase its usefulness to discover alternative interfaces. AVAILABILITY AND IMPLEMENTATION http://gitlab.lcqb.upmc.fr/dla-ranker/DLA-Ranker.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yasser Mohseni Behbahani
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris 75005, France
- Simon Crouzet
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris 75005, France
7
Villegas JA, Levy ED. A unified statistical potential reveals that amino acid stickiness governs nonspecific recruitment of client proteins into condensates. Protein Sci 2022; 31:e4361. [PMID: 35762716 PMCID: PMC9207749 DOI: 10.1002/pro.4361] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/23/2022] [Revised: 05/06/2022] [Accepted: 05/10/2022] [Indexed: 11/07/2022]
Abstract
Membraneless organelles are cellular compartments that form by liquid-liquid phase separation of one or more components. Other molecules, such as proteins and nucleic acids, will distribute between the cytoplasm and the liquid compartment in accordance with the thermodynamic drive to lower the free energy of the system. The resulting distribution colocalizes molecular species to carry out a diversity of functions. Two factors could drive this partitioning: the difference in solvation between the dilute versus dense phase and intermolecular interactions between the client and scaffold proteins. Here, we develop a set of knowledge-based potentials that allow for the direct comparison between stickiness, which is dominated by desolvation energy, and pairwise residue contact propensity terms. We use these scales to examine experimental data from two systems: protein cargo dissolving within phase-separated droplets made from FG repeat proteins of the nuclear pore complex and client proteins dissolving within phase-separated FUS droplets. These analyses reveal a close agreement between the stickiness of the client proteins and the experimentally determined values of the partition coefficients (R > 0.9), while pairwise residue contact propensities between client and scaffold show weaker correlations. Hence, the stickiness of client proteins is sufficient to explain their differential partitioning within these two phase-separated systems without taking into account the composition of the condensate. This result implies that selective trafficking of client proteins to distinct membraneless organelles requires recognition elements beyond the client sequence composition. STATEMENT: Empirical potentials for amino acid stickiness and pairwise residue contact propensities are derived. These scales are unique in that they enable direct comparison of desolvation versus contact terms. We find that partitioning of a client protein to a condensate is best explained by amino acid stickiness.
Affiliation(s)
- José A. Villegas
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Present address: Department of Pharmaceutical Sciences, College of Pharmacy, University of Illinois Chicago, Chicago, IL 60612
- Emmanuel D. Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
8
Rajendran M, Ferran MC, Babbitt GA. Identifying vaccine escape sites via statistical comparisons of short-term molecular dynamics. Biophysical Reports 2022; 2:100056. [PMID: 35403093 PMCID: PMC8978532 DOI: 10.1016/j.bpr.2022.100056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/15/2021] [Accepted: 03/31/2022] [Indexed: 01/08/2023]
Abstract
The identification of viral mutations that confer escape from antibodies is crucial for understanding the interplay between immunity and viral evolution. We describe a molecular dynamics (MD)-based approach that goes beyond contact mapping, scales well to a desktop computer with a modern graphics processor, and enables the user to identify functional protein sites that are prone to vaccine escape in a viral antigen. We first implement our MD pipeline to employ site-wise calculation of Kullback-Leibler divergence in atom fluctuation over replicate sets of short-term MD production runs thus enabling a statistical comparison of the rapid motion of influenza hemagglutinin (HA) in both the presence and absence of three well-known neutralizing antibodies. Using this simple comparative method applied to motions of viral proteins, we successfully identified in silico all previously empirically confirmed sites of escape in influenza HA, predetermined via selection experiments and neutralization assays. Upon the validation of our computational approach, we then surveyed potential hotspot residues in the receptor binding domain of the SARS-CoV-2 virus in the presence of COVOX-222 and S2H97 antibodies. We identified many single sites in the antigen-antibody interface that are similarly prone to potential antibody escape and that match many of the known sites of mutations arising in the SARS-CoV-2 variants of concern. In the Omicron variant, we find only minimal adaptive evolutionary shifts in the functional binding profiles of both antibodies. In summary, we provide an inexpensive and accurate computational method to monitor hotspots of functional evolution in antibody binding footprints.
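The site-wise Kullback-Leibler comparison mentioned in this abstract can be illustrated with a short sketch. It assumes that, for one site, atom fluctuation values from replicate runs with and without the antibody are binned onto a shared grid and compared as discrete distributions. The bin count, the pseudocount, the direction of the divergence, and the numbers are assumptions made for illustration, not details of the authors' pipeline.

```python
import math


def histogram(values, lo, hi, n_bins, pseudocount=1.0):
    """Normalized histogram on a shared grid; the pseudocount avoids empty bins."""
    counts = [pseudocount] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the top edge
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def site_kl(bound, unbound, n_bins=5):
    """Divergence between bound and unbound fluctuation samples at one site."""
    lo, hi = min(bound + unbound), max(bound + unbound)
    if hi == lo:
        return 0.0  # identical constant samples carry no difference
    p = histogram(bound, lo, hi, n_bins)
    q = histogram(unbound, lo, hi, n_bins)
    return kl_divergence(p, q)


# Hypothetical fluctuation values (arbitrary units) from four replicate runs each.
damped_site = site_kl([0.40, 0.50, 0.45, 0.50], [1.0, 1.2, 1.1, 0.9])
unchanged_site = site_kl([1.0, 1.2, 1.1, 0.9], [1.0, 1.2, 1.1, 0.9])
print(damped_site > unchanged_site)
```

A site whose motion is damped by antibody binding yields a clearly positive divergence, while a site with identical fluctuation samples yields zero.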
9
Engineered protein-small molecule conjugates empower selective enzyme inhibition. Cell Chem Biol 2022; 29:328-338.e4. [PMID: 34363759 PMCID: PMC8807807 DOI: 10.1016/j.chembiol.2021.07.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/04/2021] [Revised: 06/17/2021] [Accepted: 07/14/2021] [Indexed: 11/20/2022]
Abstract
Potent, specific ligands drive precision medicine and fundamental biology. Proteins, peptides, and small molecules constitute effective ligand classes. Yet greater molecular diversity would aid the pursuit of ligands to elicit precise biological activity against challenging targets. We demonstrate a platform to discover protein-small molecule (PriSM) hybrids to combine unique pharmacophore activities and shapes with constrained, efficiently engineerable proteins. In this platform, a fibronectin protein library is displayed on yeast with a single cysteine coupled to acetazolamide via a maleimide-poly(ethylene glycol) linker. Magnetic and flow cytometric sorts enrich specific binders to carbonic anhydrase isoforms. Isolated PriSMs exhibit potent, specific inhibition of carbonic anhydrase isoforms with efficacy superior to that of acetazolamide or protein alone, including an 80-fold specificity increase and 9-fold potency gain. PriSMs are engineered with multiple linker lengths, protein conjugation sites, and sequences against two different isoforms, which reveal platform flexibility and impacts of molecular designs. PriSMs advance the molecular diversity of efficiently engineerable ligands.
10
From complete cross-docking to partners identification and binding sites predictions. PLoS Comput Biol 2022; 18:e1009825. [PMID: 35089918 PMCID: PMC8827487 DOI: 10.1371/journal.pcbi.1009825] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/29/2021] [Revised: 02/09/2022] [Accepted: 01/11/2022] [Indexed: 11/19/2022]
Abstract
Proteins ensure their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental for our understanding of the cellular machinery, and for improving medicine and bioengineering. Over the past years, a large body of experimental data has been accumulated on who interacts with whom and in what manner. However, these data are highly heterogeneous and sometimes contradictory, noisy, and biased. Ab initio methods provide a means to a "blind" protein-protein interaction network reconstruction. Here, we report on a molecular cross-docking-based approach for the identification of protein partners. The docking algorithm uses a coarse-grained representation of the protein structures and treats them as rigid bodies. We applied the approach to a few hundred proteins, in their unbound conformations, and we systematically investigated the influence of several key ingredients, such as the size and quality of the interfaces, and the scoring function. We achieved some significant improvement compared to previous works, and a very high discriminative power on some specific functional classes. We provide a readout of the contributions of shape and physico-chemical complementarity, interface matching, and specificity, in the predictions. In addition, we assessed the ability of the approach to account for protein surface multiple usages, and we compared it with a sequence-based deep learning method. This work may contribute to guiding the exploitation of the large amounts of protein structural models now available toward the discovery of unexpected partners and their complex structure characterisation.
11
Barradas-Bautista D, Cao Z, Vangone A, Oliva R, Cavallo L. A random forest classifier for protein-protein docking models. Bioinformatics Advances 2021; 2:vbab042. [PMID: 36699405 PMCID: PMC9710594 DOI: 10.1093/bioadv/vbab042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Revised: 11/11/2021] [Accepted: 12/06/2021] [Indexed: 01/28/2023]
Abstract
Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this aim, we generated 3 × 10^4 docking models for each of the 230 complexes in the protein-protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of ≈ 7 × 10^6 docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Using a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Testing of CoDES on independent datasets, as well as results of its comparative performance with machine learning methods recently developed in the field for the scoring of docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys both in terms of global parameters and in terms of decoys ranked at the top positions. Supplementary information Supplementary data are available at Bioinformatics Advances online. Software and data availability statement The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared on request to the corresponding authors.
Affiliation(s)
- Didier Barradas-Bautista
- Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia. To whom correspondence should be addressed.
- Zhen Cao
- Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia
- Anna Vangone
- Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Munich Large Molecule Research, 82377 Penzberg, Germany
- Romina Oliva
- Department of Sciences and Technologies, University Parthenope of Naples, Centro Direzionale Isola C4, I-80143 Naples, Italy. To whom correspondence should be addressed.
- Luigi Cavallo
- Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia. To whom correspondence should be addressed.
12
Ghadie M, Xia Y. Mutation Edgotype Drives Fitness Effect in Human. Frontiers in Bioinformatics 2021; 1:690769. [PMID: 36303776 PMCID: PMC9581054 DOI: 10.3389/fbinf.2021.690769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/04/2021] [Accepted: 08/18/2021] [Indexed: 11/24/2022]
Abstract
Missense mutations are known to perturb protein-protein interaction networks (known as interactome networks) in different ways. However, it remains unknown how different interactome perturbation patterns (“edgotypes”) impact organismal fitness. Here, we estimate the fitness effect of missense mutations with different interactome perturbation patterns in human, by calculating the fractions of neutral and deleterious mutations that do not disrupt PPIs (“quasi-wild-type”), or disrupt PPIs either by disrupting the binding interface (“edgetic”) or by disrupting overall protein stability (“quasi-null”). We first map pathogenic mutations and common non-pathogenic mutations onto homology-based three-dimensional structural models of proteins and protein-protein interactions in human. Next, we perform structure-based calculations to classify each mutation as either quasi-wild-type, edgetic, or quasi-null. Using our predicted as well as experimentally determined interactome perturbation patterns, we estimate that >∼40% of quasi-wild-type mutations are effectively neutral and the remaining are mostly mildly deleterious, that >∼75% of edgetic mutations are only mildly deleterious, and that up to ∼75% of quasi-null mutations may be strongly detrimental. These estimates are the first such estimates of fitness effect for different network perturbation patterns in any interactome. Our results suggest that while mutations that do not disrupt the interactome tend to be effectively neutral, the majority of human PPIs are under strong purifying selection and the stability of most human proteins is essential to human life.
13
Das S, Chakrabarti S. Classification and prediction of protein-protein interaction interface using machine learning algorithm. Sci Rep 2021; 11:1761. [PMID: 33469042 PMCID: PMC7815773 DOI: 10.1038/s41598-020-80900-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/04/2020] [Accepted: 12/15/2020] [Indexed: 01/29/2023]
Abstract
Structural insight into the protein-protein interaction (PPI) interface can provide knowledge about the kinetics, thermodynamics and molecular functions of the complex while elucidating its role in diseases and further enabling it as a potential therapeutic target. However, owing to experimental lag in solving protein-protein complex structures, three-dimensional (3D) knowledge of the PPI interfaces can be gained via computational approaches like molecular docking and post-docking analyses. Despite development of numerous docking tools and techniques, success in identification of native-like interfaces based on docking score functions is limited. Hence, we carried out an in-depth investigation of the structural features of the interface that might successfully delineate native complexes from non-native ones. We identify interface properties which show statistically significant differences between native and non-native interfaces belonging to homo- and hetero-protein-protein complexes. Utilizing these properties, a support vector machine (SVM) based classification scheme has been implemented to differentiate native-like and non-native-like complexes generated using docking decoys. Benchmarking and comparative analyses suggest very good performance of our SVM classifiers. Further, protein interactions which are proven via experimental findings but not resolved structurally were subjected to this approach, where 3D models of the complexes were generated and the most likely interfaces were predicted. A web server called Protein Complex Prediction by Interface Properties (PCPIP) is developed to predict whether the interface of a given protein-protein dimer complex resembles known protein interfaces. The server is freely available at http://www.hpppi.iicb.res.in/pcpip/.
Affiliation(s)
- Subhrangshu Das
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, WB, India
|
14
|
Dhawanjewar AS, Roy AA, Madhusudhan MS. A knowledge-based scoring function to assess quaternary associations of proteins. Bioinformatics 2020; 36:3739-3748. [PMID: 32246820 DOI: 10.1093/bioinformatics/btaa207] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/04/2019] [Revised: 03/01/2020] [Accepted: 03/30/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation:
The elucidation of all inter-protein interactions would significantly enhance our knowledge of cellular processes at a molecular level. Given the enormity of the problem, the expenses and limitations of experimental methods, it is imperative that this problem is tackled computationally. In silico predictions of protein interactions entail sampling different conformations of the purported complex and then scoring these to assess for interaction viability. In this study, we have devised a new scheme for scoring protein-protein interactions.
Results:
Our method, PIZSA (Protein Interaction Z-Score Assessment), is a binary classification scheme for identification of native protein quaternary assemblies (binders/nonbinders) based on statistical potentials. The scoring scheme incorporates residue-residue contact preference on the interface with per residue-pair atomic contributions and accounts for clashes. PIZSA can accurately discriminate between native and non-native structural conformations from protein docking experiments and outperform other contact-based potential scoring functions. The method has been extensively benchmarked and is among the top 6 methods, outperforming 31 other statistical, physics based and machine learning scoring schemes. The PIZSA potentials can also distinguish crystallization artifacts from biological interactions.
Availability and implementation:
PIZSA is implemented as a web server at http://cospi.iiserpune.ac.in/pizsa and can be downloaded as a standalone package from http://cospi.iiserpune.ac.in/pizsa/Download/Download.html.
Supplementary information:
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Abhilesh S Dhawanjewar
  - Indian Institute of Science Education and Research, Pashan, Pune 411008, India; School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Ankit A Roy
  - Indian Institute of Science Education and Research, Pashan, Pune 411008, India
|
15
|
Rosell M, Fernández-Recio J. Docking approaches for modeling multi-molecular assemblies. Curr Opin Struct Biol 2020; 64:59-65. [PMID: 32615514 PMCID: PMC7324114 DOI: 10.1016/j.sbi.2020.05.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/31/2020] [Revised: 05/13/2020] [Accepted: 05/21/2020] [Indexed: 12/12/2022]
Abstract
Computational docking approaches aim to overcome the limited availability of experimental structural data on protein-protein interactions, which are key in biology. The field is rapidly moving from the traditional docking methodologies for modeling of binary complexes to more integrative approaches using template-based, data-driven modeling of multi-molecular assemblies. We will review here the predictive capabilities of current docking methods in blind conditions, based on the results from the most recent community-wide blind experiments. Integration of template-based and ab initio docking approaches is emerging as the optimal strategy for modeling protein complexes and multimolecular assemblies. We will also review the new methodological advances on ab initio docking and integrative modeling.
Affiliation(s)
- Mireia Rosell
  - Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain
- Juan Fernández-Recio
  - Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain
|
16
|
Roy AA, Dhawanjewar AS, Sharma P, Singh G, Madhusudhan MS. Protein Interaction Z Score Assessment (PIZSA): an empirical scoring scheme for evaluation of protein-protein interactions. Nucleic Acids Res 2019; 47:W331-W337. [PMID: 31114890 PMCID: PMC6602501 DOI: 10.1093/nar/gkz368] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/06/2019] [Revised: 04/24/2019] [Accepted: 05/15/2019] [Indexed: 11/24/2022] Open
Abstract
Our web server, PIZSA (http://cospi.iiserpune.ac.in/pizsa), assesses the likelihood of protein–protein interactions by assigning a Z Score computed from interface residue contacts. Our score takes into account the optimal number of atoms that mediate the interaction between pairs of residues and whether these contacts emanate from the main chain or side chain. We tested the score on 174 native interactions for which 100 decoys each were constructed using ZDOCK. The native structure scored better than any of the decoys in 146 cases and was able to rank within the 95th percentile in 162 cases. This easily outperforms a competing method, CIPS. We also benchmarked our scoring scheme on 15 targets from the CAPRI dataset and found that our method had results comparable to that of CIPS. Further, our method is able to analyse higher order protein complexes without the need to explicitly identify chains as receptors or ligands. The PIZSA server is easy to use and could be used to score any input three-dimensional structure and provide a residue pair-wise break up of the results. Attractively, our server offers a platform for users to upload their own potentials and could serve as an ideal testing ground for this class of scoring schemes.
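The decoy benchmark described in this abstract (one native structure ranked against 100 decoys) can be illustrated with a minimal sketch. This is not PIZSA's scoring code: the function names `native_is_top` and `percentile_rank` are hypothetical, the scores are toy numbers, and the sketch assumes that a higher score means a more favourable interface.

```python
def native_is_top(native_score, decoy_scores):
    """True if the native structure scores strictly better than every decoy.

    Assumption: higher score = better. Flip the comparison for
    energy-like scores where lower is better.
    """
    return all(native_score > s for s in decoy_scores)


def percentile_rank(native_score, decoy_scores):
    """Percentage of decoys that score strictly worse than the native."""
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    worse = sum(1 for s in decoy_scores if s < native_score)
    return 100.0 * worse / len(decoy_scores)


# Toy example: 100 decoys with scores 0.0, 1.0, ..., 99.0 (made-up values).
decoys = [float(i) for i in range(100)]

print(native_is_top(100.0, decoys))    # True: beats every decoy
print(percentile_rank(95.5, decoys))   # 96.0: within the 95th percentile
print(native_is_top(95.5, decoys))     # False: four decoys score higher
```

Read this way, the figures reported in the abstract correspond to the native structure ranking first in 146 of 174 cases (about 84%) and within the 95th percentile in 162 of 174 cases (about 93%).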
Affiliation(s)
- Ankit A Roy
  - Indian Institute of Science Education and Research, Pune, Dr Homi Bhabha Road, Pashan, Pune 411008, India
- Abhilesh S Dhawanjewar
  - Indian Institute of Science Education and Research, Pune, Dr Homi Bhabha Road, Pashan, Pune 411008, India; presently at School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Parichit Sharma
  - Indian Institute of Science Education and Research, Pune, Dr Homi Bhabha Road, Pashan, Pune 411008, India; presently at School of Informatics, Computing & Engineering, Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Gulzar Singh
  - Indian Institute of Science Education and Research, Pune, Dr Homi Bhabha Road, Pashan, Pune 411008, India
- M S Madhusudhan
  - Indian Institute of Science Education and Research, Pune, Dr Homi Bhabha Road, Pashan, Pune 411008, India
|
17
|
Ding W, Tan HY, Zhang JX, Wilczek LA, Hsieh KR, Mulkin JA, Bianco PR. The mechanism of Single strand binding protein-RecG binding: Implications for SSB interactome function. Protein Sci 2020; 29:1211-1227. [PMID: 32196797 PMCID: PMC7184773 DOI: 10.1002/pro.3855] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/20/2020] [Revised: 03/11/2020] [Accepted: 03/13/2020] [Indexed: 01/10/2023]
Abstract
The Escherichia coli single-strand DNA binding protein (SSB) is essential to viability where it functions to regulate SSB interactome function. Here it binds to single-stranded DNA and to target proteins that comprise the interactome. The region of SSB that links these two essential protein functions is the intrinsically disordered linker. Key to linker function is the presence of three, conserved PXXP motifs that mediate binding to oligosaccharide-oligonucleotide binding folds (OB-fold) present in SSB and its interactome partners. Not surprisingly, partner OB-fold deletions eliminate SSB binding. Furthermore, single point mutations in either the PXXP motifs or, in the RecG OB-fold, obliterate SSB binding. The data also demonstrate that, and in contrast to the view currently held in the field, the C-terminal acidic tip of SSB is not required for interactome partner binding. Instead, we propose the tip has two roles. First, and consistent with the proposal of Dixon, to regulate the structure of the C-terminal domain in a biologically active conformation that prevents linkers from binding to SSB OB-folds until this interaction is required. Second, as a secondary binding domain. Finally, as OB-folds are present in SSB and many of its partners, we present the SSB interactome as the first family of OB-fold genome guardians identified in prokaryotes.
Affiliation(s)
- Wenfei Ding
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
  - Department of Biochemistry, University at Buffalo, Buffalo, New York, United States
- Hui Yin Tan
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
  - Present address: Department of Chemistry and Biochemistry, University of Notre Dame, South Bend, Indiana, United States
- Jia Xiang Zhang
  - Department of Biochemistry, University at Buffalo, Buffalo, New York, United States
- Luke A. Wilczek
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
  - Department of Biochemistry, University at Buffalo, Buffalo, New York, United States
  - Present address: Department of Chemistry, Brown University, Providence, Rhode Island, United States
- Karin R. Hsieh
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
- Jeffrey A. Mulkin
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
- Piero R. Bianco
  - Center for Single Molecule Biophysics, University at Buffalo, Buffalo, New York, United States
  - Department of Biochemistry, University at Buffalo, Buffalo, New York, United States
|
18
|
Guo F, Zou Q, Yang G, Wang D, Tang J, Xu J. Identifying protein-protein interface via a novel multi-scale local sequence and structural representation. BMC Bioinformatics 2019; 20:483. [PMID: 31874604 PMCID: PMC6929278 DOI: 10.1186/s12859-019-3048-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/14/2019] [Accepted: 08/21/2019] [Indexed: 12/23/2022] Open
Abstract
Background:
Protein-protein interaction plays a key role in a multitude of biological processes, such as signal transduction, de novo drug design, immune responses, and enzymatic activities. Gaining insight into various binding abilities can deepen our understanding of the interaction. It is of great interest to understand how proteins in a complex interact with each other. Many efficient methods have been developed for identifying protein-protein interfaces.
Results:
In this paper, we obtain the local information on the protein-protein interface through multi-scale local average block and hexagon structure construction. Given a pair of proteins, we use a trained support vector regression (SVR) model to select the best configurations. On Benchmark v4.0, our method achieves an average Irmsd value of 3.28 Å and an overall Fnat value of 63%, which improves upon an Irmsd of 3.89 Å and Fnat of 49% for ZRANK, and an Irmsd of 3.99 Å and Fnat of 46% for ClusPro. On CAPRI targets, our method achieves an average Irmsd value of 3.45 Å and an overall Fnat value of 46%, which improves upon an Irmsd of 4.18 Å and Fnat of 40% for ZRANK, and an Irmsd of 5.12 Å and Fnat of 32% for ClusPro. The success rates of our method, FRODOCK 2.0, InterEvDock and SnapDock on Benchmark v4.0 are 41.5%, 29.0%, 29.4% and 37.0%, respectively.
Conclusion:
Experiments show that our method performs better than some state-of-the-art methods, based on the prediction quality improved in terms of CAPRI evaluation criteria. All these results demonstrate that our method is a valuable technological tool for identifying protein-protein interfaces.
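For readers unfamiliar with the Fnat metric quoted above, the following is a minimal sketch of how the fraction of native contacts is commonly computed in CAPRI-style evaluation. It is not the authors' code. It assumes the commonly used 5 Å atom-atom cutoff for a residue-residue contact, and the function names, residue identifiers and coordinates are hypothetical toy values.

```python
from itertools import product
from math import dist


def residue_contacts(receptor, ligand, cutoff=5.0):
    """Return the set of (receptor_residue, ligand_residue) pairs in contact.

    receptor, ligand: dict mapping a residue id to a list of (x, y, z)
    atom coordinates. Two residues are in contact if any pair of their
    atoms lies within `cutoff` angstroms (assumed 5 A).
    """
    contacts = set()
    for (r_id, r_atoms), (l_id, l_atoms) in product(receptor.items(), ligand.items()):
        if any(dist(a, b) <= cutoff for a in r_atoms for b in l_atoms):
            contacts.add((r_id, l_id))
    return contacts


def fnat(native_contacts, model_contacts):
    """Fraction of native contacts that are reproduced in the model."""
    native = set(native_contacts)
    if not native:
        raise ValueError("native complex has no contacts")
    return len(native & set(model_contacts)) / len(native)


# Toy example with one atom per residue (made-up coordinates, in angstroms).
receptor = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
native_ligand = {"B1": [(3.0, 0.0, 0.0)], "B2": [(12.0, 0.0, 0.0)]}
model_ligand = {"B1": [(3.0, 0.0, 0.0)], "B2": [(30.0, 0.0, 0.0)]}

native = residue_contacts(receptor, native_ligand)  # A1-B1 and A2-B2
model = residue_contacts(receptor, model_ligand)    # only A1-B1 survives
print(fnat(native, model))                          # 0.5
```

Under this definition, an Fnat of 63% means that, on average, 63% of the residue-residue contacts in the native interface are also present in the selected docking model.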
Affiliation(s)
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, People's Republic of China
- Guang Yang
  - School of Economics, Nankai University, Tianjin, People's Republic of China
- Dan Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Jijun Tang
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China; Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Junhai Xu
  - College of Intelligence and Computing, Tianjin University, Tianjin, People's Republic of China
|
19
|
Liu J, Gong X. Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction. BMC Bioinformatics 2019; 20:609. [PMID: 31775612 PMCID: PMC6882172 DOI: 10.1186/s12859-019-3199-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 12/30/2018] [Accepted: 11/06/2019] [Indexed: 11/25/2022] Open
Abstract
Background:
A recurrent neural network (RNN) is a good way to process sequential data, but its capability to compute long sequence data is inefficient. As a variant of RNN, long short-term memory (LSTM) solved the problem to some extent. Here we improved LSTM for big data application in protein-protein interaction interface residue pairs prediction for the following two reasons. On the one hand, there are some deficiencies in LSTM, such as shallow layers and gradient explosion or vanishing. With the dramatic increase in data, the imbalance between algorithm innovation and big data processing has become more serious and urgent. On the other hand, protein-protein interaction interface residue pairs prediction is an important problem in biology, but the low prediction accuracy compels us to propose new computational methods.
Results:
In order to surmount the aforementioned problems of LSTM, we adopt the residual architecture and add an attention mechanism to LSTM. In detail, we redefine the block, and add a connection from front to back in every two layers and an attention mechanism to strengthen the capability of mining information. Then we use it to predict protein-protein interaction interface residue pairs, and acquire a quite good accuracy of over 72%. Moreover, we compare our method with random experiments, PPiPP, standard LSTM, and some other machine learning methods. Our method shows better performance than the methods mentioned above.
Conclusion:
We present an attention mechanism enhanced LSTM with residual architecture, and make a deeper network without gradient vanishing or explosion to a certain extent. Then we apply it to a significant problem, protein-protein interaction interface residue pairs prediction, and obtain a better accuracy than other methods. Our method provides a new approach for protein-protein interaction computation, which will be helpful for related biomedical research.
Affiliation(s)
- Jiale Liu
  - Mathematics Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, China
- Xinqi Gong
  - Mathematics Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, China; Center for Mathematical Sciences and Applications, Harvard University, Boston, MA 02138, USA
|
20
|
Fong P, Wong HK. Evaluation of Scoring Function Performance on DNA-ligand Complexes. The Open Medicinal Chemistry Journal 2019. [DOI: 10.2174/1874104501913010040] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
Background:
DNA has been a pharmacological target for different types of treatment, such as antibiotics and chemotherapy agents, and is still a potential target in many drug discovery processes. However, most docking and scoring approaches were parameterised for protein-ligand interactions; their suitability for modelling DNA-ligand interactions is uncertain.
Objective:
This study investigated the performance of four scoring functions on DNA-ligand complexes.
Material & Methods:
Here, we explored the ability of four docking protocols and scoring functions to discriminate the native pose of 33 DNA-ligand complexes from a compiled set of 200 decoys for each DNA-ligand complex. The four approaches were AutoDock, ASP@GOLD, ChemScore@GOLD and GoldScore@GOLD.
Results:
Our results indicate that AutoDock performed the best when predicting binding mode and that ChemScore@GOLD achieved the best discriminative power. Rescoring of AutoDock-generated decoys with ChemScore@GOLD further enhanced their individual discriminative powers. All four approaches have no discriminative power in some DNA-ligand complexes, including both minor groove binders and intercalators.
Conclusion:
This study suggests that the evaluation for each DNA-ligand complex should be performed in order to obtain meaningful results for any drug discovery processes. Rescoring with different scoring functions can improve discriminative power.
|
21
|
Shape complementarity at protein interfaces via global docking optimisation. J Mol Graph Model 2018; 84:69-73. [DOI: 10.1016/j.jmgm.2018.06.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/19/2018] [Revised: 06/11/2018] [Accepted: 06/12/2018] [Indexed: 11/24/2022]
|