1
Barradas-Bautista D, Almajed A, Oliva R, Kalnis P, Cavallo L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinformatics Advances 2023; 3:vbad012. [PMID: 36789292 PMCID: PMC9923443 DOI: 10.1093/bioadv/vbad012] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/27/2022] [Revised: 01/20/2023] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
Motivation: Protein-protein interactions drive many relevant biological events, such as infection, replication and recognition. To control or engineer such events, we need access to the molecular details of the interaction provided by experimental 3D structures. However, such experiments are slow and expensive; moreover, current technology cannot keep up with the high discovery rate of new interactions. Computational modeling, such as protein-protein docking, can help to fill this gap by generating docking poses. Protein-protein docking generally consists of two parts, sampling and scoring. The sampling is an exhaustive search of three-dimensional space. Its caveat is that it generates a large number of incorrect poses, producing a highly unbalanced dataset. This limits the utility of the data for training machine learning classifiers.
Results: Using weak supervision, we developed a data augmentation method that we named hAIkal. Using hAIkal, we increased the labeled training data available to train several algorithms. We trained and obtained different classifiers; the best classifier has 81% accuracy and a Matthews' correlation coefficient of 0.51 on the test set, surpassing the state-of-the-art scoring functions.
Availability and implementation: Docking models from Benchmark 5 are available at https://doi.org/10.5281/zenodo.4012018. Processed tabular data are available at https://repository.kaust.edu.sa/handle/10754/666961. A Google Colab notebook is available at https://colab.research.google.com/drive/1vbVrJcQSf6_C3jOAmZzgQbTpuJ5zC1RP?usp=sharing.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
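The abstract above reports both accuracy and Matthews' correlation coefficient (MCC) for a classifier trained on a highly unbalanced set of docking poses. The following is a minimal sketch of why both numbers are useful. It is not the authors' code: the function names and the toy labels (10 correct poses among 100) are assumptions made for illustration only.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = correct pose, 0 = incorrect)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of poses that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical, unbalanced toy set: 10 correct poses, 90 incorrect ones.
y_true = [1] * 10 + [0] * 90

# Classifier A labels every pose as incorrect.
pred_a = [0] * 100
# Classifier B recovers 8 of the 10 correct poses at the cost of 9 false positives.
pred_b = [1] * 8 + [0] * 2 + [1] * 9 + [0] * 81

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)
print("A:", accuracy(*counts_a), round(mcc(*counts_a), 3))
print("B:", accuracy(*counts_b), round(mcc(*counts_b), 3))
```

In this toy case classifier A has the higher accuracy although it never finds a correct pose, while its MCC is zero; classifier B has slightly lower accuracy but a clearly positive MCC.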
Affiliation(s)
- Ali Almajed
- Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Romina Oliva
- Department of Sciences and Technologies, University of Naples “Parthenope”, I-80143 Naples, Italy
- Panos Kalnis
- Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Luigi Cavallo
- Physical Sciences and Engineering Division, Kaust Catalysis Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
2
Cohen T, Halfon M, Carter L, Sharkey B, Jain T, Sivasubramanian A, Schneidman-Duhovny D. Multi-state modeling of antibody-antigen complexes with SAXS profiles and deep-learning models. Methods Enzymol 2022; 678:237-262. [PMID: 36641210 DOI: 10.1016/bs.mie.2022.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
Antibodies are an established class of human therapeutics. Epitope characterization is an important part of therapeutic antibody discovery. However, structural characterization of antibody-antigen complexes remains challenging. On the one hand, X-ray crystallography or cryo-electron microscopy provide atomic-resolution characterization of the epitope, but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling antibody-antigen structures from the individual components frequently suffer from a high false positive rate, rarely resulting in a unique solution. Recent deep learning models for structure prediction are also successful in predicting protein-protein complexes. However, they do not perform well for antibody-antigen complexes. Small Angle X-ray Scattering (SAXS) is a reliable technique for rapid structural characterization of protein samples in solution, albeit at low resolution. Here, we present an integrative approach for modeling antibody-antigen complexes using the antibody sequence, antigen structure, and experimentally determined SAXS profiles of the antibody, antigen, and the complex. The method models antibody structures using a novel deep-learning approach, NanoNet. The structures of the antibodies and antigens are represented using multiple 3D conformations to account for compositional and conformational heterogeneity of the protein samples that are used to collect the SAXS data. The complexes are predicted by integrating the SAXS profiles with scoring functions for protein-protein interfaces that are based on statistical potentials and antibody-specific deep-learning models. We validated the method via application to four Fab:EGFR and one Fab:PCSK9 antibody:antigen complexes with experimentally available SAXS datasets. The integrative approach returns accurate predictions (interface RMSD < 4 Å) in the top five predictions for four out of five complexes (respective interface RMSD values of 1.95, 2.18, 2.66 and 3.87 Å), providing support for the utility of such a computational pipeline for epitope characterization during therapeutic antibody discovery.
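The success criterion quoted above (an interface RMSD below 4 Å among the top five predictions) can be sketched as a small check over ranked models. This is an illustration of the criterion only, not the authors' pipeline: the function names, the target names and all RMSD values in the toy dictionary are hypothetical.

```python
def top_n_hit(ranked_irmsd, n=5, cutoff=4.0):
    """True if any of the first n ranked models has interface RMSD below cutoff (in Å)."""
    return any(value < cutoff for value in ranked_irmsd[:n])

def success_count(targets, n=5, cutoff=4.0):
    """Return (number of targets with a top-n hit, total number of targets)."""
    hits = sum(top_n_hit(models, n, cutoff) for models in targets.values())
    return hits, len(targets)

# Hypothetical interface RMSD values (Å), listed in the order the models were ranked.
toy_targets = {
    "complex_A": [6.1, 2.3, 9.8, 12.0, 7.4, 1.1],   # hit at rank 2
    "complex_B": [8.5, 7.9, 10.2, 6.6, 5.0, 3.2],   # first accurate model is rank 6: no top-5 hit
    "complex_C": [3.9, 15.0],                        # hit at rank 1
}

hits, total = success_count(toy_targets)
print(f"{hits} of {total} targets have an accurate model in the top 5")
```

Changing `n` or `cutoff` shows how sensitive a reported success rate is to the chosen ranking depth and accuracy threshold.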
Affiliation(s)
- Tomer Cohen
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Matan Halfon
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Lester Carter
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, United States
- Beth Sharkey
- High-Throughput Expression, Adimab LLC, Lebanon, NH, United States
- Tushar Jain
- Computational Biology, Adimab LLC, Palo Alto, CA, United States
- Dina Schneidman-Duhovny
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
3
Chen YC, Chen YH, Wright JD, Lim C. PPI-Hotspot DB: Database of Protein-Protein Interaction Hot Spots. J Chem Inf Model 2022; 62:1052-1060. [PMID: 35147037 DOI: 10.1021/acs.jcim.2c00025] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/30/2022]
Abstract
Single-point mutations of certain residues (so-called hot spots) impair or disrupt protein-protein interactions (PPIs), leading to pathogenesis and drug resistance. Conventionally, a PPI-hot spot is identified when its replacement decreases the binding free energy significantly, generally by ≥2 kcal/mol. The relatively few mutations with such a significant drop in binding free energy have limited the number of distinct PPI-hot spots. By defining PPI-hot spots based on mutations that have been manually curated in UniProtKB as significantly impairing or disrupting PPIs, in addition to binding free energy changes, we have expanded the number of distinct PPI-hot spots by an order of magnitude. These experimentally determined PPI-hot spots, along with available structures, have been collected in a database called PPI-HotspotDB. We have applied the PPI-HotspotDB to create a nonredundant benchmark, PPI-Hotspot+PDBBM, for assessing methods to predict PPI-hot spots using the free structure as input. PPI-HotspotDB will benefit the design of mutagenesis experiments and the development of PPI-hot spot prediction methods. The database and benchmark are freely available at https://ppihotspot.limlab.dnsalias.org.
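The two-part definition described above (a binding free energy change of at least 2 kcal/mol, or a curated annotation that the mutation impairs or disrupts the interaction) can be sketched as a simple predicate. This is a sketch under stated assumptions rather than the database's actual schema: the `Mutation` record, its field names, the sign convention (positive `ddg` meaning weaker binding) and the example residues are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DDG_CUTOFF = 2.0  # kcal/mol, the conventional threshold quoted in the abstract

@dataclass
class Mutation:
    residue: str                 # hypothetical residue label
    ddg: Optional[float]         # loss of binding free energy in kcal/mol; None if not measured
    curated_disruptive: bool     # hypothetical flag standing in for a curated annotation

def is_hot_spot(m: Mutation) -> bool:
    """Hot spot if the energetic criterion OR the curated-annotation criterion holds."""
    energetic = m.ddg is not None and m.ddg >= DDG_CUTOFF
    return energetic or m.curated_disruptive

# Hypothetical records, for illustration only.
records = [
    Mutation("W104A", ddg=3.1, curated_disruptive=False),   # energetic criterion
    Mutation("K27E", ddg=None, curated_disruptive=True),    # annotation only, no measured value
    Mutation("S55A", ddg=0.4, curated_disruptive=False),    # neither criterion
]

hot_spots = [m.residue for m in records if is_hot_spot(m)]
print(hot_spots)
```

The second record illustrates how accepting curated annotations admits residues that lack a measured free energy change, which is the route by which the abstract says the set of hot spots was expanded.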
Affiliation(s)
- Yao Chi Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Yu-Hsien Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Jon D Wright
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Carmay Lim
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan; Department of Chemistry, National Tsing Hua University, Hsinchu 300, Taiwan
4
Meseguer A, Bota P, Fernández-Fuentes N, Oliva B. Prediction of Protein-Protein Binding Affinities from Unbound Protein Structures. Methods Mol Biol 2022; 2385:335-351. [PMID: 34888728 DOI: 10.1007/978-1-0716-1767-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Proteins are the workhorses of cells, carrying out sophisticated and complex cellular processes. Such processes require coordinated and regulated interactions between proteins that are both time and location specific. The strength, or binding affinity, of protein-protein interactions ranges between the micro- and the nanomolar association constant, often dictating the molecular mechanisms underlying the interaction and the longevity of the complex, i.e., transient or permanent. Consequently, there is a need to quantify the strength of protein-protein interactions for biological, biomedical, and biotechnological applications. While experimental methods are labor intensive and costly, computational ones are useful tools to predict the affinity of protein-protein interactions. In this chapter, we review the methods developed by us to address this question. We briefly present two methods to comprehend the structure of the protein complex derived by either comparative modeling or docking. Then we introduce BADOCK, a method to predict the binding energy without requiring the structure of the protein complex, thus overcoming one of the major limitations of structure-based methods for the prediction of binding affinity. BADOCK utilizes the structure of unbound proteins and the protein docking sampling space to predict protein-protein binding affinities. We present step-by-step protocols to utilize these methods, describing the inputs and potential pitfalls as well as their respective strengths and limitations.
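To give a sense of scale for the micro- to nanomolar range mentioned above, the sketch below converts an equilibrium constant into a standard binding free energy with the textbook relation ΔG° = RT ln(Kd / c°). It is not part of BADOCK or of the chapter's protocols. It assumes that the quoted concentrations are read as dissociation constants, a 1 M standard state and a temperature of 298.15 K; the function name is hypothetical.

```python
from math import log

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_DEFAULT = 298.15     # assumed temperature in K

def binding_free_energy(kd_molar, temperature=T_DEFAULT):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / 1 M); tighter binding (smaller Kd) gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * log(kd_molar)

for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9)]:
    print(f"Kd = {label}: dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Under these assumptions the thousand-fold span between micromolar and nanomolar binding corresponds to roughly 4 kcal/mol, which is why predicted affinities are usually compared on the free energy scale.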
Affiliation(s)
- Alberto Meseguer
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia, Spain
- Patricia Bota
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia, Spain
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Catalonia, Spain
- Narcis Fernández-Fuentes
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia, Spain
5
Rosell M, Rodríguez-Lumbreras LA, Fernández-Recio J. Modeling of Protein Complexes and Molecular Assemblies with pyDock. Methods Mol Biol 2021; 2165:175-198. [PMID: 32621225 DOI: 10.1007/978-1-0716-0708-4_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Abstract
The study of the 3D structural details of protein interactions is essential to understand biomolecular functions at the molecular level. In this context, the limited availability of experimental structures of protein-protein complexes at atomic resolution is propelling the development of computational docking methods that aim to complement the current structural coverage of protein interactions. One of these docking approaches is pyDock, which uses van der Waals, electrostatics, and desolvation energy to score docking poses generated by a variety of sampling methods, typically FTDock or ZDOCK. The method has shown consistently good prediction performance in community-wide assessment experiments like CAPRI or CASP, and has provided biological insight and aided the interpretation of experiments by modeling many biomolecular interactions of biomedical and biotechnological interest. Here, we describe in detail how to perform structural modeling of protein assemblies with pyDock, and the application of its modules to different biomolecular recognition phenomena, such as modeling of the binding mode, interface and hot-spot prediction, use of restraints based on experimental data, inclusion of low-resolution structural data, binding affinity estimation, and modeling of homo- and hetero-oligomeric assemblies.
Affiliation(s)
- Mireia Rosell
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), Consejo Superior de Investigaciones Científicas (CSIC) - Universidad de La Rioja - Gobierno de La Rioja, Logroño, Spain
- Luis Angel Rodríguez-Lumbreras
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), Consejo Superior de Investigaciones Científicas (CSIC) - Universidad de La Rioja - Gobierno de La Rioja, Logroño, Spain
- Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), Consejo Superior de Investigaciones Científicas (CSIC) - Universidad de La Rioja - Gobierno de La Rioja, Logroño, Spain; Institut de Biologia Molecular de Barcelona (IBMB), Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
6
Abstract
Many of the biological functions of the cell are driven by protein-protein interactions. However, determining which proteins interact, and exactly how they do so to enable their functions, remain major research questions. Functional interactions depend on a number of complicated factors; therefore, modeling the three-dimensional structure of protein-protein complexes is still considered a complex endeavor. Nevertheless, the rewards for modeling protein interactions at atomic-level detail are substantial, and there are numerous examples of how models can provide useful information for drug design, protein engineering, systems biology, and understanding of the immune system. Here, we provide practical guidelines for docking proteins using the SwarmDock web server, a flexible protein-protein docking method. Moreover, we provide an overview of the factors that need to be considered when deciding whether docking is likely to be successful.
Affiliation(s)
- Iain H Moal
- European Bioinformatics Institute, Hinxton, UK
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
7
Abstract
There is a large gap between the numbers of known protein-protein interactions and the corresponding experimentally solved structures of protein complexes. Fortunately, this gap can be in part bridged by computational structure modeling methods. Currently, template-based modeling is the most accurate means to predict both individual protein structures and protein complexes. One of the major issues in template-based modeling is to identify homologous structures that could be utilized as templates. To simplify this task, we have developed the PPI3D web server. The server is not only able to search for homologous protein complexes, but also provides means to analyze identified interactions and to model protein complexes. In recent CASP and CAPRI experiments, PPI3D proved to be a useful tool for homology modeling of multimeric proteins. In this chapter, we provide a brief description of the PPI3D web server capabilities and how to use the server for modeling of protein complexes.
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
8
Jankauskaite J, Jiménez-García B, Dapkunas J, Fernández-Recio J, Moal IH. SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019; 35:462-469. [PMID: 30020414 PMCID: PMC6361233 DOI: 10.1093/bioinformatics/bty635] [Citation(s) in RCA: 170] [Impact Index Per Article: 34.0] [Received: 05/16/2018] [Accepted: 07/17/2018] [Indexed: 11/18/2022]
Abstract
Motivation: Understanding the relationship between the sequence, structure, binding energy, binding kinetics and binding thermodynamics of protein–protein interactions is crucial to understanding cellular signaling, the assembly and regulation of molecular complexes, the mechanisms through which mutations lead to disease, and protein engineering.
Results: We present SKEMPI 2.0, a major update to our database of binding free energy changes upon mutation for structurally resolved protein–protein interactions. This version now contains manually curated binding data for 7085 mutations, an increase of 133%, including changes in kinetics for 1844 mutations, enthalpy and entropy changes for 443 mutations, and 440 mutations that abolish detectable binding.
Availability and implementation: The database is available as supplementary data and at https://life.bsc.es/pid/skempi2/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Justina Jankauskaite
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Brian Jiménez-García
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, the Netherlands
- Justas Dapkunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Institut de Biologia Molecular de Barcelona (IBMB), CSIC, Barcelona, Spain
- Iain H Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
9
Computational Modeling of Designed Ankyrin Repeat Protein Complexes with Their Targets. J Mol Biol 2019; 431:2852-2868. [DOI: 10.1016/j.jmb.2019.05.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/26/2019] [Revised: 05/03/2019] [Accepted: 05/03/2019] [Indexed: 01/24/2023]
10
Abstract
The atomic structures of protein complexes can provide useful information for drug design, protein engineering, systems biology, and understanding pathology. Obtaining this information experimentally can be challenging. However, if the structures of the subunits are known, then it is often possible to model the complex computationally. This chapter provides practical guidelines for docking proteins using the SwarmDock flexible protein-protein docking method. It gives an overview of the factors that need to be considered when deciding whether docking is likely to be successful, and covers the preparation of structural input, the generation of docked poses, the analysis and ranking of docked poses, and the validation of models using external data.
Affiliation(s)
- Iain H Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK