1
Han J, Zhang S, Guan M, Li Q, Gao X, Liu J. GeoNet enables the accurate prediction of protein-ligand binding sites through interpretable geometric deep learning. Structure 2024:S0969-2126(24)00446-5. [PMID: 39488202 DOI: 10.1016/j.str.2024.10.011] [Received: 05/17/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024]
Abstract
The identification of protein binding residues is essential for understanding protein function in vivo. However, accurately identifying binding sites remains a computational challenge due to the lack of known residue binding patterns. Binding patterns are determined both by the local spatial distribution of residues and by their interactive biophysical environment. Previous methods could not capture both kinds of information simultaneously, resulting in unsatisfactory performance. Here, we present GeoNet, an interpretable geometric deep learning model that predicts DNA, RNA, and protein binding sites by learning latent residue binding patterns. GeoNet achieves this by introducing a coordinate-free geometric representation to characterize local residue distributions and by generating an eigenspace to depict local interactive biophysical environments. Evaluation shows that GeoNet outperforms other leading predictors and that its learned representations are highly interpretable. We present three test cases in which GeoNet successfully identified interaction interfaces.
Affiliation(s)
- Jiyun Han
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Mingming Guan
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Qiuyu Li
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia; Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Juntao Liu
- School of Mathematics and Statistics, Shandong University, Weihai 264209, China
2
Mirabello C, Wallner B, Nystedt B, Azinas S, Carroni M. Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes. Nat Commun 2024; 15:8724. [PMID: 39379372 PMCID: PMC11461844 DOI: 10.1038/s41467-024-52951-w] [Received: 05/13/2024] [Accepted: 09/26/2024] [Indexed: 10/10/2024]
Abstract
Since the release of AlphaFold, researchers have actively refined its predictions and attempted to integrate it into existing pipelines for determining protein structures. These efforts introduced a number of functionalities and optimisations at the latest edition of the Critical Assessment of protein Structure Prediction (CASP15), resulting in a marked improvement in the prediction of multimeric protein structures. However, AlphaFold's ability to predict large protein complexes is still limited, and integrating experimental data into the prediction pipeline is not straightforward. In this study, we introduce AF_unmasked to overcome these limitations. Our results demonstrate that AF_unmasked can integrate experimental information to build larger or hard-to-predict protein assemblies with high confidence. The resulting predictions can help interpret and augment experimental data. This approach generates high-quality structures (DockQ score > 0.8) even when little to no evolutionary information is available and imperfect experimental structures are used as a starting point. AF_unmasked is developed and optimised to fill in incomplete experimental structures (structural inpainting), which may provide insights into protein dynamics. In summary, AF_unmasked provides an easy-to-use method that efficiently integrates experimental data to predict large protein complexes with greater confidence.
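The abstract uses a DockQ threshold of 0.8 to mark high-quality structures. As a reading aid, the following is a minimal Python sketch of the DockQ score as we understand its published definition (Basu and Wallner, 2016). The score is the mean of the fraction of native contacts (Fnat) and two RMSD terms, with scaling constants d = 1.5 Å for the interface RMSD and d = 8.5 Å for the ligand RMSD. The function names `dockq` and `dockq_class` are hypothetical. The sketch assumes Fnat and both RMSDs have already been computed from superimposed structures, which is the hard part and is not shown. This is not code from AF_unmasked.

```python
def dockq(fnat: float, irms: float, lrms: float) -> float:
    """Combine Fnat, interface RMSD and ligand RMSD (in angstroms) into DockQ."""
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")

    def scaled_rms(rms: float, d: float) -> float:
        # Maps an RMSD onto (0, 1]; equals 0.5 when rms == d.
        return 1.0 / (1.0 + (rms / d) ** 2)

    # Scaling constants: 1.5 A for interface RMSD, 8.5 A for ligand RMSD.
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0


def dockq_class(score: float) -> str:
    """Map a DockQ score onto the CAPRI-like quality classes."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Hypothetical model: 90% of native contacts, iRMS 0.8 A, LRMS 2.0 A.
score = dockq(0.9, 0.8, 2.0)
print(round(score, 3), dockq_class(score))
```

Because every term lies in [0, 1], DockQ is bounded by 1. A score above 0.8 therefore requires most native contacts to be recovered and both RMSDs to be small relative to their scaling constants.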
Affiliation(s)
- Claudio Mirabello
- Dept of Physics, Chemistry and Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Linköping University, 581 83, Linköping, Sweden
- Björn Wallner
- Dept of Physics, Chemistry and Biology, Linköping University, 581 83, Linköping, Sweden
- Björn Nystedt
- Dept of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, SE-752 37, Uppsala, Sweden
- Stavros Azinas
- Dept of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Marta Carroni
- Dept of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
3
Rao J, Xie J, Yuan Q, Liu D, Wang Z, Lu Y, Zheng S, Yang Y. A variational expectation-maximization framework for balanced multi-scale learning of protein and drug interactions. Nat Commun 2024; 15:4476. [PMID: 38796523 PMCID: PMC11530528 DOI: 10.1038/s41467-024-48801-4] [Received: 12/01/2023] [Accepted: 05/14/2024] [Indexed: 05/28/2024]
Abstract
Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and for developing new therapeutic strategies. Current computational methods mostly predict interactions from either molecular network or structural information, without integrating the two within a unified multi-scale framework. Although a few multi-view learning methods are devoted to fusing multi-scale information, they tend to rely heavily on a single scale and underfit the others, likely because of the imbalanced nature and inherent greediness of multi-scale learning. To alleviate this optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variational expectation-maximization procedure that optimizes the different scales alternately over multiple iterations. This strategy efficiently fuses information between the atomic structure and molecular network scales through mutual supervision and iterative optimization. MUSE outperforms current state-of-the-art models not only in molecular interaction tasks (protein-protein, drug-protein, and drug-drug) but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.
Affiliation(s)
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jiancong Xie
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Deqin Liu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Zhen Wang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, Guangzhou, China
- State Key Laboratory of Oncology in South China, Sun Yat-sen University, Guangzhou, China
4
Li D, Pucci F, Rooman M. Prediction of Paratope-Epitope Pairs Using Convolutional Neural Networks. Int J Mol Sci 2024; 25:5434. [PMID: 38791470 PMCID: PMC11121317 DOI: 10.3390/ijms25105434] [Received: 04/02/2024] [Revised: 05/06/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024]
Abstract
Antibodies play a central role in the adaptive immune response of vertebrates through the specific recognition of exogenous or endogenous antigens. The rational design of antibodies has a wide range of biotechnological and medical applications, such as disease diagnosis and treatment. However, there are currently no reliable methods for predicting which antibodies recognize a specific antigen region (or epitope) and, conversely, which epitopes are recognized by the binding region of a given antibody (or paratope). To fill this gap, we developed ImaPEp, a machine learning-based tool that predicts the binding probability of paratope-epitope pairs. In ImaPEp, the epitope and paratope regions are simplified into interacting two-dimensional patches, which are colored according to the values of selected features and then pixelated. The specific recognition of an epitope image by a paratope image is modeled with a convolutional neural network trained on a set of two-dimensional paratope-epitope images derived from experimental structures of antibody-antigen complexes. In cross-validation, our method performs well, reaching a balanced accuracy of 0.8. Finally, we showcase example applications of ImaPEp, including extensive screening of large libraries to identify paratope candidates that bind to a selected epitope, and the rescoring and refinement of antibody-antigen docking poses.
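ImaPEp's performance is reported as a balanced accuracy of 0.8. Balanced accuracy is the mean of sensitivity (recall on binding pairs) and specificity (recall on non-binding pairs). The short Python sketch below makes that definition concrete. It assumes binary labels (1 = binding paratope-epitope pair, 0 = non-binding pair). The function name and the example labels are hypothetical, and this is not ImaPEp's evaluation code.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, accepted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels: 1 = binding pair, 0 = non-binding pair.
labels = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 0, 1]
print(balanced_accuracy(labels, preds))  # 0.625
```

Plain accuracy on the same toy labels would be 4/6 ≈ 0.67. Balanced accuracy weights both classes equally, so a classifier cannot score well simply by favoring whichever class dominates the data set. For binary labels, scikit-learn's `balanced_accuracy_score` (with its default `adjusted=False`) computes the same quantity.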
Affiliation(s)
- Dong Li
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
5
Mischley V, Maier J, Chen J, Karanicolas J. PPIscreenML: Structure-based screening for protein-protein interactions using AlphaFold. bioRxiv 2024:2024.03.16.585347. [PMID: 38559274 PMCID: PMC10979958 DOI: 10.1101/2024.03.16.585347] [Indexed: 04/04/2024]
Abstract
Protein-protein interactions underlie nearly all cellular processes. With the advent of protein structure prediction methods such as AlphaFold2 (AF2), models of specific protein pairs can be built extremely accurately in most cases. However, determining the relevance of a given protein pair remains an open question. It is presently unclear how best to use structure-based tools to infer whether a pair of candidate proteins indeed interact with one another; ideally, one might even use such information to screen candidate pairings and build up protein interaction networks. Although methods for evaluating the quality of modeled protein complexes (e.g., pDockQ and iPTM) have been co-opted for determining which pairings interact, there have been no rigorously benchmarked methods for this task. Here we introduce PPIscreenML, a classification model trained to distinguish AF2 models of interacting protein pairs from AF2 models of compelling decoy pairings. We find that PPIscreenML outperforms methods such as pDockQ and iPTM for this task, and that it performs impressively when identifying which ligand/receptor pairings engage one another across the structurally conserved tumor necrosis factor superfamily (TNFSF). Analysis of benchmark results using complexes not seen during PPIscreenML development strongly suggests that the model generalizes beyond its training data, making it broadly applicable for identifying new protein complexes from structural models built with AF2.
Affiliation(s)
- Victoria Mischley
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- Molecular Cell Biology and Genetics, Drexel University, Philadelphia, PA 19102
- John Karanicolas
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- Moulder Center for Drug Discovery Research, Temple University School of Pharmacy, Philadelphia, PA 19140
6
Singh A, Copeland MM, Kundrotas PJ, Vakser IA. GRAMM Web Server for Protein Docking. Methods Mol Biol 2024; 2714:101-112. [PMID: 37676594 DOI: 10.1007/978-1-0716-3441-7_5] [Indexed: 09/08/2023]
Abstract
Prediction of the structure of protein complexes by docking methods is a well-established research field. The intermolecular energy landscapes of protein-protein interactions can be used to refine docking predictions and to detect macro-characteristics such as the binding funnel. A new GRAMM web server for protein docking predicts a spectrum of docking poses that characterize the intermolecular energy landscape of a protein interaction. Its user-friendly interface offers a choice of free or template-based docking, as well as advanced features such as clustering of the docking poses and interactive visualization of the docked models.
Affiliation(s)
- Amar Singh
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Matthew M Copeland
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS, USA
7
Collins KW, Copeland MM, Kotthoff I, Singh A, Kundrotas PJ, Vakser IA. Dockground resource for protein recognition studies. Protein Sci 2022; 31:e4481. [PMID: 36281025 PMCID: PMC9667896 DOI: 10.1002/pro.4481] [Received: 08/15/2022] [Revised: 10/19/2022] [Accepted: 10/20/2022] [Indexed: 12/13/2022]
Abstract
Structural information on protein-protein interactions is essential for characterizing life processes at the molecular level. Because only a small fraction of known protein interactions have experimentally determined structures, computational modeling of protein complexes (protein docking) has to fill the gap. The Dockground resource (http://dockground.compbio.ku.edu) provides a collection of datasets for the development and testing of protein docking techniques. Currently, Dockground contains datasets of bound and unbound (experimentally determined and simulated) protein structures, model-model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative docking. The Dockground bound proteins dataset is the core set from which the other Dockground datasets are generated. It is implemented as a relational PostgreSQL database containing information on experimentally determined protein-protein complexes. This report describes the current status of the Dockground datasets, new automated update procedures, and further development of the core datasets. We also present a new interactive Dockground web interface that supports search by various parameters (such as release date, multimeric state, complex type, and structure resolution), visualization of the search results with a number of customizable parameters, and download of datasets with predefined levels of sequence and structure redundancy.
Affiliation(s)
- Ian Kotthoff
- Computational Biology Program, The University of Kansas, Kansas, USA
- Amar Singh
- Computational Biology Program, The University of Kansas, Kansas, USA
- Ilya A Vakser
- Computational Biology Program, The University of Kansas, Kansas, USA
- Department of Molecular Biosciences, The University of Kansas, Kansas, USA