1
Fasoulis R, Paliouras G, Kavraki LE. RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models. J Chem Inf Model 2024; 64:8729-8742. [PMID: 39555889 PMCID: PMC11633655 DOI: 10.1021/acs.jcim.4c01278] [Received: 07/19/2024] [Revised: 10/16/2024] [Accepted: 11/07/2024] [Indexed: 11/19/2024]
Abstract
The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes for most multicellular organisms to be able to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, and even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score stemming from a scoring function, with the top-scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closest to an unavailable native structure, is not trivial. Oftentimes, the peptide poses characterized as best by a protein-ligand scoring function are not the ones that are the most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.
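To make the ranking framing concrete, the following is a minimal sketch of how a ranking of poses might be judged against RMSD to a native structure. It is not RankMHC's training or evaluation code; the function names, the example scores, and the RMSD values are all hypothetical.

```python
from itertools import combinations

def top1_is_best(scores, rmsds):
    """Return True if the highest-scored pose is also the lowest-RMSD pose."""
    best_scored = max(range(len(scores)), key=lambda i: scores[i])
    return rmsds[best_scored] == min(rmsds)

def pairwise_concordance(scores, rmsds):
    """Fraction of pose pairs where the lower-RMSD pose has the higher score."""
    concordant = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if rmsds[i] == rmsds[j]:
            continue  # tied RMSDs carry no ordering information
        total += 1
        # Positive product: the score order agrees with the (inverted) RMSD order.
        if (scores[i] - scores[j]) * (rmsds[j] - rmsds[i]) > 0:
            concordant += 1
    return concordant / total if total else 0.0

# Hypothetical ensemble of four poses (higher score = predicted better).
scores = [0.9, 0.4, 0.7, 0.1]
rmsds = [1.2, 2.5, 0.8, 3.0]  # RMSD to native, in angstroms (made-up values)

print(top1_is_best(scores, rmsds))          # False: pose 2 has the lowest RMSD
print(pairwise_concordance(scores, rmsds))  # 5 of 6 pairs are ordered correctly
```

In this toy ensemble the top-scored pose is not the closest to native even though most pairs are ordered correctly, which is the gap between scoring and ranking that the abstract describes.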
Affiliation(s)
- Romanos Fasoulis
  - Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Georgios Paliouras
  - Institute of Informatics and Telecommunications, NCSR Demokritos, Athens 15341, Greece
- Lydia E. Kavraki
  - Department of Computer Science, Rice University, Houston, Texas 77005, United States
  - Ken Kennedy Institute, Rice University, Houston, Texas 77005, United States
2
Li G, Yuan Y, Zhang R. Predicting Protein-Ligand Binding Affinity Using Fusion Model of Spatial-Temporal Graph Neural Network and 3D Structure-Based Complex Graph. Interdiscip Sci 2024:10.1007/s12539-024-00644-9. [PMID: 39541085 DOI: 10.1007/s12539-024-00644-9] [Received: 03/17/2024] [Revised: 07/09/2024] [Accepted: 07/16/2024] [Indexed: 11/16/2024]
Abstract
The investigation of molecular interactions between ligands and their target molecules is becoming more significant as protein structure data continues to develop. In this study, we introduce PLA-STGCNnet, a deep fusion spatial-temporal graph neural network designed to study protein-ligand interactions based on the 3D structural data of protein-ligand complexes. Unlike 1D protein sequences or 2D ligand graphs, the 3D graph representation offers a more precise portrayal of the complex interactions between proteins and ligands. Research studies have shown that our fusion model, PLA-STGCNnet, outperforms individual algorithms in accurately predicting binding affinity. The advantage of a fusion model is the ability to fully combine the advantages of multiple different models and improve overall performance by combining their features and outputs. Our fusion model shows satisfactory performance on different data sets, which proves its generalization ability and stability. The fusion-based model showed good performance in protein-ligand affinity prediction, and we successfully applied the model to drug screening. Our research underscores the promise of fusion spatial-temporal graph neural networks in addressing complex challenges in protein-ligand affinity prediction. The Python scripts for implementing various model components are accessible at https://github.com/ligaili01/PLA-STGCN.
Affiliation(s)
- Gaili Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Yongna Yuan
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
- Ruisheng Zhang
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
3
Zhao Y, He S, Xing Y, Li M, Cao Y, Wang X, Zhao D, Bo X. A Point Cloud Graph Neural Network for Protein-Ligand Binding Site Prediction. Int J Mol Sci 2024; 25:9280. [PMID: 39273227 PMCID: PMC11394757 DOI: 10.3390/ijms25179280] [Received: 08/03/2024] [Revised: 08/25/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024] [Open Access]
Abstract
Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on the inter-point distances, and the point cloud graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality for protein-ligand binding site prediction.
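The graph-construction step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the graph is built from inter-point distances but does not say how, so the distance cutoff used here, the `radius_graph` name, and the example coordinates are all hypothetical (a k-nearest-neighbour rule would be an equally plausible reading).

```python
import math

def radius_graph(points, cutoff):
    """Connect every pair of points whose Euclidean distance is <= cutoff.

    points: list of (x, y, z) tuples, e.g. sampled from a protein surface.
    Returns a list of undirected edges (i, j) with i < j.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                edges.append((i, j))
    return edges

# Hypothetical surface points (coordinates in angstroms, made up).
points = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 3.0, 0.0),
    (1.0, 1.0, 0.0),
]

edges = radius_graph(points, cutoff=1.5)
print(edges)  # [(0, 1), (0, 3), (1, 3)]
```

The all-pairs loop is quadratic in the number of points; a real surface with many thousands of points would normally use a spatial index such as a k-d tree instead.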
Affiliation(s)
- Yanpeng Zhao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Song He
  - Academy of Military Medical Sciences, Beijing 100850, China
- Yuting Xing
  - Defense Innovation Institute, Beijing 100071, China
- Mengfan Li
  - Academy of Military Medical Sciences, Beijing 100850, China
- Yang Cao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Xuanze Wang
  - Academy of Military Medical Sciences, Beijing 100850, China
- Dongsheng Zhao
  - Academy of Military Medical Sciences, Beijing 100850, China
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing 100850, China
4
Prat A, Abdel Aty H, Bastas O, Kamuntavičius G, Paquet T, Norvaišas P, Gasparotto P, Tal R. HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery. J Chem Inf Model 2024; 64:5817-5831. [PMID: 39037942 DOI: 10.1021/acs.jcim.4c00481] [Indexed: 07/24/2024]
Abstract
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). We introduced a novel approach for interaction profiling, aimed at detecting potential biases within both the model and data sets. This approach not only enhanced interpretability but also reinforced the impartiality of our methodology. Finally, we demonstrated HydraScreen's ability to generalize effectively across novel proteins and ligands through a temporal split. We also provide insights into potential avenues for future development aimed at enhancing the robustness of machine learning scoring functions. HydraScreen (accessible at http://hydrascreen.ro5.ai/paper) provides a user-friendly GUI and a public API, facilitating the easy-access assessment of protein-ligand complexes.
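For readers unfamiliar with the affinity metrics quoted above, here is a minimal sketch of how Pearson's r and RMSE are computed from predicted and experimental values. It uses the standard textbook definitions, not HydraScreen's evaluation code, and the example affinities are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical experimental and predicted affinities (pK units, made up).
experimental = [4.0, 5.5, 6.0, 7.5, 9.0]
predicted = [4.5, 5.0, 6.5, 7.0, 8.5]

print(round(pearson_r(experimental, predicted), 3))
print(round(rmse(experimental, predicted), 3))  # 0.5: every prediction is off by 0.5
```

Pearson's r measures how well the predictions track the experimental trend, while RMSE measures the absolute size of the errors, so the two are usually reported together.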
Affiliation(s)
- Alvaro Prat
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Hisham Abdel Aty
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Orestis Bastas
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Tanya Paquet
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Povilas Norvaišas
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Piero Gasparotto
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Roy Tal
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
5
Liu Y, Xu C, Yang X, Zhang Y, Chen Y, Liu H. Application progress of deep generative models in de novo drug design. Mol Divers 2024; 28:2411-2427. [PMID: 39097862 DOI: 10.1007/s11030-024-10942-5] [Received: 05/24/2024] [Accepted: 07/16/2024] [Indexed: 08/05/2024]
Abstract
The deep molecular generative model has recently become a research hotspot in pharmacy. This paper analyzes a large number of recent reports and reviews these models. In the central part of this paper, four compound databases and two molecular representation methods are compared. Five model architectures and applications for deep molecular generative models are introduced in detail. Three metrics for model evaluation are listed. Finally, the limitations and challenges in this field are discussed to provide a reference and basis for developing and researching new models in the future.
Affiliation(s)
- Yingxu Liu
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Chengcheng Xu
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Xinyi Yang
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Yanmin Zhang
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Yadong Chen
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
- Haichun Liu
  - School of Science, China Pharmaceutical University, Nanjing, 210009, China
6
Yang R, Zhang L, Bu F, Sun F, Cheng B. AI-based prediction of protein-ligand binding affinity and discovery of potential natural product inhibitors against ERK2. BMC Chem 2024; 18:108. [PMID: 38831341 PMCID: PMC11145815 DOI: 10.1186/s13065-024-01219-x] [Received: 02/24/2024] [Accepted: 05/29/2024] [Indexed: 06/05/2024] [Open Access]
Abstract
Determination of protein-ligand binding affinity (PLA) is a key technological tool in hit discovery and lead optimization, which is critical to the drug development process. PLA can be determined directly by experimental methods, but it is time-consuming and costly. In recent years, deep learning has been widely applied to PLA prediction, the key of which lies in the comprehensive and accurate representation of proteins and ligands. In this study, we proposed a multi-modal deep learning model based on the early fusion strategy, called DeepLIP, to improve PLA prediction by integrating multi-level information, and further used it for virtual screening of extracellular signal-regulated protein kinase 2 (ERK2), an ideal target for cancer treatment. Experimental results from model evaluation showed that DeepLIP achieved superior performance compared to state-of-the-art methods on the widely used benchmark dataset. In addition, by combining previously developed machine learning models and molecular dynamics simulation, we screened three novel hits from a drug-like natural product library. These compounds not only had favorable physicochemical properties, but also bound stably to the target protein. We believe they have the potential to serve as starting molecules for the development of ERK2 inhibitors.
Affiliation(s)
- Ruoqi Yang
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250011, China
  - Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Lili Zhang
  - Jinan Central Hospital Affiliated to Shandong First Medical University, Jinan, 250013, China
- Fanyou Bu
  - Qingdao Municipal Hospital Group, Qingdao, 266000, China
- Fuqiang Sun
  - Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Bin Cheng
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250011, China
7
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically a protein. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein–small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION The affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
  - Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
8
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R² value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R². Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
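The two regression metrics reported above, the coefficient of determination and the mean absolute error (MAE), can be sketched as follows. These are the standard definitions rather than PointGAT's evaluation code, and the energies in the example are invented.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted energies (kcal/mol, made up).
reference = [1.0, 2.0, 3.0, 4.0]
predicted_energies = [1.5, 2.0, 3.0, 3.5]

print(mean_absolute_error(reference, predicted_energies))  # 0.25
print(r_squared(reference, predicted_energies))            # 0.9
```

MAE is in the units of the target, which is why the abstract can quote it directly in kcal/mol, whereas the coefficient of determination is unitless.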
Affiliation(s)
- Rong Zhang
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Rongqing Yuan
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
9
Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform 2024; 25:bbae281. [PMID: 38864340 PMCID: PMC11167311 DOI: 10.1093/bib/bbae281] [Received: 02/28/2024] [Revised: 05/05/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] [Open Access]
Abstract
G-protein coupled receptors (GPCRs), crucial in various diseases, are the targets of over 40% of approved drugs. However, the reliable acquisition of experimental GPCR structures is hindered by their lipid-embedded conformations. Traditional protein-ligand interaction models falter on GPCR-drug interactions because of limited and low-quality structures. Generalized models, trained on soluble protein-ligand pairs, are also inadequate. To address these issues, we developed two models, DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR-ligand interaction data, leveraging graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physical-chemical and spatial information. In independent tests, DeepGPCR_BC surpassed Autodock Vina and Schrödinger Dock with an area under the curve of 0.72, accuracy of 0.68 and true positive rate of 0.73, whereas DeepGPCR_RG demonstrated a Pearson correlation of 0.39 and root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), yielding promising results with three (F545-1970, K297-0698, S948-0241) out of eight candidates. We also successfully obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.
Affiliation(s)
- Haiping Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hongjie Fan
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- Jixia Wang
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Tao Hou
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Agharam Road 173, Selaiyur, Chennai, Tamil Nadu 600073, India
- Wei Xia
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hei Wun Kan
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Junxin Li
  - Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- John Z H Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Xinmiao Liang
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Yang Chen
  - Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
10
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. This paper examines computational methods that use sequence and structure to study protein-ligand interactions. It starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
11
Li Z, Ren P, Yang H, Zheng J, Bai F. TEFDTA: a transformer encoder and fingerprint representation combined prediction method for bonded and non-bonded drug-target affinities. Bioinformatics 2024; 40:btad778. [PMID: 38141210 PMCID: PMC10777355 DOI: 10.1093/bioinformatics/btad778] [Received: 09/25/2023] [Revised: 11/23/2023] [Accepted: 12/22/2023] [Indexed: 12/25/2023] [Open Access]
Abstract
MOTIVATION The prediction of binding affinity between drug and target is crucial in drug discovery. However, the accuracy of current methods still needs to be improved. Moreover, most deep learning methods focus only on the prediction of non-covalent (non-bonded) binding molecular systems and neglect the cases of covalent binding, which has gained increasing attention in the field of drug development. RESULTS In this work, a new attention-based model, A Transformer Encoder and Fingerprint combined Prediction method for Drug-Target Affinity (TEFDTA), is proposed to predict the binding affinity for bonded and non-bonded drug-target interactions. To deal with such complicated problems, we used different representations for protein and drug molecules. In detail, an initial framework was built by training our model using the datasets of non-bonded protein-ligand interactions. For the widely used dataset Davis, an additional contribution of this study is that we provide a manually corrected Davis database. The model was subsequently fine-tuned on a smaller dataset of covalent interactions from the CovalentInDB database to optimize performance. The results demonstrate a significant improvement over existing approaches, with an average improvement of 7.6% in predicting non-covalent binding affinity and a remarkable average improvement of 62.9% in predicting covalent binding affinity compared to using BindingDB data alone. Finally, the potential ability of our model to identify activity cliffs was investigated through a case study. The prediction results indicate that our model is sensitive enough to discriminate differences in binding affinity arising from small variations in the structures of compounds. AVAILABILITY AND IMPLEMENTATION The codes and datasets of TEFDTA are available at https://github.com/lizongquan01/TEFDTA.
Affiliation(s)
- Zongquan Li
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Pengxuan Ren
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Hao Yang
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Jie Zheng
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Fang Bai
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Shanghai Clinical Research and Trial Center, Shanghai, 201210, China
12
Zhang S, Han J, Liu J. Protein-protein and protein-nucleic acid binding site prediction via interpretable hierarchical geometric deep learning. Gigascience 2024; 13:giae080. [PMID: 39484977 PMCID: PMC11528319 DOI: 10.1093/gigascience/giae080] [Received: 07/02/2024] [Revised: 08/29/2024] [Accepted: 09/25/2024] [Indexed: 11/03/2024] [Open Access]
Abstract
Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease diagnosis and drug design. However, accurate predictions by computational approaches remain highly challenging due to the limited knowledge of residue binding patterns. The binding pattern of a residue should be characterized by the spatial distribution of its neighboring residues combined with their physicochemical information interaction, which previous methods cannot achieve. Here, we design GraphRBF, a hierarchical geometric deep learning model to learn residue binding patterns from big data. To achieve this, GraphRBF describes physicochemical information interactions by designing an enhanced graph neural network and characterizes residue spatial distributions by introducing a prioritized radial basis function neural network. After training and testing, GraphRBF shows great improvements over existing state-of-the-art methods and strong interpretability of its learned representations. Applied to the SARS-CoV-2 Omicron spike protein, GraphRBF successfully identifies known epitopes of the protein. Moreover, it predicts multiple potential binding regions for new nanobodies or even new drugs with strong evidence. A user-friendly online server for GraphRBF is freely available at http://liulab.top/GraphRBF/server.
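As background for the radial basis function component mentioned above, here is a minimal sketch of a plain Gaussian RBF expansion of an inter-residue distance. It illustrates the generic technique only; the abstract does not specify GraphRBF's "prioritized" variant, so the function name, the centers, and the width parameter `gamma` are all assumptions.

```python
import math

def rbf_expand(distance, centers, gamma):
    """Encode a scalar distance as Gaussian responses around fixed centers.

    Each output is exp(-gamma * (distance - center)^2): close to 1 when the
    distance is near that center and decaying toward 0 further away.
    """
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]

# Hypothetical setup: three centers (in angstroms) and a made-up width.
centers = [0.0, 2.0, 4.0]
features = rbf_expand(2.0, centers, gamma=1.0)

print(features)  # the middle response is exactly 1.0; the outer two are exp(-4)
```

An expansion like this turns a single distance into a smooth feature vector that a downstream network can weight, which is one common way to describe the spatial distribution of neighbours around a residue.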
Affiliation(s)
- Shizhuo Zhang
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Jiyun Han
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
- Juntao Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
|
13
|
Wang M, Li W, Yu X, Luo Y, Han K, Wang C, Jin Q. AffinityVAE: A multi-objective model for protein-ligand affinity prediction and drug design. Comput Biol Chem 2023; 107:107971. [PMID: 37852036 DOI: 10.1016/j.compbiolchem.2023.107971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 09/23/2023] [Accepted: 10/08/2023] [Indexed: 10/20/2023]
Abstract
In the prediction of protein-ligand affinity, traditional methods require a large amount of computing resources and have certain limitations in predicting and simulating structural changes. Although data-driven deep learning approaches can yield favorable outcomes, they lack interpretability. Some methods may require additional structural information or domain knowledge to support the interpretation, which may limit their applicability. This paper proposes an affinity variational autoencoder (AffinityVAE) using interaction feature mapping and a variational autoencoder, which consists of a multi-objective model capable of end-to-end affinity prediction and drug discovery. In this study, the limitations of affinity prediction in terms of interpretability are tackled by proposing the concept of a protein-ligand interaction feature map. The diversity and quantity of protein-ligand binding data are increased by designing an adaptive autoencoder of target chemical properties that generates new ligands similar to known ligands and adds them to the original training set. AffinityVAE is then retrained using this extended training set to further validate the protein-ligand binding affinity prediction. Comparisons were conducted between AffinityVAE and recent methods to demonstrate the high efficiency of the proposed model. The experimental results show that AffinityVAE has very high prediction performance and has the potential to enhance the diversity and amount of protein-ligand binding data, which promotes drug development.
Affiliation(s)
- Mengying Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Weimin Li
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Xiao Yu
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Yin Luo
- School of Life Sciences, East China Normal University, China
- Ke Han
- Medical and Health Center, Liaocheng People's Hospital, Liaocheng, China
- Can Wang
- School of Information and Communication Technology, Griffith University, Australia
- Qun Jin
- Networked Information System Laboratory, Waseda University, Tokyo, Japan
|
14
|
Li G, Yuan Y, Zhang R. Ensemble of local and global information for Protein-Ligand Binding Affinity Prediction. Comput Biol Chem 2023; 107:107972. [PMID: 37883905 DOI: 10.1016/j.compbiolchem.2023.107972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 10/07/2023] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
Accurately predicting protein-ligand binding affinities is crucial for determining molecular properties and understanding their physical effects. Neural networks and transformers are the predominant methods for sequence modeling, and both have been successfully applied independently for protein-ligand binding affinity prediction. As local and global information of molecules is vital for protein-ligand binding affinity prediction, we aim to combine a bi-directional gated recurrent unit (BiGRU) and a convolutional neural network (CNN) to effectively capture both local and global molecular information. Additionally, attention mechanisms can be incorporated to automatically learn and adjust the level of attention given to local and global information, thereby enhancing the performance of the model. To achieve this, we propose the PLAsformer approach, which encodes local and global information of molecules using 3DCNN and BiGRU with an attention mechanism, respectively. This approach enhances the model's ability to encode comprehensive local and global molecular information. PLAsformer achieved a Pearson's correlation coefficient of 0.812 and a Root Mean Square Error (RMSE) of 1.284 when comparing experimental and predicted affinity on the PDBBind-2016 dataset. These results surpass the current state-of-the-art methods for binding affinity prediction. The high accuracy of PLAsformer's predictive performance, along with its excellent generalization ability, is clearly demonstrated by these findings.
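The two metrics quoted in this abstract, Pearson's correlation coefficient and RMSE, are standard and can be sketched in a few lines. The code below is a minimal illustration using only the Python standard library; the affinity values are made up for the example and are not taken from PDBBind or from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical experimental vs. predicted affinities (illustrative only).
experimental = [4.0, 5.5, 6.2, 7.1, 8.3]
predicted = [4.4, 5.1, 6.0, 7.6, 7.9]

print("RMSE:", rmse(experimental, predicted))
print("Pearson r:", pearson_r(experimental, predicted))
```

The two numbers answer different questions: Pearson's r measures how well the predictions track the ranking and linear trend of the measurements, while RMSE measures the absolute size of the errors, so a model can score well on one and poorly on the other.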
Affiliation(s)
- Gaili Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Yongna Yuan
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
|
15
|
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in intelligent protein design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this will help researchers better understand the limitations and application scenarios of these methods and provide a useful reference for choosing appropriate protein characterization techniques in related research.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
|
16
|
Dong T, Yang Z, Zhou J, Chen CYC. Equivariant Flexible Modeling of the Protein-Ligand Binding Pose with Geometric Deep Learning. J Chem Theory Comput 2023; 19:8446-8459. [PMID: 37938978 DOI: 10.1021/acs.jctc.3c00273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/10/2023]
Abstract
Flexible modeling of the protein-ligand complex structure is a fundamental challenge for in silico drug development. Recent studies have improved commonly used docking tools by incorporating extra deep learning-based steps. However, such strategies limit their accuracy and efficiency because they retain massive sampling pressure and lack consideration for flexible biomolecular changes. In this study, we propose FlexPose, a geometric graph network capable of direct flexible modeling of complex structures in Euclidean space without following conventional sampling and scoring strategies. Our model adopts two key designs, a scalar-vector dual feature representation and an SE(3)-equivariant network, to manage dynamic structural changes, as well as two strategies, conformation-aware pretraining and weakly supervised learning, to boost model generalizability in unseen chemical space. Benefiting from these paradigms, our model dramatically outperforms all tested popular docking tools and recently advanced deep learning methods, especially in tasks involving protein conformation changes. We further investigate the impact of protein and ligand similarity on the model performance with two conformation-aware strategies. Moreover, FlexPose provides an affinity estimation and model confidence for postanalysis.
Affiliation(s)
- Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Jun Zhou
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|
17
|
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, invertibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Steven Shave
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
- Yumiao Gao
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
- Jiancong Xie
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
|
18
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
- Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
- Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
|
19
|
Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
Affiliation(s)
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ziting Zhang
- Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Ziheng Zou
- Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
|
20
|
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules 2023; 28:4691. [PMID: 37375246 PMCID: PMC10301867 DOI: 10.3390/molecules28124691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] Open
Abstract
The core of large-scale drug virtual screening is to select high-affinity binders accurately and efficiently from large libraries of small molecules in which non-binders are usually dominant. The binding affinity is significantly influenced by the protein pocket, ligand spatial information, and residue types/atom types. Here, we used the pocket residues or ligand atoms as the nodes and constructed edges with the neighboring information to comprehensively represent the protein pocket or ligand information. Moreover, the model with pre-trained molecular vectors performed better than the one-hot representation. The main advantage of DeepBindGCN is that it is independent of docking conformation and concisely keeps the spatial information and physical-chemical features. Using TIPE3 and PD-L1 dimer as proof-of-concept examples, we proposed a screening pipeline integrating DeepBindGCN and other methods to identify strong-binding-affinity compounds. It is the first time a non-complex-dependent model has achieved a root mean square error (RMSE) value of 1.4190 and a Pearson r value of 0.7584 on the PDBbind v.2016 core set, thereby showing prediction power comparable with the state-of-the-art affinity prediction models that rely upon the 3D complex. DeepBindGCN provides a powerful tool to predict the protein-ligand interaction and can be used in many important large-scale virtual screening application scenarios.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
- John Z. H. Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
21
|
Yang Z, Zhong W, Lv Q, Dong T, Yu-Chian Chen C. Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN). J Phys Chem Lett 2023; 14:2020-2033. [PMID: 36794930 DOI: 10.1021/acs.jpclett.2c03906] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Indexed: 06/18/2023]
Abstract
Predicting protein-ligand binding affinities (PLAs) is a core problem in drug discovery. Recent advances have shown great potential in applying machine learning (ML) for PLA prediction. However, most of them omit the 3D structures of complexes and physical interactions between proteins and ligands, which are considered essential to understanding the binding mechanism. This paper proposes a geometric interaction graph neural network (GIGN) that incorporates 3D structures and physical interactions for predicting protein-ligand binding affinities. Specifically, we design a heterogeneous interaction layer that unifies covalent and noncovalent interactions into the message passing phase to learn node representations more effectively. The heterogeneous interaction layer also follows fundamental biological laws, including invariance to translations and rotations of the complexes, thus avoiding expensive data augmentation strategies. GIGN achieves state-of-the-art performance on three external test sets. Moreover, by visualizing learned representations of protein-ligand complexes, we show that the predictions of GIGN are biologically meaningful.
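The invariance property mentioned in this abstract can be made concrete with a small check: features built only from interatomic distances do not change when the whole complex is rotated and translated. The sketch below demonstrates that general fact with three made-up atom positions; it is not GIGN's heterogeneous interaction layer, and the helper names and coordinates are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_then_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Hypothetical atom positions in angstroms (illustrative only).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.7)]
moved = rotate_z_then_translate(atoms, math.radians(40.0), (3.0, -2.0, 5.0))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)

# Largest change in any pairwise distance; zero up to floating-point rounding.
max_diff = max(
    abs(p - q)
    for row_before, row_after in zip(before, after)
    for p, q in zip(row_before, row_after)
)
print(max_diff)
```

A model whose inputs are invariant in this way sees the same complex regardless of its pose in the coordinate frame, which is why the abstract can claim that rotation-based data augmentation becomes unnecessary.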
Affiliation(s)
- Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Weihe Zhong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Qiujie Lv
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|
22
|
Zou Y, Wang R, Du M, Wang X, Xu D. Identifying Protein-Ligand Interactions via a Novel Distance Self-Feedback Biomolecular Interaction Network. J Phys Chem B 2023; 127:899-911. [PMID: 36657025 DOI: 10.1021/acs.jpcb.2c07592] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/20/2023]
Abstract
Efficient and accurate characterizations of protein-ligand interactions are key to understanding biology at the molecular level. They are particularly useful in pharmaceutical industry applications. However, the widely applied dynamics-based methods for identifying important residues or calculating ligand binding free energy are usually computationally demanding. In this work, we propose a graph deep learning (DL) framework, namely, the distance self-feedback biomolecular interaction network (DSBIN), in which the relationship between the complex structure and binding affinity can be established by means of a carefully designed distance self-feedback module and interaction layer. Our model can directly provide a quantitative evaluation of inhibitor binding affinities (pKd). More importantly, the DSBIN model efficiently identifies key interactions for inhibitor binding and is thus intrinsically interpretable. Its generalization performance was further verified using 1405 unseen structures. The deviations of the predicted binding free energies were calculated to be less than 1.37 kcal/mol for more than 55% of the structures. Moreover, we also compared the DSBIN model with MM/GBSA, a commonly used theoretical method for calculating the substrate binding free energy. Our results show that the current DL model generally performs better in predicting the binding free energy. For a specific complex system, mannopentaose/TmCBM27, the DSBIN-predicted binding free energy is -8.21 kcal/mol, which is very close to the experimentally measured -7.76 kcal/mol and the MM/GBSA-calculated -7.16 kcal/mol. Meanwhile, all important aromatic residues around the binding pocket can be identified by our DL model. Considering the accuracy and efficiency of the newly developed DL model, it may be very helpful in the field of drug design and molecular recognition.
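The abstract reports affinities both as pKd and as binding free energies in kcal/mol. The two are linked by the standard thermodynamic relation ΔG = RT ln Kd, so pKd = -ΔG / (RT ln 10). The sketch below applies that relation under an assumed temperature of 298.15 K; the temperature and the function names are assumptions for this example, and the paper may use different conditions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_per_pkd_unit(temperature_k=298.15):
    """Free energy, in kcal/mol, that corresponds to one pKd unit."""
    return R_KCAL * temperature_k * math.log(10)


def delta_g_to_pkd(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to pKd via dG = RT ln Kd."""
    return -delta_g_kcal / kcal_per_pkd_unit(temperature_k)


# The three free energies quoted for mannopentaose/TmCBM27 in the abstract.
for label, delta_g in [("DSBIN", -8.21), ("experiment", -7.76), ("MM/GBSA", -7.16)]:
    print(label, round(delta_g_to_pkd(delta_g), 2))

print("kcal/mol per pKd unit:", round(kcal_per_pkd_unit(), 3))
```

Under this assumption one pKd unit is worth roughly 1.36 kcal/mol, which is close to the 1.37 kcal/mol deviation threshold quoted above; whether the authors chose the threshold for that reason is not stated in the abstract.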
Affiliation(s)
- Yurong Zou
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
- Ruihan Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
- Meng Du
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
- Xin Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
- Dingguo Xu
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China; Research Center for Materials Genome Engineering, Sichuan University, Chengdu, Sichuan 610065, PR China
|
23
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning and plays a fundamental role in achieving accurate results. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure-based and sequence-based representation methods. Each primary class can be further divided into specific subcategories. The particular representation methods covered include both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods, which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
|
24
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Front Bioinform 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|