1
Fusti-Molnar L. Integrating Quantum Mechanics into Protein-Ligand Docking: Toward Higher Accuracy and Reliability. RESEARCH SQUARE 2024:rs.3.rs-5433993. [PMID: 39678339 PMCID: PMC11643324 DOI: 10.21203/rs.3.rs-5433993/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
I introduce two new methods, QFVina and QFVinardo, for protein-ligand docking that leverage precomputed high-quality conformational libraries with QM-optimized geometries and ab initio DFT-D4-based conformational rankings and strain energies. These methods provide greater accuracy in docking-based virtual screening by addressing the inaccuracies in intramolecular relative energies of conformations, a critical component often misrepresented in flexible ligand docking calculations. I demonstrate that numerous force field-based methods widely used today exhibit substantial errors in conformational relative energies, and that it is unrealistic to expect better accuracy from the faster scoring functions typically employed in docking. Consistent with these findings, I show that traditional flexible ligand docking often produces geometries with significant strain energies and large deviations, with magnitudes comparable to the protein-ligand binding energies themselves and much larger than the differences we aim to estimate in docking hitlists. By using physically realistic ligand conformations with accurate strain energies in the scoring function, QFVina and QFVinardo produce markedly different docking results, even with the same docking parameters and scoring functions for protein-ligand interaction energies. I analyzed these differences in docking hitlists and selected protein-ligand interactions using three protein targets from COVID-19 research.
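The idea of adding a ligand strain penalty to an interaction score can be illustrated with a minimal sketch. This is not the QFVina or QFVinardo scoring function; the `Pose` class, the field names, the example energies, and the choice of the lowest-energy conformer among the poses as the strain reference are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical container for one docked ligand conformer."""
    name: str
    interaction: float   # protein-ligand interaction score, kcal/mol (more negative is better)
    conf_energy: float   # intramolecular conformer energy, kcal/mol (assumed precomputed)


def strain_energy(pose: Pose, e_min: float) -> float:
    """Strain is the conformer energy relative to the lowest-energy conformer."""
    return pose.conf_energy - e_min


def rank_with_strain(poses: list[Pose]) -> list[tuple[float, str]]:
    """Rank poses by interaction score plus strain penalty (lowest total first)."""
    e_min = min(p.conf_energy for p in poses)
    scored = [(p.interaction + strain_energy(p, e_min), p.name) for p in poses]
    return sorted(scored)


# Illustrative numbers only: pose A has the best interaction score
# but the most strained geometry.
poses = [Pose("A", -9.0, 6.0), Pose("B", -8.0, 1.0), Pose("C", -7.5, 0.0)]
ranked = rank_with_strain(poses)
```

With these made-up numbers, pose A ranks first on interaction score alone but last once the strain penalty is included, which is the kind of reordering the abstract describes.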
2
Zhang D, Meng Q, Guo F. Incorporating Water Molecules into Highly Accurate Binding Affinity Prediction for Proteins and Ligands. Int J Mol Sci 2024; 25:12676. [PMID: 39684398 DOI: 10.3390/ijms252312676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 11/16/2024] [Accepted: 11/24/2024] [Indexed: 12/18/2024] Open
Abstract
In the binding process between proteins and ligand molecules, water molecules play a pivotal role by forming hydrogen bonds that enable proteins and ligand molecules to bind more strongly. However, current methodologies for predicting binding affinity overlook the importance of water molecules. Therefore, we developed a model called GraphWater-Net, specifically designed for predicting protein-ligand binding affinity, by incorporating water molecules. GraphWater-Net employs topological structures to represent protein atoms, ligand atoms and water molecules, and their interactions. Leveraging the Graphormer network, the model extracts interaction features between nodes within the topology, alongside the interaction features of edges and nodes. Subsequently, it generates embeddings with attention weights, inputs them into a Softmax function for regression prediction, and ultimately outputs the predicted binding affinity value. Experimental results on the Comparative Assessment of Scoring Functions (CASF) 2016 test set show that the introduction of water molecules into the complex significantly improves the prediction performance of the proposed model for protein and ligand binding affinity. Specifically, the Pearson correlation coefficient (Rp) exceeds that of current state-of-the-art methods by a margin of 0.022 to 0.129. By integrating water molecules, GraphWater-Net has the potential to facilitate the rational design of protein-ligand interactions and aid in drug discovery.
Affiliation(s)
- Diya Zhang
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Qiaozhen Meng
  - School of Computer Science, Xiangtan University, Xiangtan 411105, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
3
Dong C, Huang YP, Lin X, Zhang H, Gao YQ. DSDPFlex: Flexible-Receptor Docking with GPU Acceleration. J Chem Inf Model 2024; 64:8537-8548. [PMID: 39514506 DOI: 10.1021/acs.jcim.4c01715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
Molecular docking is an essential tool in structure-based drug discovery, widely utilized to model ligand-protein interactions and enrich potential hits. Among the different docking strategies, semiflexible docking (rigid-receptor and flexible-ligand model) is the most popular, benefiting from its balance of docking accuracy and speed. However, this approach ignores the conformational changes of proteins and hence demands suitable protein conformations as input. When the binding interaction adheres to an induced-fit model, flexible methods such as molecular dynamics simulation can be utilized, but they are computationally demanding. To balance between speed and accuracy, the flexible docking approach is an effective choice, as exemplified by AutoDock Vina and AutoDockFR, which treat selected protein side chains as flexible parts. However, the efficiency of flexible docking methods is yet to be improved for virtual screening usage. In this article, we introduce DSDPFlex, an improved flexible-receptor docking method accelerated by GPU parallelization. Beyond acceleration, optimizations with respect to sampling, scoring, and search space are implemented in DSDPFlex to further improve its capability in flexible tasks. In cross-docking evaluation, DSDPFlex demonstrates superior accuracy compared to AutoDock Vina and is 100 times faster than Vina in flexible-receptor tasks. We also show the advantage of flexible-receptor methods on suboptimal pockets and validate the advantage of DSDPFlex in screening on apo and AlphaFold2-predicted structures. With improvements in both efficiency and accuracy, DSDPFlex is expected to hold potential in future docking-based studies.
Affiliation(s)
- Chengwei Dong
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yu-Peng Huang
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Xiaohan Lin
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Hong Zhang
  - Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102200, China
- Yi Qin Gao
  - College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing 102200, China
  - Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
4
Li B, Tan K, Lao AR, Wang H, Zheng H, Zhang L. A comprehensive review of artificial intelligence for pharmacology research. Front Genet 2024; 15:1450529. [PMID: 39290983 PMCID: PMC11405247 DOI: 10.3389/fgene.2024.1450529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 08/26/2024] [Indexed: 09/19/2024] Open
Abstract
With the innovation and advancement of artificial intelligence, more and more artificial intelligence techniques are employed in drug research, biomedical frontier research, and clinical medicine practice, especially in the field of pharmacology research. Thus, this review focuses on the applications of artificial intelligence in drug discovery, compound pharmacokinetic prediction, and clinical pharmacology. We briefly introduced the basic knowledge and development of artificial intelligence, presented a comprehensive review, and then summarized the latest studies and discussed the strengths and limitations of artificial intelligence models. Additionally, we highlighted several important studies and pointed out possible research directions.
Affiliation(s)
- Bing Li
  - College of Computer Science, Sichuan University, Chengdu, China
- Kan Tan
  - College of Computer Science, Sichuan University, Chengdu, China
- Angelyn R Lao
  - Department of Mathematics and Statistics, De La Salle University, Manila, Philippines
- Haiying Wang
  - School of Computing, Ulster University, Belfast, United Kingdom
- Huiru Zheng
  - School of Computing, Ulster University, Belfast, United Kingdom
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu, China
5
Prat A, Abdel Aty H, Bastas O, Kamuntavičius G, Paquet T, Norvaišas P, Gasparotto P, Tal R. HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery. J Chem Inf Model 2024; 64:5817-5831. [PMID: 39037942 DOI: 10.1021/acs.jcim.4c00481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/24/2024]
Abstract
We propose HydraScreen, a deep-learning framework for safe and robust accelerated drug discovery. HydraScreen utilizes a state-of-the-art 3D convolutional neural network designed for the effective representation of molecular structures and interactions in protein-ligand binding. We designed an end-to-end pipeline for high-throughput screening and lead optimization, targeting applications in structure-based drug design. We assessed our approach using established public benchmarks based on the CASF-2016 core set, achieving top-tier results in affinity and pose prediction (Pearson's r = 0.86, RMSE = 1.15, Top-1 = 0.95). We introduced a novel approach for interaction profiling, aimed at detecting potential biases within both the model and data sets. This approach not only enhanced interpretability but also reinforced the impartiality of our methodology. Finally, we demonstrated HydraScreen's ability to generalize effectively across novel proteins and ligands through a temporal split. We also provide insights into potential avenues for future development aimed at enhancing the robustness of machine learning scoring functions. HydraScreen (accessible at http://hydrascreen.ro5.ai/paper) provides a user-friendly GUI and a public API, facilitating the easy-access assessment of protein-ligand complexes.
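The two affinity metrics quoted above, Pearson's r and RMSE, can be computed as in the following minimal sketch. The function names and the example affinity values are assumptions for illustration and are not taken from the HydraScreen code or the CASF-2016 data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(x: list[float], y: list[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up binding affinities (e.g. in pK units), for illustration only.
experimental = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.5, 6.5, 7.5]

r = pearson_r(experimental, predicted)
error = rmse(experimental, predicted)
```

The example is chosen to show that the two metrics measure different things: a constant offset of 0.5 leaves the correlation perfect while the RMSE equals the offset.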
Affiliation(s)
- Alvaro Prat
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Hisham Abdel Aty
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Orestis Bastas
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Tanya Paquet
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Povilas Norvaišas
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Piero Gasparotto
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
- Roy Tal
  - AI Chemistry, Ro5 2801 Gateway Drive, Irving, 75063 Texas, United States
6
Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION: Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically proteins. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved. AREAS COVERED: In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein - small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design. EXPERT OPINION: The affinity determination methods continue to develop toward miniaturization, high-throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
  - Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
7
Qu X, Dong L, Luo D, Si Y, Wang B. Water Network-Augmented Two-State Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2263-2274. [PMID: 37433009 DOI: 10.1021/acs.jcim.3c00567] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/13/2023]
Abstract
Water network rearrangement from the ligand-unbound state to the ligand-bound state is known to have significant effects on the protein-ligand binding interactions, but most of the current machine learning-based scoring functions overlook these effects. In this study, we endeavor to construct a comprehensive and realistic deep learning model by incorporating water network information into both ligand-unbound and -bound states. In particular, extended connectivity interaction features were integrated into graph representation, and graph transformer operator was employed to extract features of the ligand-unbound and -bound states. Through these efforts, we developed a water network-augmented two-state model called ECIFGraph::HM-Holo-Apo. Our new model exhibits satisfactory performance in terms of scoring, ranking, docking, screening, and reverse screening power tests on the CASF-2016 benchmark. In addition, it can achieve superior performance in large-scale docking-based virtual screening tests on the DEKOIS2.0 data set. Our study highlights that the use of a water network-augmented two-state model can be an effective strategy to bolster the robustness and applicability of machine learning-based scoring functions, particularly for targets with hydrophilic or solvent-exposed binding pockets.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Ding Luo
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
8
Mqawass G, Popov P. graphLambda: Fusion Graph Neural Networks for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2323-2330. [PMID: 38366974 DOI: 10.1021/acs.jcim.3c00771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Predicting the binding affinity of protein-ligand complexes is crucial for computer-aided drug discovery (CADD) and the identification of potential drug candidates. The deep learning-based scoring functions have emerged as promising predictors of binding constants. Building on recent advancements in graph neural networks, we present graphLambda for protein-ligand binding affinity prediction, which utilizes graph convolutional, attention, and isomorphism blocks to enhance the predictive capabilities. The graphLambda model exhibits superior performance across CASF16 and CSAR HiQ NRC benchmarks and demonstrates robustness with respect to different types of train-validation set partitions. The development of graphLambda underscores the potential of graph neural networks in advancing binding affinity prediction models, contributing to more effective CADD methodologies.
Affiliation(s)
- Ghaith Mqawass
  - Faculty of Computer Science, University of Vienna, Vienna A-1090, Austria
  - UniVie Doctoral School Computer Science, University of Vienna, Vienna A-1090, Austria
- Petr Popov
  - Tetra-d, Rheinweg 9, Schaffhausen 8200, Switzerland
  - School of Science, Constructor University Bremen gGmbH, Bremen 28759, Germany
9
Rayka M, Mirzaei M, Mohammad Latifi A. An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values. Mol Inform 2024; 43:e202300292. [PMID: 38358080 DOI: 10.1002/minf.202300292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 01/22/2024] [Accepted: 02/02/2024] [Indexed: 02/16/2024]
Abstract
When designing a machine learning-based scoring function, we access a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing only a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions during test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. Also, we comprehensively investigate the residual error of each data point to assess the normality behavior of the distribution of the residual errors and their correlation to the structural features of the ligands, such as hydrophobic interactions and halogen bonding. In the end, we provide a local host web application to facilitate the usage of ENS-Score. All codes to repeat results are provided at https://github.com/miladrayka/ENS_Score.
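As a rough illustration of how conformal prediction can attach an interval to an ensemble prediction, the sketch below uses split conformal regression around the ensemble mean. This is one common variant and not necessarily the procedure used for ENS-Score; the toy models, calibration data, and function names are all assumptions for illustration.

```python
import math
from typing import Callable, Sequence

Model = Callable[[float], float]


def ensemble_mean(models: Sequence[Model], x: float) -> float:
    """Average the predictions of all ensemble members."""
    return sum(m(x) for m in models) / len(models)


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample quantile used by split conformal regression.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest residual,
    or infinity when the calibration set is too small for that level.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def predict_interval(models, calibration, x, alpha=0.1):
    """Interval around the ensemble mean with nominal coverage 1 - alpha."""
    residuals = [abs(y - ensemble_mean(models, xi)) for xi, y in calibration]
    q = conformal_quantile(residuals, alpha)
    centre = ensemble_mean(models, x)
    return centre - q, centre + q


# Toy ensemble of three hypothetical predictors whose mean is 2 * x.
models = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: 2 * x - 1]
# Made-up held-out calibration pairs (input, measured value).
calibration = [(1, 2.5), (2, 3.8), (3, 6.1), (4, 8.4), (5, 9.7),
               (6, 12.2), (7, 13.9), (8, 16.3), (9, 18.0)]

low, high = predict_interval(models, calibration, 10, alpha=0.2)
```

The interval width comes entirely from held-out calibration residuals, so the nominal coverage relies on the calibration and test points being exchangeable.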
Affiliation(s)
- Milad Rayka
  - Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Morteza Mirzaei
  - Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Mohammad Latifi
  - Applied Biotechnology Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
10
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. This paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are discussed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
  - Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
11
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we proposed a novel deep fusion graph neural networks framework named FGNN to learn the protein-ligand interactions from the 3D structures of protein-ligand complexes. Unlike 1D sequences for proteins or 2D graphs for ligands, the 3D graph of protein-ligand complex enables the more accurate representations of the protein-ligand interactions. Benchmark studies have shown that our fusion models FGNN can achieve more accurate prediction of binding affinity than any individual algorithm. The advantages of fusion strategies have been demonstrated in terms of expressive power of data, learning efficiency and model interpretability. Our fusion models show satisfactory performances on diverse data sets, demonstrating their generalization ability. Given the good performances in both binding affinity prediction and virtual screening, our fusion models are expected to be practically applied for drug screening and design. Our work highlights the potential of the fusion graph neural network algorithm in solving complex prediction problems in computational biology and chemistry. The fusion graph neural networks (FGNN) model is freely available in https://github.com/LinaDongXMU/FGNN.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Shuai Shi
  - Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Ding Luo
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
  - Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
12
Javali PS, Thirumurugan K. Embelin targets PI3K/AKT and MAPK in age-related ulcerative colitis: an integrated approach of microarray analysis, network pharmacology, molecular docking, and molecular dynamics. J Biomol Struct Dyn 2023; 42:10114-10128. [PMID: 37691456 DOI: 10.1080/07391102.2023.2255674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 08/30/2023] [Indexed: 09/12/2023]
Abstract
Vaibhdang, an Ayurvedic treatment for Crohn's and UC, has been used for centuries. The main component of Vaibhdang is embelin derived from Embelia ribes. However, the pharmacological and molecular mechanisms of embelin in UC remain unclear. This study investigated the molecular targets and mechanisms of action of embelin in UC using microarray analysis, network pharmacology, molecular docking, and molecular dynamics simulations. Embelin targets were obtained by Swiss Target, TargetNet, STITCH, ChEMBL, and TCMSP. Ulcerative colitis targets were mapped using DisGenNET, Genecards, TCMSP, Therapeutic targets, and GEO databases (GSE87466). Co-targets between ulcerative colitis and embelin were identified, and a PPI network was constructed using the STRING database. To identify the core targets, we used Cytoscape to analyze the topology of the PPI network. There were 545 effective Embelin targets and 5171 effective ulcerative colitis targets, including 1470 DEG targets. ShinyGo and AutoDock were used to analyze GO and KEGG enrichment pathways and docking studies, respectively. Venn diagram analysis revealed 327 core targets of embelin in UC. An enrichment study showed that embelin is involved in PI3K-AKT, MAPK, RAS, and chemokine signalling. The top ten core targets docked with embelin and AKT1, MAPK1, and SRC complexes were utilized as representations and simulated using GROMACS for 100 ns. A comparison of native proteins and their complex interactions with embelin revealed that embelin might act on various PI3K/AKT and MAPK targets to treat ulcerative colitis. This study provides insights into the molecular targets and mechanisms of action of embelin against ulcerative colitis. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Prashanth S Javali
  - Structural Biology Lab, Pearl Research Park, School of Biosciences & Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Kavitha Thirumurugan
  - Structural Biology Lab, Pearl Research Park, School of Biosciences & Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
13
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
14
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] Open
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has aroused widespread attention in recent years due to the high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of mixture density network. The resulting residue-atom distance likelihood potential not only retains the superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We emphatically explore the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of scoring/ranking and docking/screening tasks of a certain model could be well balanced through an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Jian Wu
  - School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
15
|
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules 2023; 28:4691. [PMID: 37375246 PMCID: PMC10301867 DOI: 10.3390/molecules28124691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] Open
Abstract
The core of large-scale drug virtual screening is to select high-affinity binders accurately and efficiently from large libraries of small molecules in which non-binders are usually dominant. The binding affinity is significantly influenced by the protein pocket, ligand spatial information, and residue types/atom types. Here, we used the pocket residues or ligand atoms as the nodes and constructed edges with the neighboring information to comprehensively represent the protein pocket or ligand information. Moreover, the model with pre-trained molecular vectors performed better than the one-hot representation. The main advantage of DeepBindGCN is that it is independent of docking conformation, and concisely keeps the spatial information and physical-chemical features. Using TIPE3 and PD-L1 dimer as proof-of-concept examples, we proposed a screening pipeline integrating DeepBindGCN and other methods to identify strong-binding-affinity compounds. It is the first time a non-complex-dependent model has achieved a root mean square error (RMSE) value of 1.4190 and a Pearson r value of 0.7584 on the PDBbind v.2016 core set, thereby showing prediction power comparable with the state-of-the-art affinity prediction models that rely upon the 3D complex. DeepBindGCN provides a powerful tool to predict the protein-ligand interaction and can be used in many important large-scale virtual screening application scenarios.
Affiliation(s)
- Haiping Zhang
  - Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
- John Z. H. Zhang
  - Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
16
|
Rayka M, Firouzi R. GB-score: Minimally designed machine learning scoring function based on distance-weighted interatomic contact features. Mol Inform 2023; 42:e2200135. [PMID: 36722733 DOI: 10.1002/minf.202200135] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/09/2022] [Revised: 11/24/2022] [Accepted: 11/28/2022] [Indexed: 02/02/2023]
Abstract
In recent years, thanks to advances in computer hardware and dataset availability, data-driven approaches (like machine learning) have become one of the essential parts of the drug design framework to accelerate drug discovery procedures. Constructing a new scoring function, a function that can predict the binding score for a generated protein-ligand pose during a docking procedure or for a crystal complex, based on machine and deep learning has become an active research area in computer-aided drug design. GB-Score is a state-of-the-art machine learning-based scoring function that uses distance-weighted interatomic contact features, the PDBbind-v2019 general set, and the Gradient Boosting Trees algorithm for binding affinity prediction. The distance-weighted interatomic contact featurization method uses the distance between different ligand and protein atom types for a numerical representation of the protein-ligand complex. GB-Score attains a Pearson's correlation of 0.862 and an RMSE of 1.190 on the CASF-2016 benchmark test in the scoring power metric. The GB-Score code is freely available on the web at https://github.com/miladrayka/GB_Score.
Affiliation(s)
- Milad Rayka
  - Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
- Rohoullah Firouzi
  - Department of Physical Chemistry, Chemistry and Chemical Engineering Research Center of Iran, Tehran, Iran
|
17
|
Jiang D, Ye Z, Hsieh CY, Yang Z, Zhang X, Kang Y, Du H, Wu Z, Wang J, Zeng Y, Zhang H, Wang X, Wang M, Yao X, Zhang S, Wu J, Hou T. MetalProGNet: a structure-based deep graph model for metalloprotein-ligand interaction predictions. Chem Sci 2023; 14:2054-2069. [PMID: 36845922 PMCID: PMC9945430 DOI: 10.1039/d2sc06576b] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Accepted: 01/11/2023] [Indexed: 01/21/2023] Open
Abstract
Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.
Affiliation(s)
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310006, Zhejiang, China
- Zhaofeng Ye
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ziyi Yang
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Xujun Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hongyan Du
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yundian Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Haotian Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao
- Mingyang Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, China
- Jian Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310006, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
18
|
Qu X, Dong L, Zhang J, Si Y, Wang B. Systematic Improvement of the Performance of Machine Learning Scoring Functions by Incorporating Features of Protein-Bound Water Molecules. J Chem Inf Model 2022; 62:4369-4379. [PMID: 36083808 DOI: 10.1021/acs.jcim.2c00916] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Water molecules at the ligand-protein interfaces play crucial roles in the binding of the ligands, but the behavior of protein-bound water is largely ignored in many currently used machine learning (ML)-based scoring functions (SFs). In an attempt to improve the prediction performance of existing ML-based SFs, we estimated the water distribution with a HydraMap (HM) method and then incorporated the features extracted from protein-bound waters obtained in this way into three ML-based SFs: RF-Score, ECIF, and PLEC. It was found that a combination of HM-based features can consistently improve the performance of all three SFs, including their scoring, ranking, and docking power. HydraMap-based features show consistently good performance with both crystal structures and docked structures, demonstrating their robustness for SFs. Overall, HM-based features, which are a statistical representation of hydration sites at protein-ligand interfaces, are expected to improve the prediction performance for diverse SFs.
Affiliation(s)
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Jinyan Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
- Yubing Si
  - College of Chemistry, Zhengzhou University, Zhengzhou 450001, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen 361005, P. R. China
|
19
|
Monteiro NR, Oliveira JL, Arrais JP. DTITR: End-to-end drug–target binding affinity prediction with transformers. Comput Biol Med 2022; 147:105772. [DOI: 10.1016/j.compbiomed.2022.105772] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2022] [Revised: 06/07/2022] [Accepted: 06/19/2022] [Indexed: 11/03/2022]
|
20
|
Dong L, Qu X, Wang B. XLPFE: A Simple and Effective Machine Learning Scoring Function for Protein-Ligand Scoring and Ranking. ACS Omega 2022; 7:21727-21735. [PMID: 35785279 PMCID: PMC9245135 DOI: 10.1021/acsomega.2c01723] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/22/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Prediction of protein-ligand binding affinities is a central issue in structure-based computer-aided drug design. In recent years, much effort has been devoted to the prediction of the binding affinity in protein-ligand complexes using machine learning (ML). Due to the remarkable ability of ML methods in nonlinear fitting, ML-based scoring functions (SFs) can deliver much improved performance on a selected test set, such as the comparative assessment of scoring functions (CASF), when compared to the classical SFs. However, the performance of ML-based SFs heavily relies on the overall similarity of the training set and the test set. To improve the performance and transferability of an SF, we have tried to combine various features including energy terms from X-score and AutoDock Vina, the properties of ligands, and the statistical sequence-related information from either the binding site or the full protein. In conjunction with extreme trees (ET), an ML model, we have developed XLPFE, a new SF. Compared with other tested methods such as X-score, AutoDock Vina, ΔvinaXGB, PSH-ML, or CNN-score, XLPFE achieves consistently better scoring and ranking power for various types of protein-ligand complex structures beyond the CASF, suggesting that XLPFE has superior transferability. In particular, XLPFE performs better with metalloenzymes. With its faster speed, improved accuracy, and better transferability, XLPFE could be usefully applied to a diverse range of protein-ligand complexes.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
|
21
|
Monteiro NRC, Simões CJV, Ávila HV, Abbasi M, Oliveira JL, Arrais JP. Explainable deep drug-target representations for binding affinity prediction. BMC Bioinformatics 2022; 23:237. [PMID: 35715734 PMCID: PMC9204982 DOI: 10.1186/s12859-022-04767-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2022] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Several computational advances have been achieved in the drug discovery field, promoting the identification of novel drug–target interactions and new leads. However, most of these methodologies have been overlooking the importance of providing explanations to the decision-making process of deep learning architectures. In this research study, we explore the reliability of convolutional neural networks (CNNs) at identifying relevant regions for binding, specifically binding sites and motifs, and the significance of the deep representations extracted by providing explanations to the model’s decisions based on the identification of the input regions that contributed the most to the prediction. We make use of an end-to-end deep learning architecture to predict binding affinity, where CNNs are exploited in their capacity to automatically identify and extract discriminating deep representations from 1D sequential and structural data. Results: The results demonstrate the effectiveness of the deep representations extracted from CNNs in the prediction of drug–target interactions. CNNs were found to identify and extract features from regions relevant for the interaction, where the weight associated with these spots was in the range of those with the highest positive influence given by the CNNs in the prediction. The end-to-end deep learning model achieved the highest performance both in the prediction of the binding affinity and on the ability to correctly distinguish the interaction strength rank order when compared to baseline approaches. Conclusions: This research study validates the potential applicability of an end-to-end deep learning architecture in the context of drug discovery beyond the confined space of proteins and ligands with determined 3D structure. Furthermore, it shows the reliability of the deep representations extracted from the CNNs by providing explainability to the decision-making process.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04767-y.
Affiliation(s)
- Nelson R C Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Henrique V Ávila
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
|
22
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
23
|
Yang C, Zhang Y. Delta Machine Learning to Improve Scoring-Ranking-Screening Performances of Protein-Ligand Scoring Functions. J Chem Inf Model 2022; 62:2696-2712. [PMID: 35579568 DOI: 10.1021/acs.jcim.2c00485] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 01/01/2023]
Abstract
Protein-ligand scoring functions are widely used in structure-based drug design for fast evaluation of protein-ligand interactions, and it is of strong interest to develop scoring functions with machine-learning approaches. In this work, by expanding the training set, developing physically meaningful features, employing our recently developed linear empirical scoring function Lin_F9 (Yang, C. J. Chem. Inf. Model. 2021, 61, 4630-4644) as the baseline, and applying extreme gradient boosting (XGBoost) with Δ-machine learning, we have further improved the robustness and applicability of machine-learning scoring functions. Besides the top performances for scoring-ranking-screening power tests of the CASF-2016 benchmark, the new scoring function ΔLin_F9XGB also achieves superior scoring and ranking performances in different structure types that mimic real docking applications. The scoring powers of ΔLin_F9XGB for locally optimized poses, flexible redocked poses, and ensemble docked poses of the CASF-2016 core set achieve Pearson's correlation coefficient (R) values of 0.853, 0.839, and 0.813, respectively. In addition, the large-scale docking-based virtual screening test on the LIT-PCBA data set demonstrates the reliability and robustness of ΔLin_F9XGB in virtual screening application. The ΔLin_F9XGB scoring function and its code are freely available on the web at (https://yzhang.hpc.nyu.edu/Delta_LinF9_XGB).
Affiliation(s)
- Chao Yang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
24
|
Moon S, Zhung W, Yang S, Lim J, Kim WY. PIGNet: a physics-informed deep learning model toward generalized drug-target interaction predictions. Chem Sci 2022; 13:3661-3673. [PMID: 35432900 PMCID: PMC8966633 DOI: 10.1039/d1sc06946b] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 12/13/2021] [Accepted: 02/06/2022] [Indexed: 12/21/2022] Open
Abstract
Recently, deep neural network (DNN)-based drug-target interaction (DTI) models were highlighted for their high accuracy with affordable computational costs. Yet, the models' insufficient generalization remains a challenging problem in the practice of in silico drug discovery. We propose two key strategies to enhance generalization in the DTI model. The first is to predict the atom-atom pairwise interactions via physics-informed equations parameterized with neural networks and to provide the total binding affinity of a protein-ligand complex as their sum. The second is to further improve the model generalization by augmenting the training data with a broader range of binding poses and ligands. We validated our model, PIGNet, in the comparative assessment of scoring functions (CASF) 2016, demonstrating docking and screening powers that outperform previous methods. Our physics-informing strategy also enables the interpretation of predicted affinities by visualizing the contribution of ligand substructures, providing insights for further ligand optimization.
Affiliation(s)
- Seokhyun Moon
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Wonho Zhung
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Soojung Yang
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jaechang Lim
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Woo Youn Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
  - KI for Artificial Intelligence, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
|