1. Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, identifying the near-native binding pose in docking experiments remains challenging: the scoring functions currently employed by docking programs are parametrized to predict binding affinity and therefore often fail to identify the native binding conformation of the ligand. Selecting the correct binding mode is crucial to obtaining meaningful results and to optimizing new hit compounds. Deep learning (DL) algorithms have attracted growing interest in this regard for their capability to extract the relevant information directly from the protein-ligand structure. Our review presents the recent advances in the development of DL-based pose selection approaches and discusses their limitations and possible future directions. Moreover, the performance of some classical scoring functions and DL-based methods in selecting the correct binding mode is compared. In this regard, two novel DL-based pose selectors developed by us are presented.
Affiliation(s)
- Serena Vittorio
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Filippo Lunghini
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
- Pietro Morerio
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Davide Gadioli
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Sergio Orlandini
  - SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
- Paulo Silva
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Jan Martinovic
  - IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
- Alessandro Pedretti
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Domenico Bonanni
  - Department of Physical and Chemical Sciences, University of L'Aquila, via Vetoio, L'Aquila 67010, Italy
- Alessio Del Bue
  - Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
- Gianluca Palermo
  - Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
- Giulio Vistoli
  - Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
- Andrea R. Beccari
  - EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
2. Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024; 179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]
Abstract
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces limitations that need to be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPR) hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods for representing input molecular structures without pre-processing (feature engineering). The major component of AI infrastructure is input data, as most successes of AI-driven drug discovery depend, besides a few other factors, on the quality and quantity of the data used to train and test AI algorithms. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. Current literature suggests a few methods that, to a certain extent, handle low-data settings effectively and yield better output from AI models in drug discovery. These are transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), data synthesis (DS), etc. A different method, federated learning (FL), enables proprietary data to be shared on a common platform to train ML models without compromising data privacy. In this review, we compare and discuss these methods, their recent applications, and their limitations in modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Azim Ansari
  - Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Iqrar Ahmad
  - Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India
- Abul Kalam Azad
  - Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia
3. Chen G, Qin Y, Sheng R. Integrating Prior Chemical Knowledge into the Graph Transformer Network to Predict the Stability Constants of Chelating Agents and Metal Ions. J Chem Inf Model 2024; 64:5867-5877. [PMID: 39075943 DOI: 10.1021/acs.jcim.4c00614] [Indexed: 07/31/2024]
Abstract
The latest advancements in nuclear medicine indicate that radioactive isotopes and associated metal chelators play crucial roles in the diagnosis and treatment of diseases. The development of metal chelators mainly relies on traditional trial-and-error methods, lacking rational guidance and design. In this study, we propose the structure-aware transformer (SAT) combined with molecular fingerprint (SATCMF), a novel graph transformer network framework that incorporates prior chemical knowledge to construct coordination edges and learns the interactions between chelating agents and metal ions. SATCMF is trained on stability data collected from metal ion-ligand complexes, leveraging the SAT network to extract structural features relevant to the binding of ligands with metal ions. It further integrates molecular fingerprint features to refine the prediction of the stability constants of the chelating agents and metal ions. The experimental results on a benchmark data set demonstrate that SATCMF achieves state-of-the-art performance based on four different graph neural network architectures. Additionally, visualizing the learned molecular attention distribution provides interpretable insights into the prediction results, offering valuable guidance for the development of novel metal chelators.
Affiliation(s)
- Geng Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Yiyang Qin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Rong Sheng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
  - Jinhua Institute of Zhejiang University, Zhejiang University, Jinhua 321036, P. R. China
4. Choi S, Seo S, Kim BJ, Park C, Park S. PIDiff: Physics informed diffusion model for protein pocket-specific 3D molecular generation. Comput Biol Med 2024; 180:108865. [PMID: 39067153 DOI: 10.1016/j.compbiomed.2024.108865] [Received: 03/14/2024] [Revised: 07/02/2024] [Accepted: 07/07/2024] [Indexed: 07/30/2024]
Abstract
Designing drugs capable of binding to the structure of target proteins for treating diseases is essential in drug development. Recent remarkable advancements in geometric deep learning have led to unprecedented progress in three-dimensional (3D) generation of ligands that can bind to the protein pocket. However, most existing methods primarily focus on modeling the geometric information of ligands in 3D space. Consequently, these methods fail to consider that the binding of proteins and ligands is a phenomenon driven by intrinsic physicochemical principles. Motivated by this understanding, we propose PIDiff, a model for generating molecules that accounts for the physicochemical principles of protein-ligand binding. Our model learns not only the structural information of proteins and ligands but also to minimize the binding free energy between them. To evaluate the proposed model, we introduce an experimental framework that surpasses traditional assessment methods by encompassing various essential aspects for the practical application of generative models to actual drug development. The results confirm that our model outperforms baseline models on the CrossDocked2020 benchmark dataset, demonstrating its superiority. Through diverse experiments, we have illustrated the promising potential of the proposed model in practical drug development.
Affiliation(s)
- Seungyeon Choi
  - Department of Computer Science, Yonsei University, Seoul, 03722, Republic of Korea
- Sangmin Seo
  - Department of Computer Science, Yonsei University, Seoul, 03722, Republic of Korea
- Byung Ju Kim
  - UBLBio Corporation, Suwon, 16679, Republic of Korea
- Chihyun Park
  - Department of Computer Science and Engineering, Kangwon National University, Chuncheon, 24341, Republic of Korea
- Sanghyun Park
  - Department of Computer Science, Yonsei University, Seoul, 03722, Republic of Korea
5. Liu T, Simine L. DeltaGzip: Computing Biopolymer-Ligand Binding Affinity via Kolmogorov Complexity and Lossless Compression. J Chem Inf Model 2024; 64:5617-5623. [PMID: 38980667 DOI: 10.1021/acs.jcim.4c00461] [Indexed: 07/10/2024]
Abstract
The design of biosequences for biosensing and therapeutics is a challenging multistep search and optimization task. In principle, computational modeling may speed up the design process by virtual screening of sequences based on their binding affinities to target molecules. However, in practice, existing machine-learned models trained to predict binding affinities lack the flexibility with respect to reaction conditions, and molecular dynamics simulations that can incorporate reaction conditions suffer from high computational costs. Here, we describe a computational approach called DeltaGzip that evaluates the free energy of binding in biopolymer-ligand complexes from ultrashort equilibrium molecular dynamics simulations. The entropy of binding is evaluated using the Kolmogorov complexity definition of entropy and approximated using a lossless compression algorithm, Gzip. We benchmark the method on a well-studied data set of protein-ligand complexes comparing the predictions of DeltaGzip to the free energies of binding obtained using Jarzynski equality and experimental measurements.
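The idea of approximating an entropy-like quantity with a lossless compressor can be illustrated with a small sketch. This is not the DeltaGzip implementation: the binning scheme, the toy "trajectories", and the names `compression_ratio` and `discretize` are assumptions made for illustration only. It uses Python's `zlib` module, which implements the same DEFLATE algorithm that gzip relies on, and only shows that a tightly constrained signal compresses better than a freely fluctuating one.

```python
import random
import zlib


def compression_ratio(symbols: bytes, level: int = 9) -> float:
    """Compressed size divided by raw size; lower means more regular data."""
    if not symbols:
        raise ValueError("need at least one symbol")
    return len(zlib.compress(symbols, level)) / len(symbols)


def discretize(values, lo, hi, bins=64):
    """Map floats in [lo, hi] to one byte per sample (hypothetical binning)."""
    span = hi - lo
    out = bytearray()
    for v in values:
        idx = int((v - lo) / span * bins)
        out.append(min(max(idx, 0), bins - 1))  # clamp out-of-range samples
    return bytes(out)


rng = random.Random(0)
# Toy stand-ins for a coordinate sampled along a simulation:
# a tightly constrained signal versus a freely fluctuating one.
bound = discretize([rng.gauss(0.0, 0.05) for _ in range(5000)], -1.0, 1.0)
free = discretize([rng.uniform(-1.0, 1.0) for _ in range(5000)], -1.0, 1.0)

ratio_bound = compression_ratio(bound)
ratio_free = compression_ratio(free)
print(f"constrained: {ratio_bound:.2f}  free: {ratio_free:.2f}")
```

The compression ratio here is only a proxy for the information content of the discretized signal; turning such a quantity into a binding free energy requires the energetic terms and calibration described in the paper itself.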
Affiliation(s)
- Tao Liu
  - Department of Chemistry, McGill University, Montreal, Quebec H3A 0B8, Canada
- Lena Simine
  - Department of Chemistry, McGill University, Montreal, Quebec H3A 0B8, Canada
6. Chen S, Noh J, Jang J, Kim S, Gu GH, Jung Y. Reaction Templates: Bridging Synthesis Knowledge and Artificial Intelligence. Acc Chem Res 2024; 57:1964-1972. [PMID: 38924502 DOI: 10.1021/acs.accounts.4c00261] [Indexed: 06/28/2024]
Abstract
Conspectus
The field of chemical research boasts a long history of developing software to automate synthesis planning and reaction prediction. Early software relied heavily on expert systems, requiring significant effort to encode vast amounts of synthesis knowledge into a computer-readable format. However, recent advancements in deep learning have shifted the focus toward AI models, offering improved prediction capabilities. Despite these advancements, current AI models often lack the integration of known synthesis rules and intuitions, creating a gap that hinders interpretability and future development of the models. To bridge this gap, our research group has been actively working on incorporating reaction templates into deep learning models, achieving promising results across various applications.

In this Account, we present our latest work on incorporating known synthesis knowledge into deep learning models through the use of reaction templates. We begin by highlighting the limitations of early computer programs heavily reliant on hand-coded rules. These programs, while providing a foundation for the field, presented limitations in scalability and adaptability. We then introduce SMARTS (SMILES arbitrary target specification), a popular Python-readable format for representing chemical reactions. This format of reaction encoding facilitates the quick integration of synthesis knowledge into AI models built using the Python language. With the SMARTS-based reaction templates, we introduce our recent efforts in developing an AI model for reaction-based molecule optimization. Subsequently, we discuss the recent efforts to automate the extraction of reaction templates from vast chemical reaction databases. This approach eliminates the previously required manual effort of encoding knowledge, a process that could be time-consuming and prone to error when dealing with large data sets. By customizing the automated extraction algorithm, we have developed powerful AI models for specific tasks such as retrosynthesis (LocalRetro), reaction outcome prediction (LocalTransform), and atom-to-atom mapping (LocalMapper). These models, aligned with the intuition of chemists, demonstrate the effectiveness of incorporating reaction templates into deep learning frameworks.

Looking toward the future, we believe that utilizing reaction templates to connect known chemical knowledge and AI models holds immense potential for various applications. Not only can this approach significantly benefit future AI models focused on challenging tasks like reaction mechanism labeling and prediction, but we anticipate it can also extend its reach to the realm of inorganic synthesis. By integrating synthesis knowledge, we can not only achieve improved performance but also enhance the interpretability of AI models, paving the way for further advancements in AI-powered chemical synthesis.
Affiliation(s)
- Shuan Chen
  - Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Juhwan Noh
  - Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Jidon Jang
  - Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Seongmin Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea
- Geun Ho Gu
  - Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), 21 Kentech-gil, Naju, Jeonnam 58330, South Korea
- Yousung Jung
  - Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
  - Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
7. Chen Y, Liang X, Du W, Liang Y, Wong G, Chen L. Drug-Target Interaction Prediction Based on an Interactive Inference Network. Int J Mol Sci 2024; 25:7753. [PMID: 39062996 PMCID: PMC11277210 DOI: 10.3390/ijms25147753] [Received: 05/16/2024] [Revised: 06/25/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024] Open
Abstract
Drug-target interactions underlie the actions of chemical substances in medicine. Moreover, drug repurposing can expand use profiles while reducing costs and development time by exploiting potential multi-functional pharmacological properties based upon additional target interactions. Nonetheless, drug repurposing relies on the accurate identification and validation of drug-target interactions (DTIs). In this study, a novel drug-target interaction prediction model was developed. The model, based on an interactive inference network, contains embedding, encoding, interaction, feature extraction, and output layers. In addition, this study used Morgan and PubChem molecular fingerprints as additional information for drug encoding. The interaction layer in our model simulates the drug-target interaction process, which assists in understanding the interaction by representing the interaction space. Our method achieves high levels of predictive performance, as well as interpretability of drug-target interactions. Additionally, we predicted and validated 22 Alzheimer's disease-related targets, suggesting our model is robust and effective and thus may be beneficial for drug repurposing.
Affiliation(s)
- Yuqi Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Xiaomin Liang
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yanchun Liang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau SAR 999078, China
- Liang Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
8. Kairys V, Baranauskiene L, Kazlauskiene M, Zubrienė A, Petrauskas V, Matulis D, Kazlauskas E. Recent advances in computational and experimental protein-ligand affinity determination techniques. Expert Opin Drug Discov 2024; 19:649-670. [PMID: 38715415 DOI: 10.1080/17460441.2024.2349169] [Received: 03/18/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
INTRODUCTION: Modern drug discovery revolves around designing ligands that target the chosen biomolecule, typically a protein. For this, the evaluation of affinities of putative ligands is crucial. This has given rise to a multitude of dedicated computational and experimental methods that are constantly being developed and improved.
AREAS COVERED: In this review, the authors reassess both the industry mainstays and the newest trends among the methods for protein–small-molecule affinity determination. They discuss both computational affinity predictions and experimental techniques, describing their basic principles, main limitations, and advantages. Together, this serves as an initial guide to the currently most popular and cutting-edge ligand-binding assays employed in rational drug design.
EXPERT OPINION: The affinity determination methods continue to develop toward miniaturization, high throughput, and in-cell application. Moreover, the availability of data analysis tools has been constantly increasing. Nevertheless, cross-verification of data using at least two different techniques and careful result interpretation remain of utmost importance.
Affiliation(s)
- Visvaldas Kairys
  - Department of Bioinformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lina Baranauskiene
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Asta Zubrienė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Egidijus Kazlauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
9. Abubakar ML, Kapoor N, Sharma A, Gambhir L, Jasuja ND, Sharma G. Artificial Intelligence in Drug Identification and Validation: A Scoping Review. Drug Res (Stuttg) 2024; 74:208-219. [PMID: 38830370 DOI: 10.1055/a-2306-8311] [Indexed: 06/05/2024]
Abstract
The end-to-end process in the discovery of drugs involves therapeutic candidate identification, validation of identified targets, identification of hit compound series, lead identification and optimization, characterization, and formulation and development. The process is lengthy, expensive, tedious, and inefficient, with a large attrition rate for novel drug discovery. Today, the pharmaceutical industry is focused on improving the drug discovery process. Finding and selecting acceptable drug candidates effectively can significantly impact the price and profitability of new medications. Aside from the cost, there is a need to reduce the end-to-end process time by limiting the number of experiments at various stages. To achieve this, artificial intelligence (AI) has been utilized at various stages of drug discovery. The present study aims to identify recent work that has developed AI-based models at various stages of drug discovery, identify the stages that need more attention, present the taxonomy of AI methods in drug discovery, and outline research opportunities. The study identified all publications from January 2016 to September 1, 2023, that were cited in electronic databases including Scopus, NCBI PubMed, MEDLINE, Anthropology Plus, Embase, APA PsycInfo, SOCIndex, and CINAHL. Data were extracted using a standardized form, and possible research prospects are presented based on the analysis of the extracted data.
Affiliation(s)
- Neha Kapoor
  - School of Applied Sciences, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
- Asha Sharma
  - Department of Zoology, Swargiya P. N. K. S. Govt. PG College, Dausa, Rajasthan, India
- Lokesh Gambhir
  - School of Basic and Applied Sciences, Shri Guru Ram Rai University, Dehradun, Uttarakhand, India
- Gaurav Sharma
  - School of Applied Sciences, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
10. Backenköhler M, Groß J, Wolf V, Volkamer A. Guided Docking as a Data Generation Approach Facilitates Structure-Based Machine Learning on Kinases. J Chem Inf Model 2024; 64:4009-4020. [PMID: 38751014 DOI: 10.1021/acs.jcim.4c00055] [Indexed: 05/28/2024]
Abstract
Drug discovery pipelines nowadays rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structural models are hindered by the availability of protein-ligand complex structures. Exemplified for kinase drug discovery, we address this issue by generating kinase-ligand complex data using template docking for the kinase compound subset of available ChEMBL assay data. To evaluate the benefit of the created complex data, we use it to train a structure-based E(3)-invariant graph neural network. Our evaluation shows that binding affinities can be predicted with significantly higher precision by models that take synthetic binding poses into account compared to ligand- or drug-target interaction models alone.
Affiliation(s)
- Michael Backenköhler
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Joschka Groß
  - Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
- Verena Wolf
  - Modeling and Simulation, Saarland University, Saarbrücken 66123, Germany
- Andrea Volkamer
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
  - Structural Bioinformatics and in Silico Toxicology Institute of Physiology, Universitätsmedizin Berlin, Berlin 10117, Germany
11. Zhang X, Shen C, Zhang H, Kang Y, Hsieh CY, Hou T. Advancing Ligand Docking through Deep Learning: Challenges and Prospects in Virtual Screening. Acc Chem Res 2024; 57:1500-1509. [PMID: 38577892 DOI: 10.1021/acs.accounts.4c00093] [Indexed: 04/06/2024]
Abstract
Molecular docking, also termed ligand docking (LD), is a pivotal element of structure-based virtual screening (SBVS) used to predict the binding conformations and affinities of protein-ligand complexes. Traditional LD methodologies rely on a search and scoring framework, utilizing heuristic algorithms to explore binding conformations and scoring functions to evaluate binding strengths. However, to meet the efficiency demands of SBVS, these algorithms and functions are often simplified, prioritizing speed over accuracy.

The emergence of deep learning (DL) has exerted a profound impact on diverse fields, ranging from natural language processing to computer vision and drug discovery. DeepMind's AlphaFold2 has impressively exhibited its ability to accurately predict protein structures solely from amino acid sequences, highlighting the remarkable potential of DL in conformation prediction. This groundbreaking advancement circumvents the traditional search-scoring frameworks in LD, enhancing both accuracy and processing speed and thereby catalyzing a broader adoption of DL algorithms in binding pose prediction. Nevertheless, a consensus on certain aspects remains elusive.

In this Account, we delineate the current status of employing DL to augment LD within the VS paradigm, highlighting our contributions to this domain. Furthermore, we discuss the challenges and future prospects, drawing insights from our scholarly investigations. Initially, we present an overview of VS and LD, followed by an introduction to DL paradigms, which deviate significantly from traditional search-scoring frameworks. Subsequently, we delve into the challenges associated with the development of DL-based LD (DLLD), encompassing evaluation metrics, application scenarios, and physical plausibility of the predicted conformations. In the evaluation of LD algorithms, it is essential to recognize the multifaceted nature of the metrics. While the accuracy of binding pose prediction, often measured by the success rate, is a pivotal aspect, the scoring/screening power and computational speed of these algorithms are equally important given the pivotal role of LD tools in VS. Regarding application scenarios, early methods focused on blind docking, where the binding site is unknown. However, recent studies suggest a shift toward identifying binding sites rather than solely predicting binding poses within these models. In contrast, LD with a known pocket in VS has been shown to be more practical. Physical plausibility poses another significant challenge. Although DLLD models often achieve higher success rates compared to traditional methods, they may generate poses with implausible local structures, such as incorrect bond angles or lengths, which are disadvantageous for postprocessing tasks like visualization. Finally, we discuss the future perspectives for DLLD, emphasizing the need to improve generalization ability, strike a balance between speed and accuracy, account for protein conformation flexibility, and enhance physical plausibility. Additionally, we delve into the comparison between generative and regression algorithms in this context, exploring their respective strengths and potential.
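The pose-prediction success rate mentioned in this abstract can be made concrete with a small sketch. The abstract does not define the metric, so the details below are assumptions: a pose counts as a success when its heavy-atom RMSD to the reference pose is at most 2 Å (a commonly used cutoff), lower scores are treated as better, and atoms are matched by index with no symmetry correction. The names `rmsd` and `top1_success_rate` are hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    No superposition is applied: docked poses already share the receptor frame.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def top1_success_rate(complexes, cutoff=2.0):
    """Fraction of complexes whose best-scored pose lies within `cutoff` of the reference.

    Each complex is (reference_coords, [(score, pose_coords), ...]);
    lower score = better is an assumed convention.
    """
    hits = 0
    for reference, poses in complexes:
        _, best_pose = min(poses, key=lambda sp: sp[0])
        if rmsd(best_pose, reference) <= cutoff:
            hits += 1
    return hits / len(complexes)


# Toy two-atom "ligand" with one near-native and one displaced pose.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(x + 0.5, y, z) for x, y, z in ref]  # shifted by 0.5 along x
far = [(x + 3.0, y, z) for x, y, z in ref]   # shifted by 3.0 along x
complexes = [
    (ref, [(-9.1, near), (-7.4, far)]),  # best score picks the near-native pose
    (ref, [(-6.0, near), (-8.2, far)]),  # best score picks the displaced pose
]
rate = top1_success_rate(complexes)
print(rate)
```

A metric like this says nothing about physical plausibility: a pose can fall under the RMSD cutoff and still contain distorted bond lengths or angles, which is why the abstract treats plausibility as a separate challenge.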
Collapse
Affiliation(s)
- Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Haotian Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
12
|
Li J, Guan X, Zhang O, Sun K, Wang Y, Bagni D, Head-Gordon T. Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction. arXiv 2024:arXiv:2308.09639v2. [PMID: 37645037 PMCID: PMC10462179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is controversial as to whether new SFs are actually improving since the general, refined, and core datasets of PDBBind are cross-contaminated with proteins and ligands with high similarity, and hence they may not perform comparably well in binding prediction of new protein-ligand complexes. In this work we have carefully prepared a cleaned PDBBind data set of non-covalent binders that are split into training, validation, and test datasets to control for data leakage, defined as proteins and ligands with high sequence and structural similarity. The resulting leak-proof (LP)-PDBBind data is used to retrain four popular SFs: AutoDock Vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA, to better test their capabilities when applied to new protein-ligand complexes. In particular we have formulated a new independent data set, BDB2020+, by matching high quality binding free energies from BindingDB with co-crystalized ligand-protein complexes from the PDB that have been deposited since 2020. Based on all the benchmark results, the retrained models using LP-PDBBind consistently perform better, with IGN especially being recommended for scoring and ranking applications for new protein-ligand systems.
|
13
|
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
| | - Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
|
14
|
Qiu Y, Cheng F. Artificial intelligence for drug discovery and development in Alzheimer's disease. Curr Opin Struct Biol 2024; 85:102776. [PMID: 38335558 DOI: 10.1016/j.sbi.2024.102776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/29/2023] [Accepted: 01/15/2024] [Indexed: 02/12/2024]
Abstract
The complex molecular mechanism and pathophysiology of Alzheimer's disease (AD) limits the development of effective therapeutics or prevention strategies. Artificial Intelligence (AI)-guided drug discovery combined with genetics/multi-omics (genomics, epigenomics, transcriptomics, proteomics, and metabolomics) analysis contributes to the understanding of the pathophysiology and precision medicine of the disease, including AD and AD-related dementia. In this review, we summarize the AI-driven methodologies for AD-agnostic drug discovery and development, including de novo drug design, virtual screening, and prediction of drug-target interactions, all of which have shown potentials. In particular, AI-based drug repurposing emerges as a compelling strategy to identify new indications for existing drugs for AD. We provide several emerging AD targets from human genetics and multi-omics findings and highlight recent AI-based technologies and their applications in drug discovery using AD as a prototypical example. In closing, we discuss future challenges and directions in AI-based drug discovery for AD and other neurodegenerative diseases.
Affiliation(s)
- Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA. https://twitter.com/YunguangQiu
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA.
|
15
|
Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024; 64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
Improving the generalization ability of scoring functions remains a major challenge in protein-ligand binding affinity prediction. Many machine learning methods are limited by their reliance on single-modal representations, hindering a comprehensive understanding of protein-ligand interactions. We introduce a graph-neural-network-based scoring function that utilizes a triplet contrastive learning loss to improve protein-ligand representations. In this model, three-dimensional complex representations and the fusion of two-dimensional ligand and coarse-grained pocket representations converge while distancing from decoy representations in latent space. After rigorous validation on multiple external data sets, our model exhibits commendable generalization capabilities compared to those of other deep learning-based scoring functions, marking it as a promising tool in the realm of drug discovery. In the future, our training framework can be extended to other biophysical- and biochemical-related problems such as protein-protein interaction and protein mutation prediction.
Affiliation(s)
- Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Dandan Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Xiaoyang Qu
- School of Pharmacy and Medical Technology, Putian University, Putian 351100, P. R. China
- Key Laboratory of Pharmaceutical Analysis and Laboratory Medicine (Putian University), Fujian Province University, Putian 351100, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
|
16
|
Chang J, Ye JC. Bidirectional generation of structure and properties through a single molecular foundation model. Nat Commun 2024; 15:2323. [PMID: 38485914 PMCID: PMC10940637 DOI: 10.1038/s41467-024-46440-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 02/27/2024] [Indexed: 03/18/2024] Open
Abstract
Recent successes of foundation models in artificial intelligence have prompted the emergence of large-scale chemical pre-trained models. Despite the growing interest in large molecular pre-trained models that provide informative representations for downstream tasks, attempts for multimodal pre-training approaches on the molecule domain were limited. To address this, here we present a multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties, drawing inspiration from recent advances in multimodal learning techniques. Our proposed model pipeline of data handling and training objectives aligns the structure/property features in a common embedding space, which enables the model to regard bidirectional information between the molecules' structure and properties. These contributions emerge synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks through a single model. Through extensive experiments, we demonstrate that our model has the capabilities to solve various meaningful chemical challenges, including conditional molecule generation, property prediction, molecule classification, and reaction prediction.
Affiliation(s)
- Jinho Chang
- Graduate School of AI, KAIST, Daejeon, South Korea
| | - Jong Chul Ye
- Graduate School of AI, KAIST, Daejeon, South Korea.
|
17
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
|
18
|
Wang Z, Brand R, Adolf-Bryfogle J, Grewal J, Qi Y, Combs SA, Golovach N, Alford R, Rangwala H, Clark PM. EGGNet, a Generalizable Geometric Deep Learning Framework for Protein Complex Pose Scoring. ACS Omega 2024; 9:7471-7479. [PMID: 38405499 PMCID: PMC10882658 DOI: 10.1021/acsomega.3c04889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/27/2024]
Abstract
Computational prediction of molecule-protein interactions has been key for developing new molecules to interact with a target protein for therapeutics development. Previous work includes two independent streams of approaches: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning (GDL) framework, for molecule-protein binding predictions that can handle three types of molecules for interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and utilizes a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet's ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top 1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.
Affiliation(s)
- Zichen Wang
- Amazon Web Services, Amazon, Seattle, Washington 98109-5210, United States
| | - Ryan Brand
- Amazon Web Services, Amazon, Seattle, Washington 98109-5210, United States
| | - Jared Adolf-Bryfogle
- Janssen Biotherapeutics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, Titusville, New Jersey 08560-1504, United States
| | - Jasleen Grewal
- Amazon Web Services, Amazon, Seattle, Washington 98109-5210, United States
| | - Yanjun Qi
- Amazon Web Services, Amazon, Seattle, Washington 98109-5210, United States
| | - Steven A. Combs
- Janssen Biotherapeutics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, Titusville, New Jersey 08560-1504, United States
| | - Nataliya Golovach
- Janssen Biotherapeutics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, Titusville, New Jersey 08560-1504, United States
| | - Rebecca Alford
- Janssen Biotherapeutics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, Titusville, New Jersey 08560-1504, United States
| | - Huzefa Rangwala
- Amazon Web Services, Amazon, Seattle, Washington 98109-5210, United States
| | - Peter M. Clark
- Janssen Biotherapeutics, Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, Titusville, New Jersey 08560-1504, United States
|
19
|
Dawson JRD, Wadman GM, Zhang P, Tebben A, Carter PH, Gu S, Shroka T, Borrega-Roman L, Salanga CL, Handel TM, Kufareva I. Molecular determinants of antagonist interactions with chemokine receptors CCR2 and CCR5. bioRxiv 2024:2023.11.15.567150. [PMID: 38014122 PMCID: PMC10680698 DOI: 10.1101/2023.11.15.567150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
By driving monocyte chemotaxis, the chemokine receptor CCR2 shapes inflammatory responses and the formation of tumor microenvironments. This makes it a promising target in inflammation and immuno-oncology; however, despite extensive efforts, there are no FDA-approved CCR2-targeting therapeutics. Cited challenges include the redundancy of the chemokine system, suboptimal properties of compound candidates, and species differences that confound the translation of results from animals to humans. Structure-based drug design can rationalize and accelerate the discovery and optimization of CCR2 antagonists to address these challenges. The prerequisites for such efforts include an atomic-level understanding of the molecular determinants of action of existing antagonists. In this study, using molecular docking and artificial-intelligence-powered compound library screening, we uncover the structural principles of small molecule antagonism and selectivity towards CCR2 and its sister receptor CCR5. CCR2 orthosteric inhibitors are shown to universally occupy an inactive-state-specific tunnel between receptor helices 1 and 7; we also discover an unexpected role for an extra-helical groove accessible through this tunnel, suggesting its potential as a new targetable interface for CCR2 and CCR5 modulation. By contrast, only shape complementarity and limited helix 8 hydrogen bonding govern the binding of various chemotypes of allosteric antagonists. CCR2 residues S101^2.63 and V244^6.36 are implicated as determinants of CCR2/CCR5 and human/mouse orthosteric and allosteric antagonist selectivity, respectively, and the role of S101^2.63 is corroborated through experimental gain-of-function mutagenesis. We establish a critical role of induced fit in antagonist recognition, reveal strong chemotype selectivity of existing structures, and demonstrate the high predictive potential of a new deep-learning-based compound scoring function. Finally, this study expands the available CCR2 structural landscape with computationally generated chemotype-specific models well-suited for structure-based antagonist design.
Affiliation(s)
- John R D Dawson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Grant M Wadman
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | | | | | - Percy H Carter
- Bristol Myers Squibb Company, Princeton, NJ, USA
- (current affiliation) Blueprint Medicines, Cambridge, MA, USA
| | - Siyi Gu
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- (current affiliation) Lycia Therapeutics, South San Francisco, CA
| | - Thomas Shroka
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- (current affiliation) Avidity Biosciences Inc., San Diego, CA
| | - Leire Borrega-Roman
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Catherina L Salanga
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Tracy M Handel
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Irina Kufareva
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
|
20
|
Chen D, Liu J, Wei GW. TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions. Research Square 2024:rs.3.rs-3640878. [PMID: 38405777 PMCID: PMC10889053 DOI: 10.21203/rs.3.rs-3640878/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Pre-trained deep Transformers have had tremendous success in a wide variety of disciplines. However, in computational biology, essentially all Transformers are built upon the biological sequences, which ignores vital stereochemical information and may result in crucial errors in downstream predictions. On the other hand, three-dimensional (3D) molecular structures are incompatible with the sequential architecture of Transformer and natural language processing (NLP) models in general. This work addresses this foundational challenge by a topological Transformer (TopoFormer). TopoFormer is built by integrating NLP and a multiscale topology technique, the persistent topological hyperdigraph Laplacian (PTHL), which systematically converts intricate 3D protein-ligand complexes at various spatial scales into an NLP-admissible sequence of topological invariants and homotopic shapes. Element-specific PTHLs are further developed to embed crucial physical, chemical, and biological interactions into topological sequences. TopoFormer surges ahead of conventional algorithms and recent deep learning variants and gives rise to exemplary scoring accuracy and superior performance in ranking, docking, and screening tasks in a number of benchmark datasets. The proposed topological sequences can be extracted from all kinds of structural data in data science to facilitate various NLP models, heralding a new era in AI-driven discovery.
Affiliation(s)
- Dong Chen
- Department of Mathematics, Michigan State University, MI, 48824, USA
| | - Jian Liu
- Department of Mathematics, Michigan State University, MI, 48824, USA
- Mathematical Science Research Center, Chongqing University of Technology, Chongqing 400054, China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI, 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
|
21
|
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024] Open
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Sereina Riniker
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
|
22
|
Cai H, Shen C, Jian T, Zhang X, Chen T, Han X, Yang Z, Dang W, Hsieh CY, Kang Y, Pan P, Ji X, Song J, Hou T, Deng Y. CarsiDock: a deep learning paradigm for accurate protein-ligand docking and screening based on large-scale pre-training. Chem Sci 2024; 15:1449-1471. [PMID: 38274053 PMCID: PMC10806797 DOI: 10.1039/d3sc05552c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024] Open
Abstract
The expertise accumulated in deep neural network-based structure prediction has been widely transferred to the field of protein-ligand binding pose prediction, thus leading to the emergence of a variety of deep learning-guided docking models for predicting protein-ligand binding poses without relying on heavy sampling. However, their prediction accuracy and applicability are still far from satisfactory, partially due to the lack of protein-ligand binding complex data. To this end, we create a large-scale complex dataset containing ∼9 M protein-ligand docking complexes for pre-training, and propose CarsiDock, the first deep learning-guided docking approach that leverages pre-training of millions of predicted protein-ligand complexes. CarsiDock contains two main stages, i.e., a deep learning model for the prediction of protein-ligand atomic distance matrices, and a translation, rotation and torsion-guided geometry optimization procedure to reconstruct the matrices into a credible binding pose. The pre-training and multiple innovative architectural designs facilitate the dramatically improved docking accuracy of our approach over the baselines in terms of multiple docking scenarios, thereby contributing to its outstanding early recognition performance in several retrospective virtual screening campaigns. Further explorations demonstrate that CarsiDock can not only guarantee the topological reliability of the binding poses but also successfully reproduce the crucial interactions in crystalized structures, highlighting its superior applicability.
Affiliation(s)
- Heng Cai
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chao Shen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tianye Jian
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tong Chen
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Xiaoqi Han
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Zhuo Yang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Wei Dang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Chang-Yu Hsieh
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiangyang Ji
- Department of Automation, Tsinghua University Beijing 100084 China
| | - Jianfei Song
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Tingjun Hou
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
|
23
|
Wang H. Prediction of protein-ligand binding affinity via deep learning models. Brief Bioinform 2024; 25:bbae081. [PMID: 38446737 PMCID: PMC10939342 DOI: 10.1093/bib/bbae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/31/2024] [Indexed: 03/08/2024] Open
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial in drug screening and optimization, but it is still a challenge in computer-aided drug design. The recent success of AlphaFold2 in predicting protein structures has brought new hope for deep learning (DL) models to accurately predict protein-ligand binding affinity. However, the current DL models still face limitations due to the low-quality database, inaccurate input representation and inappropriate model architecture. In this work, we review the computational methods, specifically DL-based models, used to predict protein-ligand binding affinity. We start with a brief introduction to protein-ligand binding affinity and the traditional computational methods used to calculate them. We then introduce the basic principles of DL models for predicting protein-ligand binding affinity. Next, we review the commonly used databases, input representations and DL models in this field. Finally, we discuss the potential challenges and future work in accurately predicting protein-ligand binding affinity via DL models.
Affiliation(s)
- Huiwen Wang
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang 471023, China
| |
|
24
|
Verburgt J, Jain A, Kihara D. Recent Deep Learning Applications to Structure-Based Drug Design. Methods Mol Biol 2024; 2714:215-234. [PMID: 37676602 PMCID: PMC10578466 DOI: 10.1007/978-1-0716-3441-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Identification and optimization of small molecules that bind to and modulate protein function is a crucial step in the early stages of drug development. For decades, this process has benefitted greatly from the use of computational models that can provide insights into molecular binding affinity and optimization. Over the past several years, various types of deep learning models have shown great potential in improving and enhancing the performance of traditional computational methods. In this chapter, we provide an overview of recent deep learning-based developments with applications in drug discovery. We classify these methods into four subcategories dependent on the task each method is aiming to solve. For each subcategory, we provide the general framework of the approach and discuss individual methods.
Affiliation(s)
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Anika Jain
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| |
|
25
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Medical Review (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
26
|
Dong T, Yang Z, Zhou J, Chen CYC. Equivariant Flexible Modeling of the Protein-Ligand Binding Pose with Geometric Deep Learning. J Chem Theory Comput 2023; 19:8446-8459. [PMID: 37938978 DOI: 10.1021/acs.jctc.3c00273] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/10/2023]
Abstract
Flexible modeling of the protein-ligand complex structure is a fundamental challenge for in silico drug development. Recent studies have improved commonly used docking tools by incorporating extra deep learning-based steps. However, such strategies limit their accuracy and efficiency because they retain massive sampling pressure and lack consideration for flexible biomolecular changes. In this study, we propose FlexPose, a geometric graph network capable of direct flexible modeling of complex structures in Euclidean space without following conventional sampling and scoring strategies. Our model adopts two key designs, a scalar-vector dual feature representation and an SE(3)-equivariant network, to manage dynamic structural changes, as well as two strategies, conformation-aware pretraining and weakly supervised learning, to boost model generalizability in unseen chemical space. Benefiting from these paradigms, our model dramatically outperforms all tested popular docking tools and recently advanced deep learning methods, especially in tasks involving protein conformation changes. We further investigate the impact of protein and ligand similarity on the model performance with two conformation-aware strategies. Moreover, FlexPose provides an affinity estimation and model confidence for postanalysis.
Affiliation(s)
- Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Jun Zhou
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
| |
|
27
|
Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023] Open
Abstract
Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by incorporating the three-dimensional (3D) structure of protein-ligand complexes as input, achieving remarkable progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although there is a vast amount of affinity data available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is now available for free at: https://huggingface.co/spaces/jiaxianustc/mbp.
Affiliation(s)
- Jiaxian Yan
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Zhaofeng Ye
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Chengqiang Lu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| | - Qi Liu
- Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
| | - Jiezhong Qiu
- Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
| |
|
28
|
Nguyen NQ, Park S, Gim M, Kang J. MulinforCPI: enhancing precision of compound-protein interaction prediction through novel perspectives on multi-level information integration. Brief Bioinform 2023; 25:bbad484. [PMID: 38180829 PMCID: PMC10768804 DOI: 10.1093/bib/bbad484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 11/15/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024] Open
Abstract
Forecasting the interaction between compounds and proteins is crucial for discovering new drugs. However, previous sequence-based studies have not utilized three-dimensional (3D) information on compounds and proteins, such as atom coordinates and distance matrices, to predict binding affinity. Furthermore, numerous widely adopted computational techniques have relied on sequences of amino acid characters for protein representations. This approach may constrain the model's ability to capture meaningful biochemical features, impeding a more comprehensive understanding of the underlying proteins. Here, we propose a two-step deep learning strategy named MulinforCPI that incorporates transfer learning techniques with multi-level resolution features to overcome these limitations. Our approach leverages 3D information from both proteins and compounds and acquires a profound understanding of the atomic-level features of proteins. Besides, our research highlights the divide between first-principle and data-driven methods, offering new research prospects for compound-protein interaction tasks. We applied the proposed method to six datasets: Davis, Metz, KIBA, CASF-2016, DUD-E and BindingDB, to evaluate the effectiveness of our approach.
Affiliation(s)
- Ngoc-Quang Nguyen
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
| | - Sejeong Park
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
- AIGEN Sciences, 04778, Seoul, Korea
| | - Mogan Gim
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
| | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, 02841, Seoul, Korea
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, 02841, Seoul, Korea
- AIGEN Sciences, 04778, Seoul, Korea
| |
|
29
|
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Eric Chen
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
30
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
| | - Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
| | - Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium
| | - Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
| |
|
31
|
Gu S, Liu H, Liu L, Hou T, Kang Y. Artificial intelligence methods in kinase target profiling: Advances and challenges. Drug Discov Today 2023; 28:103796. [PMID: 37805065 DOI: 10.1016/j.drudis.2023.103796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 10/09/2023]
Abstract
Kinases have a crucial role in regulating almost the full range of cellular processes, making them essential targets for therapeutic interventions against various diseases. Accurate kinase-profiling prediction is vital for addressing the selectivity/specificity challenges in kinase drug discovery, which is closely related to lead optimization, drug repurposing, and the understanding of potential drug side effects. In this review, we provide an overview of the latest advancements in machine learning (ML)-based and deep learning (DL)-based quantitative structure-activity relationship (QSAR) models for kinase profiling. We highlight current trends in this rapidly evolving field and discuss the existing challenges and future directions regarding experimental data set construction and model architecture design. Our aim is to offer practical insights and guidance for the development and utilization of these approaches.
Affiliation(s)
- Shukai Gu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao 999078
| | - Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co. Ltd, Nanjing 210000, Jiangsu, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| |
|
32
|
Zhao X, Li H, Zhang K, Huang SY. Iterative Knowledge-Based Scoring Function for Protein-Ligand Interactions by Considering Binding Affinity Information. J Phys Chem B 2023; 127:9021-9034. [PMID: 37822259 DOI: 10.1021/acs.jpcb.3c04421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Scoring functions for protein-ligand interactions play a critical role in structure-based drug design. Owing to the good balance between general applicability and computational efficiency, knowledge-based scoring functions have obtained significant advancements and achieved many successes. Nevertheless, knowledge-based scoring functions face a challenge in utilizing the experimental affinity data and thus may not perform well in binding affinity prediction. Addressing the challenge, we have proposed an improved version of the iterative knowledge-based scoring function ITScore by considering binding affinity information, which is referred to as ITScoreAff, based on a large training set of 6216 protein-ligand complexes with both structures and affinity data. ITScoreAff was extensively evaluated and compared with ITScore, 33 traditional, and 6 machine learning scoring functions in terms of docking power, ranking power, and screening power on the independent CASF-2016 benchmark. It was shown that ITScoreAff obtained an overall better performance than the other 40 scoring functions and gave an average success rate of 85.3% in docking power, a correlation coefficient of 0.723 in scoring power, and an average rank correlation coefficient of 0.668 in ranking power. In addition, ITScoreAff also achieved the overall best screening power when the top 10% of the ranked database were considered. These results demonstrated the robustness of ITScoreAff and its improvement over existing scoring functions.
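The two correlation figures quoted in this abstract can be illustrated with a minimal sketch, assuming the usual convention that scoring power is a Pearson correlation between predicted scores and experimental affinities and that ranking power is a rank (Spearman) correlation. The numbers below are invented for illustration and are not taken from the paper; the helper names are hypothetical, and the rank helper assumes no tied values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Assign ranks 1..n in ascending order (assumption: no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up predicted scores and experimental affinities (pK units).
predicted = [5.1, 6.8, 7.9, 6.0]
experimental = [4.9, 7.2, 7.5, 6.6]

scoring_power = pearson(predicted, experimental)
ranking_power = spearman(predicted, experimental)
```

In this toy case the predicted order matches the experimental order exactly, so the rank correlation is 1 even though the Pearson correlation is lower, which is why benchmarks report the two quantities separately.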
Affiliation(s)
- Xuejun Zhao
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Keqiong Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| |
|
33
|
Shiota K, Akutsu T. Multi-shelled ECIF: improved extended connectivity interaction features for accurate binding affinity prediction. Bioinformatics Advances 2023; 3:vbad155. [PMID: 37928345 PMCID: PMC10625475 DOI: 10.1093/bioadv/vbad155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 09/20/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023]
Abstract
Motivation: Extended connectivity interaction features (ECIF) is a method developed to predict protein-ligand binding affinity, allowing for detailed atomic representation. It performed very well in terms of Comparative Assessment of Scoring Functions 2016 (CASF-2016) scoring power. However, ECIF has the limitation of not being able to adequately account for interatomic distances. Results: To investigate what kind of distance representation is effective for P-L binding affinity prediction, we have developed two algorithms that improved ECIF's feature extraction method to take distance into account. One is multi-shelled ECIF, which takes into account the distance between atoms by dividing the distance between atoms into multiple layers. The other is weighted ECIF, which weights the importance of interactions according to the distance between atoms. A comparison of these two methods shows that multi-shelled ECIF outperforms weighted ECIF and the original ECIF, achieving a CASF-2016 scoring power Pearson correlation coefficient of 0.877. Availability and implementation: All the codes and data are available on GitHub (https://github.com/koji11235/MSECIFv2).
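The difference between the two distance treatments named in this abstract can be sketched as follows. This is not the authors' MSECIFv2 code: the plain element labels, the shell edges, the cutoff, and the inverse-distance weight are all assumptions chosen only to make the contrast concrete.

```python
from collections import Counter
from math import dist

# Hypothetical shell edges in angstroms; the values used in the paper are not stated here.
SHELL_EDGES = (0.0, 4.0, 6.0, 8.0)


def multi_shelled_counts(protein_atoms, ligand_atoms, edges=SHELL_EDGES):
    """Count (protein type, ligand type) pairs separately for each distance shell."""
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            for shell, (low, high) in enumerate(zip(edges, edges[1:])):
                if low <= d < high:
                    counts[(p_type, l_type, shell)] += 1
                    break
    return counts


def weighted_counts(protein_atoms, ligand_atoms, cutoff=8.0):
    """Sum one distance-dependent weight per pair type (assumed weight: 1/d)."""
    totals = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            d = dist(p_xyz, l_xyz)
            if 0.0 < d < cutoff:
                totals[(p_type, l_type)] += 1.0 / d
    return totals


# Toy complex: two protein atoms and one ligand atom on a line.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (8.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0))]

shelled = multi_shelled_counts(protein, ligand)
weighted = weighted_counts(protein, ligand)
```

The shelled variant keeps one feature per pair type and shell, so near and far contacts stay in separate columns; the weighted variant keeps a single column per pair type and folds distance into its magnitude.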
Affiliation(s)
- Koji Shiota
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan
| | - Tatsuya Akutsu
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan
| |
|
34
|
Yu J, Li Z, Chen G, Kong X, Hu J, Wang D, Cao D, Li Y, Huo R, Wang G, Liu X, Jiang H, Li X, Luo X, Zheng M. Computing the relative binding affinity of ligands based on a pairwise binding comparison network. Nature Computational Science 2023; 3:860-872. [PMID: 38177766 PMCID: PMC10766524 DOI: 10.1038/s43588-023-00529-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 09/05/2023] [Indexed: 01/06/2024]
Abstract
Structure-based lead optimization is an open challenge in drug discovery, which is still largely driven by hypotheses and depends on the experience of medicinal chemists. Here we propose a pairwise binding comparison network (PBCNet) based on a physics-informed graph attention mechanism, specifically tailored for ranking the relative binding affinity among congeneric ligands. Benchmarking on two held-out sets (provided by Schrödinger and Merck) containing over 460 ligands and 16 targets, PBCNet demonstrated substantial advantages in terms of both prediction accuracy and computational efficiency. Equipped with a fine-tuning operation, the performance of PBCNet reaches that of Schrödinger's FEP+, which is much more computationally intensive and requires substantial expert intervention. A further simulation-based experiment showed that active learning-optimized PBCNet may accelerate lead optimization campaigns by 473%. Finally, for the convenience of users, a web service for PBCNet is established to facilitate complex relative binding affinity prediction through an easy-to-operate graphical interface.
Affiliation(s)
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Information Science and Technology, Shanghai Tech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
| | - Zhaojun Li
- College of Computer and Information Engineering, Dezhou University, Dezhou City, China
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
| | - Geng Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
| | - Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jie Hu
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
| | - Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Lingang Laboratory, Shanghai, China
| | - Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yanbei Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
| | - Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
| | - Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xiaohong Liu
- Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City, China
| | - Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing, Jiangsu, China
| |
|
35
|
Zhang O, Wang T, Weng G, Jiang D, Wang N, Wang X, Zhao H, Wu J, Wang E, Chen G, Deng Y, Pan P, Kang Y, Hsieh CY, Hou T. Learning on topological surface and geometric structure for 3D molecular generation. Nature Computational Science 2023; 3:849-859. [PMID: 38177756 DOI: 10.1038/s43588-023-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/28/2023] [Accepted: 09/06/2023] [Indexed: 01/06/2024]
Abstract
Highly effective de novo design is a grand challenge of computer-aided drug discovery. Practical structure-specific three-dimensional molecule generations have started to emerge in recent years, but most approaches treat the target structure as a conditional input to bias the molecule generation and do not fully learn the detailed atomic interactions that govern the molecular conformation and stability of the binding complexes. The omission of these fine details leads to many models having difficulty in outputting reasonable molecules for a variety of therapeutic targets. Here, to address this challenge, we formulate a model, called SurfGen, that designs molecules in a fashion closely resembling the figurative key-and-lock principle. SurfGen comprises two equivariant neural networks, Geodesic-GNN and Geoatom-GNN, which capture the topological interactions on the pocket surface and the spatial interaction between ligand atoms and surface nodes, respectively. SurfGen outperforms other methods in a number of benchmarks, and its high sensitivity on the pocket structures enables an effective generative-model-based solution to the thorny issue of mutation-induced drug resistance.
Affiliation(s)
- Odin Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Ning Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Xiaorui Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Huifeng Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Ercheng Wang
- Zhejiang Lab, Zhejiang University, Hangzhou, China
| | | | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.
|
36
|
Shen C, Zhang X, Hsieh CY, Deng Y, Wang D, Xu L, Wu J, Li D, Kang Y, Hou T, Pan P. A generalized protein-ligand scoring framework with balanced scoring, docking, ranking and screening powers. Chem Sci 2023; 14:8129-8146. [PMID: 37538816 PMCID: PMC10395315 DOI: 10.1039/d3sc02044d] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/20/2023] [Accepted: 07/03/2023] [Indexed: 08/05/2023] [Open Access]
Abstract
Applying machine learning algorithms to protein-ligand scoring functions has aroused widespread attention in recent years due to the high predictive accuracy and affordable computational cost. Nevertheless, most machine learning-based scoring functions are only applicable to a specific task, e.g., binding affinity prediction, binding pose prediction or virtual screening, suggesting that the development of a scoring function with balanced performance in all critical tasks remains a grand challenge. To this end, we propose a novel parameterization strategy by introducing an adjustable binding affinity term that represents the correlation between the predicted outcomes and experimental data into the training of mixture density network. The resulting residue-atom distance likelihood potential not only retains the superior docking and screening power over all the other state-of-the-art approaches, but also achieves a remarkable improvement in scoring and ranking performance. We emphatically explore the impacts of several key elements on prediction accuracy as well as the task preference, and demonstrate that the performance of scoring/ranking and docking/screening tasks of a certain model could be well balanced through an appropriate manner. Overall, our study highlights the potential utility of our innovative parameterization strategy as well as the resulting scoring framework in future structure-based drug design.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
| | - Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Jian Wu
- School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
|
37
|
Zhang X, Shen C, Jiang D, Zhang J, Ye Q, Xu L, Hou T, Pan P, Kang Y. TB-IECS: an accurate machine learning-based scoring function for virtual screening. J Cheminform 2023; 15:63. [PMID: 37403155 DOI: 10.1186/s13321-023-00731-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/07/2023] [Accepted: 06/18/2023] [Indexed: 07/06/2023] [Open Access]
Abstract
Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost in the process of feature generation, the numbers of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2, and utilizes the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were firstly categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. Five best feature combinations were selected for further evaluation of the model performance in regard to the selection of feature vectors with various length, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the datasets of DUD-E and LIT-PCBA, as well as seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced the efficiency and accuracy for practical virtual screening.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
|
38
|
Zhang S, Jin Y, Liu T, Wang Q, Zhang Z, Zhao S, Shan B. SS-GNN: A Simple-Structured Graph Neural Network for Affinity Prediction. ACS Omega 2023; 8:22496-22507. [PMID: 37396234 PMCID: PMC10308598 DOI: 10.1021/acsomega.3c00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/02/2023] [Accepted: 06/01/2023] [Indexed: 07/04/2023]
Abstract
Efficient and effective drug-target binding affinity (DTBA) prediction is a challenging task due to the limited computational resources in practical applications and is a crucial basis for drug screening. Inspired by the good representation ability of graph neural networks (GNNs), we propose a simple-structured GNN model named SS-GNN to accurately predict DTBA. By constructing a single undirected graph based on a distance threshold to represent protein-ligand interactions, the scale of the graph data is greatly reduced. Moreover, ignoring covalent bonds in the protein further reduces the computational cost of the model. The graph neural network-multilayer perceptron (GNN-MLP) module takes the latent feature extraction of atoms and edges in the graph as two mutually independent processes. We also develop an edge-based atom-pair feature aggregation method to represent complex interactions and a graph pooling-based method to predict the binding affinity of the complex. We achieve state-of-the-art prediction performance using a simple model (with only 0.6 M parameters) without introducing complicated geometric feature descriptions. SS-GNN achieves Pearson's Rp = 0.853 on the PDBbind v2016 core set, outperforming state-of-the-art GNN-based methods by 5.2%. Moreover, the simplified model structure and concise data processing procedure improve the prediction efficiency of the model. For a typical protein-ligand complex, affinity prediction takes only 0.2 ms. All codes are freely accessible at https://github.com/xianyuco/SS-GNN.
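The distance-threshold graph construction mentioned in this abstract can be illustrated with a minimal sketch. It is not the SS-GNN code: the function name `build_distance_graph`, the 4.5 Å cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def build_distance_graph(coords, threshold):
    """Return undirected edges (i, j, distance) for atom pairs closer than `threshold`.

    `coords` is a list of (x, y, z) tuples; the threshold is in the same units.
    This is an illustrative helper, not part of any published package.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = math.dist(a, b)  # Euclidean distance between the two atoms
        if d < threshold:
            edges.append((i, j, d))
    return edges

# Toy coordinates (hypothetical): three nearby atoms and one distant atom.
atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (10.0, 10.0, 10.0)]
edges = build_distance_graph(atoms, threshold=4.5)
for i, j, d in edges:
    print(f"edge {i}-{j}: {d:.2f}")
```

Only pairs within the cutoff become edges, so the distant atom stays isolated and the graph remains small, which is the source of the cost reduction the abstract describes.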
Affiliation(s)
- Shuke Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Yanzhao Jin
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Tianmeng Liu
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Qi Wang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
| | - Zhaohui Zhang
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
| | - Shuliang Zhao
- College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
- Hebei Provincial Key Laboratory of Network and Information Security, Shijiazhuang 050024, China
- Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China
| | - Bo Shan
- Software College, Hebei Normal University, Shijiazhuang 050024, China
- Shijiazhuang Xianyu Digital Biotechnology Co., Ltd, Shijiazhuang 050024, China
|
39
|
Zhang H, Saravanan KM, Zhang JZH. DeepBindGCN: Integrating Molecular Vector Representation with Graph Convolutional Neural Networks for Protein-Ligand Interaction Prediction. Molecules 2023; 28:4691. [PMID: 37375246 DOI: 10.3390/molecules28124691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] [Open Access]
Abstract
The core of large-scale drug virtual screening is to select the binders accurately and efficiently with high affinity from large libraries of small molecules in which non-binders are usually dominant. The binding affinity is significantly influenced by the protein pocket, ligand spatial information, and residue types/atom types. Here, we used the pocket residues or ligand atoms as the nodes and constructed edges with the neighboring information to comprehensively represent the protein pocket or ligand information. Moreover, the model with pre-trained molecular vectors performed better than the one-hot representation. The main advantage of DeepBindGCN is that it is independent of docking conformation, and concisely keeps the spatial information and physical-chemical features. Using TIPE3 and PD-L1 dimer as proof-of-concept examples, we proposed a screening pipeline integrating DeepBindGCN and other methods to identify strong-binding-affinity compounds. It is the first time a non-complex-dependent model has achieved a root mean square error (RMSE) value of 1.4190 and Pearson r value of 0.7584 in the PDBbind v.2016 core set, respectively, thereby showing a comparable prediction power with the state-of-the-art affinity prediction models that rely upon the 3D complex. DeepBindGCN provides a powerful tool to predict the protein-ligand interaction and can be used in many important large-scale virtual screening application scenarios.
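The two figures of merit quoted in this abstract, RMSE and Pearson r, are standard regression metrics. The sketch below shows how they are computed; the affinity values are invented for illustration and are not taken from the PDBbind core set.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical binding affinities in pK units (not real benchmark data).
measured = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]
print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"Pearson r = {pearson_r(measured, predicted):.4f}")
```

RMSE penalizes absolute error while Pearson r only measures linear agreement, so a model can score well on one and poorly on the other, which is why affinity papers usually report both.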
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
| | - John Z H Zhang
- Shenzhen Institute of Synthetic Biology, Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
40
|
Isert C, Atz K, Schneider G. Structure-based drug design with geometric deep learning. Curr Opin Struct Biol 2023; 79:102548. [PMID: 36842415 DOI: 10.1016/j.sbi.2023.102548] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 10/24/2022] [Revised: 01/16/2023] [Accepted: 01/24/2023] [Indexed: 02/26/2023]
Abstract
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 8093, Singapore.
|
41
|
Ektefaie Y, Dasoulas G, Noori A, Farhat M, Zitnik M. Multimodal learning with graphs. Nat Mach Intell 2023; 5:340-350. [PMID: 38076673 PMCID: PMC10704992 DOI: 10.1038/s42256-023-00624-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 02/01/2023] [Indexed: 04/05/2023]
Abstract
Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases (the set of assumptions that algorithms use to make predictions for inputs they have not encountered during training). Learning on multimodal datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies using graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines to design new models.
Affiliation(s)
- Yasha Ektefaie
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA 02115, USA
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
| | - George Dasoulas
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
| | - Ayush Noori
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Harvard College, Cambridge, MA 02138, USA
| | - Maha Farhat
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Data Science Initiative, Cambridge, MA 02138, USA
|
42
|
Fan X, Wang Y, Yu C, Lv Y, Zhang H, Yang Q, Wen M, Lu H, Zhang Z. A Universal and Accurate Method for Easily Identifying Components in Raman Spectroscopy Based on Deep Learning. Anal Chem 2023; 95:4863-4870. [PMID: 36908216 DOI: 10.1021/acs.analchem.2c03853] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 03/14/2023]
Abstract
Raman spectroscopy has been widely used to provide the structural fingerprint for molecular identification. Due to interference from coexisting components, noise, baseline, and systematic differences between spectrometers, component identification with Raman spectra is challenging, especially for mixtures. In this study, a method entitled DeepRaman has been proposed to solve those problems by combining the comparison ability of a pseudo-Siamese neural network (pSNN) and the input-shape flexibility of spatial pyramid pooling (SPP). DeepRaman was trained, validated, and tested with 41,564 augmented Raman spectra from two databases (pharmaceutical material and S.T. Japan). It can achieve 96.29% accuracy, 98.40% true positive rate (TPR), and 94.36% true negative rate (TNR) on the test set. Another six data sets measured on different instruments were used to evaluate the performance of the proposed method from different aspects. DeepRaman can provide accurate identification results and significantly outperform the hit quality index (HQI) method and other deep learning models. In addition, it performs well in cases of different spectral complexity and low-content components. Once the model is established, it can be used directly on different data sets without retraining or transfer learning. Furthermore, it also obtains promising results for the analysis of surface-enhanced Raman spectroscopy (SERS) data sets and Raman imaging data sets. In summary, it is an accurate, universal, and ready-to-use method for component identification in various application scenarios.
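Accuracy, true positive rate, and true negative rate, as reported in this abstract, are ratios over the four cells of a binary confusion matrix. A minimal sketch follows; the counts are invented for illustration and are not the paper's results.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (accuracy, TPR, TNR) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    tpr = tp / (tp + fn)  # fraction of true matches that were identified
    tnr = tn / (tn + fp)  # fraction of non-matches that were rejected
    return accuracy, tpr, tnr

# Hypothetical counts: 10 spectrum pairs that match, 10 that do not.
accuracy, tpr, tnr = classification_rates(tp=8, fn=2, tn=9, fp=1)
print(f"accuracy={accuracy:.2%}, TPR={tpr:.2%}, TNR={tnr:.2%}")
```

Reporting TPR and TNR alongside accuracy matters when the two classes are imbalanced, since accuracy alone can hide a weak result on the rarer class.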
Affiliation(s)
- Xiaqiong Fan
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Chuanxiu Yu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yuanxia Lv
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
|
43
|
Yang Z, Zhong W, Lv Q, Dong T, Yu-Chian Chen C. Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN). J Phys Chem Lett 2023; 14:2020-2033. [PMID: 36794930 DOI: 10.1021/acs.jpclett.2c03906] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 06/18/2023]
Abstract
Predicting protein-ligand binding affinities (PLAs) is a core problem in drug discovery. Recent advances have shown great potential in applying machine learning (ML) for PLA prediction. However, most of them omit the 3D structures of complexes and physical interactions between proteins and ligands, which are considered essential to understanding the binding mechanism. This paper proposes a geometric interaction graph neural network (GIGN) that incorporates 3D structures and physical interactions for predicting protein-ligand binding affinities. Specifically, we design a heterogeneous interaction layer that unifies covalent and noncovalent interactions into the message passing phase to learn node representations more effectively. The heterogeneous interaction layer also follows fundamental biological laws, including invariance to translations and rotations of the complexes, thus avoiding expensive data augmentation strategies. GIGN achieves state-of-the-art performance on three external test sets. Moreover, by visualizing learned representations of protein-ligand complexes, we show that the predictions of GIGN are biologically meaningful.
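The invariance property named in this abstract can be checked numerically: any feature built only from interatomic distances is unchanged by a rigid rotation and translation of the complex. The sketch below is not the GIGN layer itself; the helper names, the rotation angle, the shift, and the coordinates are assumptions for illustration.

```python
import math

def pairwise_distances(coords):
    """Full matrix of Euclidean distances between all points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z_and_translate(coords, angle, shift):
    """Rotate every point about the z axis by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

# Hypothetical three-atom "complex" and an arbitrary rigid motion.
complex_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 1.0)]
moved = rotate_z_and_translate(complex_atoms, angle=0.7, shift=(3.0, -1.0, 2.5))

before = pairwise_distances(complex_atoms)
after = pairwise_distances(moved)
max_diff = max(abs(a - b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b))
print(f"largest change in any pairwise distance: {max_diff:.2e}")
```

Because the inputs to such a model do not change under rigid motion, there is no need to train on randomly rotated copies of each complex, which is the data-augmentation saving the abstract refers to.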
Affiliation(s)
- Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Weihe Zhong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Qiujie Lv
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
|
44
|
Zhu Z, Yao Z, Qi G, Mazur N, Yang P, Cong B. Associative learning mechanism for drug-target interaction prediction. CAAI Trans Intell Technol 2023. [DOI: 10.1049/cit2.12194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023] [Open Access]
Affiliation(s)
- Zhiqin Zhu
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Zheng Yao
- College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
| | - Guanqiu Qi
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, New York, USA
| | - Neal Mazur
- Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, New York, USA
| | - Pan Yang
- Department of Cardiovascular Surgery, Chongqing General Hospital, University of Chinese Academy of Sciences, Chongqing, China
- Emergency Department, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Baisen Cong
- Data Scientist, Diagnostics Digital, DH (Shanghai) Diagnostics Co., Ltd., Danaher Company, Shanghai, China
|
45
|
Griffiths RR, Greenfield JL, Thawani AR, Jamasb AR, Moss HB, Bourached A, Jones P, McCorkindale W, Aldrick AA, Fuchter MJ, Lee AA. Data-driven discovery of molecular photoswitches with multioutput Gaussian processes. Chem Sci 2022; 13:13541-13551. [PMID: 36507171 PMCID: PMC9682911 DOI: 10.1039/d2sc04306h] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/12/2022] [Accepted: 09/16/2022] [Indexed: 11/11/2022] [Open Access]
Abstract
Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset.
Affiliation(s)
- Ryan-Rhys Griffiths
- The Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
| | - Jake L. Greenfield
- Molecular Sciences Research Hub, Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Center for Nanosystems Chemistry (CNC), Institut für Organische Chemie, Universität Würzburg, Würzburg 97074, Germany
| | - Aditya R. Thawani
- Molecular Sciences Research Hub, Department of Chemistry, Imperial College London, London W12 0BZ, UK
| | - Arian R. Jamasb
- The Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
| | | | - Anthony Bourached
- The Institute of Neurology, Department of Neurology, University College London, London WC1N 3BG, UK
| | - Penelope Jones
- The Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
| | - William McCorkindale
- The Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
| | - Alexander A. Aldrick
- The Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
| | - Matthew J. Fuchter
- Molecular Sciences Research Hub, Department of Chemistry, Imperial College London, London W12 0BZ, UK
| | - Alpha A. Lee
- The Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, UK
|
46
|
Han Z, Kammer DS, Fink O. Learning physics-consistent particle interactions. PNAS Nexus 2022; 1:pgac264. [PMID: 36712322 PMCID: PMC9802333 DOI: 10.1093/pnasnexus/pgac264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]
Abstract
Interacting particle systems play a key role in science and engineering. Access to the governing particle interaction law is fundamental for a complete understanding of such systems. However, the inherent system complexity keeps the particle interaction hidden in many cases. Machine learning methods have the potential to learn the behavior of interacting particle systems by combining experiments with data analysis methods. However, most existing algorithms focus on learning the kinetics at the particle level. Learning pairwise interaction, e.g., pairwise force or pairwise potential energy, remains an open challenge. Here, we propose an algorithm that adapts the Graph Networks framework, which contains an edge part to learn the pairwise interaction and a node part to model the dynamics at particle level. Different from existing approaches that use neural networks in both parts, we design a deterministic operator in the node part that allows to precisely infer the pairwise interactions that are consistent with underlying physical laws by only being trained to predict the particle acceleration. We test the proposed methodology on multiple datasets and demonstrate that it achieves superior performance in inferring correctly the pairwise interactions while also being consistent with the underlying physics on all the datasets. While the previously proposed approaches are able to be applied as simulators, they fail to infer physically consistent particle interactions that satisfy Newton's laws. Moreover, the proposed physics-induced graph network for particle interaction also outperforms the other baseline models in terms of generalization ability to larger systems and robustness to significant levels of noise. The developed methodology can support a better understanding and discovery of the underlying particle interaction laws, and hence, guide the design of materials with targeted properties.
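The edge/node split described in this abstract can be sketched with a fixed, hand-written pairwise force standing in for the learned edge function. This is an illustration under stated assumptions, not the authors' implementation: `spring_force` is a hypothetical linear spring, and the node step is taken to be a plain sum of incoming forces divided by mass.

```python
def spring_force(pos_i, pos_j, k=1.0):
    """Hypothetical pairwise force on particle i due to j (linear spring, zero rest length)."""
    return tuple(k * (b - a) for a, b in zip(pos_i, pos_j))

def accelerations(positions, masses, pair_force):
    """Deterministic node step: sum the pairwise forces on each particle, divide by its mass."""
    acc = []
    for i, pos_i in enumerate(positions):
        total = [0.0, 0.0, 0.0]
        for j, pos_j in enumerate(positions):
            if i != j:
                force = pair_force(pos_i, pos_j)
                total = [t + f for t, f in zip(total, force)]
        acc.append(tuple(t / masses[i] for t in total))
    return acc

# Hypothetical three-particle system.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
masses = [1.0, 2.0, 4.0]
acc = accelerations(positions, masses, spring_force)

# Newton's third law: with antisymmetric pairwise forces the net force vanishes.
net_force = [sum(m * a[d] for m, a in zip(masses, acc)) for d in range(3)]
print(acc)
print(net_force)
```

Because the node step has no trainable parameters, any model trained only on accelerations has to place all of its fitting capacity in the pairwise term, which is what makes the recovered interaction interpretable.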
Affiliation(s)
- Zhichao Han
- Institute for Building Materials, ETH Zürich, 8093 Zürich, Switzerland
- David S Kammer
- Institute for Building Materials, ETH Zürich, 8093 Zürich, Switzerland
- Olga Fink
- To whom correspondence should be addressed:
|
47
|
Chen H, Batchelor-McAuley C, Kätelhön E, Elliott J, Compton RG. A Critical Evaluation of Using Physics-Informed Neural Networks for Simulating Voltammetry: Strengths, Weaknesses and Best Practices. J Electroanal Chem (Lausanne) 2022. [DOI: 10.1016/j.jelechem.2022.116918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
48
|
Shen C, Zhang X, Deng Y, Gao J, Wang D, Xu L, Pan P, Hou T, Kang Y. Boosting Protein-Ligand Binding Pose Prediction and Virtual Screening Based on Residue-Atom Distance Likelihood Potential and Graph Transformer. J Med Chem 2022; 65:10691-10706. [PMID: 35917397 DOI: 10.1021/acs.jmedchem.2c00991] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 02/08/2023]
Abstract
The past few years have witnessed enormous progress toward applying machine learning approaches to the development of protein-ligand scoring functions. However, the robust performance and wide applicability of scoring functions remain a big challenge for increasing the success rate of docking-based virtual screening. Herein, a novel scoring function named RTMScore was developed by introducing a tailored residue-based graph representation strategy and several graph transformer layers for the learning of protein and ligand representations, followed by a mixture density network to obtain residue-atom distance likelihood potential. Our approach was resolutely validated on the CASF-2016 benchmark, and the results indicate that RTMScore can outperform almost all of the other state-of-the-art methods in terms of both the docking and screening powers. Further evaluation confirms the robustness of our approach that can not only retain its docking power on cross-docked poses but also achieve improved performance as a rescoring tool in larger-scale virtual screening.
Affiliation(s)
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China
- Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dong Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
|
49
|
Zhang H, Zhang T, Saravanan KM, Liao L, Wu H, Zhang H, Zhang H, Pan Y, Wu X, Wei Y. DeepBindBC: a practical deep learning method for identifying native-like protein-ligand complexes in virtual screening. Methods 2022; 205:247-262. [PMID: 35878751 DOI: 10.1016/j.ymeth.2022.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 06/29/2022] [Accepted: 07/12/2022] [Indexed: 12/18/2022] Open
Abstract
Identifying native-like protein-ligand complexes (PLCs) from an abundance of docking decoys is critical for large-scale virtual drug screening in early-stage drug discovery lead searching efforts. Providing reliable prediction is still a challenge for most current affinity predicting models because of a lack of non-binding data during model training, lost critical physical-chemical features, and difficulties in learning abstract information with limited neural layers. In this work, we proposed a deep learning model, DeepBindBC, for classifying putative ligands as binding or non-binding. Our model incorporates information on non-binding interactions, making it more suitable for real applications. ResNet model architecture and more detailed atom type representation guarantee implicit features can be learned more accurately. Here, we show that DeepBindBC outperforms Autodock Vina, Pafnucy, and DLSCORE for three DUD.E testing sets. Moreover, DeepBindBC identified a novel human pancreatic α-amylase binder validated by a fluorescence spectral experiment (Ka = 1.0×10⁵ M). Furthermore, DeepBindBC can be used as a core component of a hybrid virtual screening pipeline that incorporates many other complementary methods, such as DFCNN, Autodock Vina docking, and pocket molecular dynamics simulation. Additionally, an online web server based on the model is available at http://cbblab.siat.ac.cn/DeepBindBC/index.php for the user's convenience. Our model and the web server provide alternative tools in the early steps of drug discovery by providing accurate identification of native-like PLCs.
Affiliation(s)
- Haiping Zhang
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, PR China; Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, PR China
- Tingting Zhang
- School of Medicine, Shenzhen University, Shenzhen, Guangdong Province 518060, PR China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India
- Linbu Liao
- College of Software Technology, Zhejiang University, Zhejiang Province 315048, PR China
- Hao Wu
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, PR China
- Haishan Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, PR China
- Huiling Zhang
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, PR China
- Yi Pan
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, PR China
- Xuli Wu
- School of Medicine, Shenzhen University, Shenzhen, Guangdong Province 518060, PR China
- Yanjie Wei
- Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, PR China
|
50
|
Yu J, Wang D, Zheng M. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience 2022; 25:104814. [PMID: 35996575 PMCID: PMC9391523 DOI: 10.1016/j.isci.2022.104814] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022] Open
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence in drug discovery. In silico models have been widely used to accelerate the process of drug discovery in recent years. However, most of these models can only give reliable predictions within a limited chemical space that the training set covers (applicability domain). Predictions of samples falling outside the applicability domain are unreliable and sometimes dangerous for the drug-design decision-making process. Uncertainty quantification accordingly has drawn great attention to enable autonomous drug designing. By quantifying the confidence level of model predictions, the reliability of the predictions can be quantitatively represented to assist researchers in their molecular reasoning and experimental design. Here we summarize the state-of-the-art approaches to uncertainty quantification and underline how they can be used for drug design and discovery projects. Furthermore, we also outline four representative application scenarios of uncertainty quantification in drug discovery.
|