1
|
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024] Open
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F 1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed a receiver operating characteristic (ROC) curve (AUC) value of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
Collapse
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
| | - Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
| |
Collapse
|
2
|
Gale-Day ZJ, Shub L, Chuang KV, Keiser MJ. Proximity Graph Networks: Predicting Ligand Affinity with Message Passing Neural Networks. J Chem Inf Model 2024. [PMID: 38953560 DOI: 10.1021/acs.jcim.4c00311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
Message passing neural networks (MPNNs) on molecular graphs generate continuous and differentiable encodings of small molecules with state-of-the-art performance on protein-ligand complex scoring tasks. Here, we describe the proximity graph network (PGN) package, an open-source toolkit that constructs ligand-receptor graphs based on atom proximity and allows users to rapidly apply and evaluate MPNN architectures for a broad range of tasks. We demonstrate the utility of PGN by introducing benchmarks for affinity and docking score prediction tasks. Graph networks generalize better than fingerprint-based models and perform strongly for the docking score prediction task. Overall, MPNNs with proximity graph data structures augment the prediction of ligand-receptor complex properties when ligand-receptor data are available.
Collapse
Affiliation(s)
- Zachary J Gale-Day
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
| | - Laura Shub
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94158, United States
| | - Kangway V Chuang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94158, United States
| | - Michael J Keiser
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94158, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California 94158, United States
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, California 94158, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California 94158, United States
| |
Collapse
|
3
|
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023] Open
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Collapse
Affiliation(s)
- Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
| | - Diep T N Nguyen
- Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
| | - Huan Yee Koh
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
| | - Jason Toskov
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
| | - William MacLean
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
| | - Andrew Xu
- Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
| | - Daokun Zhang
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
| | - Geoffrey I Webb
- Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
| | - Lauren T May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
| | - Michelle L Halls
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
| |
Collapse
|
4
|
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches-which are notoriously 'data-hungry'-might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is sparking great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
Collapse
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
| | - Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
| | - Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
| | - Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
| | - Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
| | - Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
| |
Collapse
|
5
|
Zhu J, Gu Z, Pei J, Lai L. DiffBindFR: an SE(3) equivariant network for flexible protein-ligand docking. Chem Sci 2024; 15:7926-7942. [PMID: 38817560 PMCID: PMC11134415 DOI: 10.1039/d3sc06803j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Accepted: 04/07/2024] [Indexed: 06/01/2024] Open
Abstract
Molecular docking, a key technique in structure-based drug design, plays pivotal roles in protein-ligand interaction modeling, hit identification and optimization, in which accurate prediction of protein-ligand binding mode is essential. Conventional docking approaches perform well in redocking tasks with known protein binding pocket conformation in the complex state. However, in real-world docking scenario without knowing the protein binding conformation for a new ligand, accurately modeling the binding complex structure remains challenging as flexible docking is computationally expensive and inaccurate. Typical deep learning-based docking methods do not explicitly consider protein side chain conformations and fail to ensure the physical plausibility and detailed atomic interactions. In this study, we present DiffBindFR, a full-atom diffusion-based flexible docking model that operates over the product space of ligand overall movements and flexibility and pocket side chain torsion changes. We show that DiffBindFR has higher accuracy in producing native-like binding structures with physically plausible and detailed interactions than available docking methods. Furthermore, in the Apo and AlphaFold2 modeled structures, DiffBindFR demonstrates superior advantages in accurate ligand binding pose and protein binding conformation prediction, making it suitable for Apo and AlphaFold2 structure-based drug design. DiffBindFR provides a powerful flexible docking tool for modeling accurate protein-ligand binding structures.
Collapse
Affiliation(s)
- Jintao Zhu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
| | - Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
| | - Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
| | - Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University Beijing 100871 China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University Beijing 100871 China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies Chengdu Sichuan China
| |
Collapse
|
6
|
Bajorath J. Chemical and biological language models in molecular design: opportunities, risks and scientific reasoning. Future Sci OA 2024; 10:FSO957. [PMID: 38817390 PMCID: PMC11137808 DOI: 10.2144/fsoa-2023-0318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Accepted: 01/03/2024] [Indexed: 06/01/2024] Open
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics & Data Science, B-IT, LIMES Program Unit Chemical Biology & Medicinal Chemistry, Lamarr Institute for Machine Learning & Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany
| |
Collapse
|
7
|
Siebenmorgen T, Menezes F, Benassou S, Merdivan E, Didi K, Mourão ASD, Kitel R, Liò P, Kesselheim S, Piraud M, Theis FJ, Sattler M, Popowicz GM. MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery. NATURE COMPUTATIONAL SCIENCE 2024; 4:367-378. [PMID: 38730184 PMCID: PMC11136668 DOI: 10.1038/s43588-024-00627-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Accepted: 04/11/2024] [Indexed: 05/12/2024]
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Collapse
Affiliation(s)
- Till Siebenmorgen
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
| | - Filipe Menezes
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
| | - Sabrina Benassou
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
| | | | - Kieran Didi
- Computer Laboratory, Cambridge University, Cambridge, UK
| | - André Santos Dias Mourão
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
| | - Radosław Kitel
- Faculty of Chemistry, Jagiellonian University, Krakow, Poland
| | - Pietro Liò
- Computer Laboratory, Cambridge University, Cambridge, UK
| | - Stefan Kesselheim
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
| | - Marie Piraud
- Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
| | - Fabian J Theis
- Helmholtz AI, Helmholtz Munich, Neuherberg, Germany
- Computational Health Center, Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Michael Sattler
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany
| | - Grzegorz M Popowicz
- Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany.
- TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany.
| |
Collapse
|
8
|
Hong Q, Zhou G, Qin Y, Shen J, Li H. SadNet: a novel multimodal fusion network for protein-ligand binding affinity prediction. Phys Chem Chem Phys 2024; 26:12880-12891. [PMID: 38625412 DOI: 10.1039/d3cp05664c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/17/2024]
Abstract
Protein-ligand binding affinity prediction plays an important role in the field of drug discovery. Existing deep learning-based approaches have significantly improved the efficiency of protein-ligand binding affinity prediction through their excellent inductive bias capability. However, these methods only focus on fragmented three-dimensional data, which truncates the integrity of pocket data, leading to the neglect of potential long-range interactions. In this paper, we propose a dual-stream framework, with amino acid sequence assisting the atomic data fusion for graph neural network (termed SadNet), to fuse both 3D atomic data and sequence data for more accurate prediction results. In detail, SadNet consists of a pocket module and a sequence module. The sequence module expands the "receptive field" of the pocket module through a mid-term virtual node fusion. To better integrate sequence-level information from the sequence module and 3D structural information from the pocket module, we incorporate structural information for each amino acid within the sequence module. Besides, to better understand the intrinsic relationship between sequences and 3D atomic information, our SadNet utilizes information stacking from both the early stage and later stage. Experimental results on publicly available benchmark datasets demonstrate the superiority of the proposed dual-stream approach over the state-of-the-art alternatives. The code of this work is available online at https://github.com/wardhong/SadNet.
Collapse
Affiliation(s)
- Qiansen Hong
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Guoqiang Zhou
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Yuke Qin
- Nanjing University of Posts and Telecommunications, NanJing, China.
| | - Jun Shen
- University of Wollongong, Australia
| | | |
Collapse
|
9
|
Atz K, Cotos L, Isert C, Håkansson M, Focht D, Hilleke M, Nippa DF, Iff M, Ledergerber J, Schiebroek CCG, Romeo V, Hiss JA, Merk D, Schneider P, Kuhn B, Grether U, Schneider G. Prospective de novo drug design with deep interactome learning. Nat Commun 2024; 15:3408. [PMID: 38649351 PMCID: PMC11035696 DOI: 10.1038/s41467-024-47613-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2023] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
De novo drug design aims to generate molecules from scratch that possess specific chemical and pharmacological properties. We present a computational approach utilizing interactome-based deep learning for ligand- and structure-based generation of drug-like molecules. This method capitalizes on the unique strengths of both graph neural networks and chemical language models, offering an alternative to the need for application-specific reinforcement, transfer, or few-shot learning. It enables the "zero-shot" construction of compound libraries tailored to possess specific bioactivity, synthesizability, and structural novelty. In order to proactively evaluate the deep interactome learning framework for protein structure-based drug design, potential new ligands targeting the binding site of the human peroxisome proliferator-activated receptor (PPAR) subtype gamma are generated. The top-ranking designs are chemically synthesized and computationally, biophysically, and biochemically characterized. Potent PPAR partial agonists are identified, demonstrating favorable activity and the desired selectivity profiles for both nuclear receptors and off-target interactions. Crystal structure determination of the ligand-receptor complex confirms the anticipated binding mode. This successful outcome positively advocates interactome-based de novo design for application in bioorganic and medicinal chemistry, enabling the creation of innovative bioactive molecules.
Collapse
Affiliation(s)
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Maria Håkansson
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Dorota Focht
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Mattis Hilleke
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Michael Iff
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Jann Ledergerber
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Carl C G Schiebroek
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Valentina Romeo
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Jan A Hiss
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Daniel Merk
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Petra Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Bernd Kuhn
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
| |
Collapse
|
10
|
Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights into investigating protein-ligand interactions, allowing visual inspection and interpretation of structural analogues' structure-activity relationships. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area, as we previously found for models trained on the PDBbind data set. We also discussed strategies to improve BindingNet and its potential utilization for benchmarking the molecular docking methods and ligand binding free energy calculation approaches. The BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Collapse
Affiliation(s)
- Xuelian Li
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Cheng Shen
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Hui Zhu
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| | - Yujian Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Qing Wang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Jincai Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Niu Huang
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| |
Collapse
|
11
|
Gorantla R, Kubincová A, Weiße AY, Mey ASJS. From Proteins to Ligands: Decoding Deep Learning Methods for Binding Affinity Prediction. J Chem Inf Model 2024; 64:2496-2507. [PMID: 37983381 PMCID: PMC11005465 DOI: 10.1021/acs.jcim.3c01208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 11/22/2023]
Abstract
Accurate in silico prediction of protein-ligand binding affinity is important in the early stages of drug discovery. Deep learning-based methods exist but have yet to overtake more conventional methods such as giga-docking largely due to their lack of generalizability. To improve generalizability, we need to understand what these models learn from input protein and ligand data. We systematically investigated a sequence-based deep learning framework to assess the impact of protein and ligand encodings on predicting binding affinities for commonly used kinase data sets. The role of proteins is studied using convolutional neural network-based encodings obtained from sequences and graph neural network-based encodings enriched with structural information from contact maps. Ligand-based encodings are generated from graph-neural networks. We test different ligand perturbations by randomizing node and edge properties. For proteins, we make use of 3 different protein contact generation methods (AlphaFold2, Pconsc4, and ESM-1b) and compare these with a random control. Our investigation shows that protein encodings do not substantially impact the binding predictions, with no statistically significant difference in binding affinity for KIBA in the investigated metrics (concordance index, Pearson's R Spearman's Rank, and RMSE). Significant differences are seen for ligand encodings with random ligands and random ligand node properties, suggesting a much bigger reliance on ligand data for the learning tasks. Using different ways to combine protein and ligand encodings did not show a significant change in performance.
Collapse
Affiliation(s)
- Rohan Gorantla
- School
of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.
- EaStCHEM
School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, U.K.
| | | | - Andrea Y. Weiße
- School
of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, U.K.
- School
of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, U.K.
| | | |
Collapse
|
12
|
Brocidiacono M, Francoeur P, Aggarwal R, Popov KI, Koes DR, Tropsha A. BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening. J Chem Inf Model 2024; 64:2488-2495. [PMID: 38113513 DOI: 10.1021/acs.jcim.3c01211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBBind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583 K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 μM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
Collapse
Affiliation(s)
- Michael Brocidiacono
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Paul Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Rishal Aggarwal
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Konstantin I Popov
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Alexander Tropsha
- Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
Collapse
|
13
|
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. [PMID: 37319418 DOI: 10.1021/acs.jcim.3c00253] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/17/2023]
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, where many of them rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we have developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks, including deriving the protein-ligand binding affinity, protein-ligand contact map, and ligand distance matrix. Besides the protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were also added to the training data for deriving the final model of PLANET. When tested on the CASF-2016 benchmark, PLANET exhibited a scoring power comparable to the best result yielded by other deep learning models as well as a reasonable ranking power and docking power. In virtual screening trials conducted on the DUD-E benchmark, PLANET's performance was notably better than several deep learning and machine learning models. As on the LIT-PCBA benchmark, PLANET achieved comparable accuracy as the conventional docking program Glide, but it only spent less than 1% of Glide's computation time to finish the same job because PLANET did not need exhaustive conformational sampling. Considering the decent accuracy and efficiency of PLANET in binding affinity prediction, it may become a useful tool for conducting large-scale virtual screening.
Collapse
Affiliation(s)
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Haojie Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhihang Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Xinchong Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| |
Collapse
|
14
|
Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024; 64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/26/2024]
Abstract
Improving the generalization ability of scoring functions remains a major challenge in protein-ligand binding affinity prediction. Many machine learning methods are limited by their reliance on single-modal representations, hindering a comprehensive understanding of protein-ligand interactions. We introduce a graph-neural-network-based scoring function that utilizes a triplet contrastive learning loss to improve protein-ligand representations. In this model, three-dimensional complex representations and the fusion of two-dimensional ligand and coarse-grained pocket representations converge while distancing from decoy representations in latent space. After rigorous validation on multiple external data sets, our model exhibits commendable generalization capabilities compared to those of other deep learning-based scoring functions, marking it as a promising tool in the realm of drug discovery. In the future, our training framework can be extended to other biophysical- and biochemical-related problems such as protein-protein interaction and protein mutation prediction.
Collapse
Affiliation(s)
- Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Dandan Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Xiaoyang Qu
- School of Pharmacy and Medical Technology, Putian University, Putian 351100, P. R. China
- Key Laboratory of Pharmaceutical Analysis and Laboratory Medicine (Putian University), Fujian Province University, Putian 351100, P. R. China
| | - Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
| |
Collapse
|
15
|
Metcalf DP, Glick ZL, Bortolato A, Jiang A, Cheney DL, Sherrill CD. Directional Δ G Neural Network (DrΔ G-Net): A Modular Neural Network Approach to Binding Free Energy Prediction. J Chem Inf Model 2024; 64:1907-1918. [PMID: 38470995 DOI: 10.1021/acs.jcim.3c02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2024]
Abstract
The protein-ligand binding free energy is a central quantity in structure-based computational drug discovery efforts. Although popular alchemical methods provide sound statistical means of computing the binding free energy of a large breadth of systems, they are generally too costly to be applied at the same frequency as end point or ligand-based methods. By contrast, these data-driven approaches are typically fast enough to address thousands of systems but with reduced transferability to unseen systems. We introduce DrΔG-Net (or simply Dragnet), an equivariant graph neural network that can blend ligand-based and protein-ligand data-driven approaches. It is based on a 3D fingerprint representation of the ligand alone and in complex with the protein target. Dragnet is a global scoring function to predict the binding affinity of arbitrary protein-ligand complexes, but can be easily tuned via transfer learning to specific systems or end points, performing similarly to common 2D ligand-based approaches in these tasks. Dragnet is evaluated on a total of 28 validation proteins with a set of congeneric ligands derived from the Binding DB and one custom set extracted from the ChEMBL Database. In general, a handful of experimental binding affinities are sufficient to optimize the scoring function for a particular protein and ligand scaffold. When not available, predictions from physics-based methods such as absolute free energy perturbation can be used for the transfer learning tuning of Dragnet. Furthermore, we use our data to illustrate the present limitations of data-driven modeling of binding free energy predictions.
Collapse
Affiliation(s)
- Derek P Metcalf
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Zachary L Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Andrea Bortolato
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - Andy Jiang
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| | - Daniel L Cheney
- Molecular Structure and Design, Bristol-Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, United States
| | - C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, United States
| |
Collapse
|
16
|
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] Open
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated form generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Collapse
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Azim Ansari
- Computer Aided Drug Design Center Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
| | - Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
| | - Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
| | - Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
| | - Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
| |
Collapse
|
17
|
Isert C, Atz K, Riniker S, Schneider G. Exploring protein-ligand binding affinity prediction with electron density-based geometric deep learning. RSC Adv 2024; 14:4492-4502. [PMID: 38312732 PMCID: PMC10835705 DOI: 10.1039/d3ra08650j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Accepted: 01/19/2024] [Indexed: 02/06/2024] Open
Abstract
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
Collapse
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Sereina Riniker
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences Vladimir-Prelog-Weg 4 8093 Zurich Switzerland +41 44 633 73 27
| |
Collapse
|
18
|
Voitsitskyi T, Bdzhola V, Stratiichuk R, Koleiev I, Ostrovsky Z, Vozniak V, Khropachov I, Henitsoi P, Popryho L, Zhytar R, Yesylevskyy S, Nafiiev A, Starosyla S. Augmenting a training dataset of the generative diffusion model for molecular docking with artificial binding pockets. RSC Adv 2024; 14:1341-1353. [PMID: 38174256 PMCID: PMC10763617 DOI: 10.1039/d3ra08147h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 12/21/2023] [Indexed: 01/05/2024] Open
Abstract
This study introduces the PocketCFDM generative diffusion model, aimed at improving the prediction of small molecule poses in the protein binding pockets. The model utilizes a novel data augmentation technique, involving the creation of numerous artificial binding pockets that mimic the statistical patterns of non-bond interactions found in actual protein-ligand complexes. An algorithmic method was developed to assess and replicate these interaction patterns in the artificial binding pockets built around small molecule conformers. It is shown that the integration of artificial binding pockets into the training process significantly enhanced the model's performance. Notably, PocketCFDM surpassed DiffDock in terms of non-bond interaction and steric clash numbers, and the inference speed. Future developments and optimizations of the model are discussed. The inference code and final model weights of PocketCFDM are accessible publicly via the GitHub repository: https://github.com/vtarasv/pocket-cfdm.git.
Collapse
Affiliation(s)
- Taras Voitsitskyi
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine 46 Nauky Ave. Kyiv 03038 Ukraine
| | - Volodymyr Bdzhola
- Institute of Molecular Biology and Genetics of The National Academy of Sciences of Ukraine 150 Zabolotnogo Str. Kyiv 03143 Ukraine
| | - Roman Stratiichuk
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Biophysics and Medical Informatics, Educational and Scientific Centre "Institute of Biology and Medicine", Taras Shevchenko Kyiv National University 64 Volodymyrska Str. Kyiv 01601 Ukraine
| | - Ihor Koleiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine 46 Nauky Ave. Kyiv 03038 Ukraine
| | | | | | | | | | | | - Roman Zhytar
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
| | - Semen Yesylevskyy
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences Prague 6 CZ-166 10 Czech Republic
- Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine 46 Nauky Ave. Kyiv 03038 Ukraine
- Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc 17 listopadu 12 Olomouc 771 46 Czech Republic
| | - Alan Nafiiev
- Receptor.AI Inc. 20-22 Wenlock Road London N1 7GU UK
| | | |
Collapse
|
19
|
Zhou H, Fu H, Shao X, Cai W. Binding Thermodynamics of Fourth-Generation EGFR Inhibitors Revealed by Absolute Binding Free Energy Calculations. J Chem Inf Model 2023; 63:7837-7846. [PMID: 38054791 DOI: 10.1021/acs.jcim.3c01636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/07/2023]
Abstract
The overexpression or mutation of the kinase domain of the epidermal growth factor receptor (EGFR) is strongly associated with non-small-cell lung cancer (NSCLC). EGFR tyrosine kinase inhibitors (TKIs) have proven to be effective in treating NSCLC patients. However, EGFR mutations can result in drug resistance. To elucidate the mechanisms underlying this resistance and inform future drug development, we examined the binding affinities of BLU-945, a recently reported fourth-generation TKI, to wild-type EGFR (EGFRWT) and its double-mutant (L858R/T790M; EGFRDM) and triple-mutant (L858R/T790M/C797S; EGFRTM) forms. We compared the binding affinities of BLU-945, BLU-945 analogues, CH7233163 (another fourth-generation TKI), and erlotinib (a first-generation TKI) using absolute binding free energy calculations. Our findings reveal that BLU-945 and CH7233163 exhibit binding affinities to both EGFRDM and EGFRTM stronger than those of erlotinib, corroborating experimental data. We identified K745 and T854 as the key residues in the binding of fourth-generation EGFR TKIs. Electrostatic forces were the predominant driving force for the binding of fourth-generation TKIs to EGFR mutants. Furthermore, we discovered that the incorporation of piperidinol and sulfone groups in BLU-945 substantially enhanced its binding capacity to EGFR mutants. Our study offers valuable theoretical insights for optimizing fourth-generation EGFR TKIs.
Collapse
Affiliation(s)
- Huaxin Zhou
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- School of Materials Science and Engineering, Smart Sensing Interdisciplinary Science Center, Nankai University, Tianjin 300350, China
| |
Collapse
|
20
|
Xi C, Diao J, Moon TS. Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
Collapse
Affiliation(s)
- Chenggang Xi
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Jinjin Diao
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Tae Seok Moon
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA; Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA.
| |
Collapse
|
21
|
Aly Abdelkader G, Ngnamsie Njimbouom S, Oh TJ, Kim JD. ResBiGAAT: Residual Bi-GRU with attention for protein-ligand binding affinity prediction. Comput Biol Chem 2023; 107:107969. [PMID: 37866117 DOI: 10.1016/j.compbiolchem.2023.107969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Revised: 09/20/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Protein-ligand interaction plays a crucial role in drug discovery, facilitating efficient drug development and enabling drug repurposing. Several computational algorithms, such as Graph Neural Networks and Convolutional Neural Networks, have been proposed to predict the binding affinity using the three-dimensional structure of ligands and proteins. However, there are limitations due to the need for experimental characterization of the three-dimensional structure of protein sequences, which is still lacking for some proteins. Moreover, these models often suffer from unnecessary complexity, resulting in extraneous computations. This study presents ResBiGAAT, a novel deep learning model that combines a deep Residual Bidirectional Gated Recurrent Unit with two-sided self-attention mechanisms. ResBiGAAT leverages protein and ligand sequence-level features and their physicochemical properties to efficiently predict protein-ligand binding affinity. Through rigorous evaluation using 5-fold cross-validation, we demonstrate the performance of our proposed approach. The model exhibits competitive performance on an external dataset, highlighting its generalizability. Our publicly available web interface, located at resbigaat.streamlit.app, allows users to conveniently input protein and ligand sequences to estimate binding affinity.
Collapse
Affiliation(s)
- Gelany Aly Abdelkader
- Department of Computer Science and Electronic Engineering, Sun Moon University, Asan 31460, the Republic of Korea
| | - Soualihou Ngnamsie Njimbouom
- Department of Computer Science and Electronic Engineering, Sun Moon University, Asan 31460, the Republic of Korea
| | - Tae-Jin Oh
- Genome‑based BioIT Convergence Institute, Asan 31460, the Republic of Korea; Department of Pharmaceutical Engineering and Biotechnology, Sun Moon University, Asan 31460, the Republic of Korea
| | - Jeong-Dong Kim
- Department of Computer Science and Electronic Engineering, Sun Moon University, Asan 31460, the Republic of Korea; Division of Computer Science and Engineering, Sun Moon University, Asan 31460, the Republic of Korea.
| |
Collapse
|
22
|
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Collapse
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Steven Shave
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
| | - Yumiao Gao
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jiancong Xie
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| |
Collapse
|
23
|
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and accurately compare the performance of models.
Collapse
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| | - Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium; (J.C.G.-T.); (G.T.)
| | - Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France; (P.-Y.L.); (S.A.-S.)
| |
Collapse
|
24
|
Wu J, Chen H, Cheng M, Xiong H. CurvAGN: Curvature-based Adaptive Graph Neural Networks for Predicting Protein-Ligand Binding Affinity. BMC Bioinformatics 2023; 24:378. [PMID: 37798653 PMCID: PMC10557336 DOI: 10.1186/s12859-023-05503-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 09/28/2023] [Indexed: 10/07/2023] Open
Abstract
Accurately predicting the binding affinity between proteins and ligands is crucial for drug discovery. Recent advances in graph neural networks (GNNs) have made significant progress in learning representations of protein-ligand complexes to estimate binding affinities. To improve the performance of GNNs, there frequently needs to look into protein-ligand complexes from geometric perspectives. While the "off-the-shelf" GNNs could incorporate some basic geometric structures of molecules, such as distances and angles, through modeling the complexes as homophilic graphs, these solutions seldom take into account the higher-level geometric attributes like curvatures and homology, and also heterophilic interactions.To address these limitations, we introduce the Curvature-based Adaptive Graph Neural Network (CurvAGN). This GNN comprises two components: a curvature block and an adaptive attention guided neural block (AGN). The curvature block encodes multiscale curvature informaton, then the AGN, based on an adaptive graph attention mechanism, incorporates geometry structure including angle, distance, and multiscale curvature, long-range molecular interactions, and heterophily of the graph into the protein-ligand complex representation. We demonstrate the superiority of our proposed model through experiments conducted on the PDBbind-V2016 core dataset.
Collapse
Affiliation(s)
- Jianqiu Wu
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Yuhang, Hangzhou, 311121, Zhejiang, China.
| | - Minhao Cheng
- Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Jiulongwan, Hongkong, 999077, China
| | - Haoyi Xiong
- Big Data Lab, Baidu Inc., Haidian, Beijing, 100080, China
| |
Collapse
|
25
|
Shui S, Buckley S, Scheller L, Correia BE. Rational design of small-molecule responsive protein switches. Protein Sci 2023; 32:e4774. [PMID: 37656809 PMCID: PMC10510469 DOI: 10.1002/pro.4774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2023] [Revised: 08/26/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
Small-molecule responsive protein switches are powerful tools for controlling cellular processes. These switches are designed to respond rapidly and specifically to their inducer. They have been used in numerous applications, including the regulation of gene expression, post-translational protein modification, and signal transduction. Typically, small-molecule responsive protein switches consist of two proteins that interact with each other in the presence or absence of a small molecule. Recent advances in computational protein design already contributed to the development of protein switches with an expanded range of small-molecule inducers and increasingly sophisticated switch mechanisms. Further progress in the engineering of small-molecule responsive switches is fueled by cutting-edge computational design approaches, which will enable more complex and precise control over cellular processes and advance synthetic biology applications in biotechnology and medicine. Here, we discuss recent milestones and how technological advances are impacting the development of chemical switches.
Collapse
Affiliation(s)
- Sailan Shui
- Laboratory of Protein Design and Immunoengineering (LPDI)STI, EPFLLausanneSwitzerland
- Swiss Institute of Bioinformatics (SIB)LausanneSwitzerland
| | - Stephen Buckley
- Laboratory of Protein Design and Immunoengineering (LPDI)STI, EPFLLausanneSwitzerland
- Swiss Institute of Bioinformatics (SIB)LausanneSwitzerland
| | - Leo Scheller
- Laboratory of Protein Design and Immunoengineering (LPDI)STI, EPFLLausanneSwitzerland
- Swiss Institute of Bioinformatics (SIB)LausanneSwitzerland
| | - Bruno E. Correia
- Laboratory of Protein Design and Immunoengineering (LPDI)STI, EPFLLausanneSwitzerland
- Swiss Institute of Bioinformatics (SIB)LausanneSwitzerland
| |
Collapse
|
26
|
Hadfield TE, Scantlebury J, Deane CM. Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding. J Cheminform 2023; 15:84. [PMID: 37726844 PMCID: PMC10509074 DOI: 10.1186/s13321-023-00755-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023] Open
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups with 39% more efficiency than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS .
Collapse
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
| | - Jack Scantlebury
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK.
| |
Collapse
|
27
|
Schaller D, Christ CD, Chodera JD, Volkamer A. Benchmarking Cross-Docking Strategies for Structure-Informed Machine Learning in Kinase Drug Discovery. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.11.557138. [PMID: 37745489 PMCID: PMC10515787 DOI: 10.1101/2023.09.11.557138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
In recent years machine learning has transformed many aspects of the drug discovery process including small molecule design for which the prediction of the bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark assessing the performance of different classes of docking and pose selection strategies to assess how well experimentally observed binding modes are recapitulated in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the co-crystallized ligand-utilizing shape overlap with or without maximum common substructure matching-are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance to generate a low RMSD docking pose. Docking utilizing an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proofed to be the most efficient way to reproduce binding poses achieving a success rate of 66.9 % across all included systems. The studied docking and pose selection strategies-which utilize the OpenEye Toolkit-were implemented into pipelines of the KinoML framework allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe the general findings can also be transferred to other protein families.
Collapse
Affiliation(s)
- David Schaller
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Clara D. Christ
- Molecular Design, Research and Development, Pharmaceuticals, Bayer AG, 13342 Berlin, Germany
| | - John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Andrea Volkamer
- In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Data Driven Drug Design, Faculty of Mathematics and Computer Sciences, Saarland University, Saarbrücken, Germany
| |
Collapse
|
28
|
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we proposed a novel deep fusion graph neural networks framework named FGNN to learn the protein-ligand interactions from the 3D structures of protein-ligand complexes. Unlike 1D sequences for proteins or 2D graphs for ligands, the 3D graph of protein-ligand complex enables the more accurate representations of the protein-ligand interactions. Benchmark studies have shown that our fusion models FGNN can achieve more accurate prediction of binding affinity than any individual algorithm. The advantages of fusion strategies have been demonstrated in terms of expressive power of data, learning efficiency and model interpretability. Our fusion models show satisfactory performances on diverse data sets, demonstrating their generalization ability. Given the good performances in both binding affinity prediction and virtual screening, our fusion models are expected to be practically applied for drug screening and design. Our work highlights the potential of the fusion graph neural network algorithm in solving complex prediction problems in computational biology and chemistry. The fusion graph neural networks (FGNN) model is freely available in https://github.com/LinaDongXMU/FGNN.
Collapse
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Shuai Shi
- Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
| | - Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
| |
Collapse
|
29
|
Zhang Y, Menke J, He J, Nittinger E, Tyrchan C, Koch O, Zhao H. Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification. J Cheminform 2023; 15:75. [PMID: 37649050 PMCID: PMC10469421 DOI: 10.1186/s13321-023-00744-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Accepted: 08/10/2023] [Indexed: 09/01/2023] Open
Abstract
Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n2) to O(n). It also results in a better prediction performance consistently on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further include into a Siamese neural network the transformer-based Chemformer, which extracts task-specific features from the simplified molecular-input line-entry system representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that the high prediction accuracy correlates with the high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
Collapse
Affiliation(s)
- Yumeng Zhang
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Janosch Menke
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden.
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany.
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183, Gothenburg, Sweden
| | - Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
| | - Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
| | - Oliver Koch
- Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany
| | - Hongtao Zhao
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden.
| |
Collapse
|
30
|
Gorostiola González M, van den Broek RL, Braun TGM, Chatzopoulou M, Jespers W, IJzerman AP, Heitman LH, van Westen GJP. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. J Cheminform 2023; 15:74. [PMID: 37641107 PMCID: PMC10463931 DOI: 10.1186/s13321-023-00745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2023] [Accepted: 08/10/2023] [Indexed: 08/31/2023] Open
Abstract
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used in bioactivity prediction of potential drug candidates relying on both chemical and protein information. In PCM features are computed to describe small molecules and proteins, which directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be computationally simulated with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed to be applied in bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd was used. GPCRs are membrane-bound proteins, which are activated by hormones and neurotransmitters, and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and subsequently condensing them into vectors per residue or protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combinations of non-dynamic descriptors with 3DDPDs did not result in increased performance. Finally, the power of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored. The results presented here show the potential of including protein dynamic information on machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
Collapse
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
| | - Remco L van den Broek
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Thomas G M Braun
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Magdalini Chatzopoulou
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Willem Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
| | - Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
| | - Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands.
| |
Collapse
|
31
|
Abdel-Rehim A, Orhobor O, Hang L, Ni H, King RD. Protein-ligand binding affinity prediction exploiting sequence constituent homology. Bioinformatics 2023; 39:btad502. [PMID: 37572302 PMCID: PMC10463547 DOI: 10.1093/bioinformatics/btad502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 07/10/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023] Open
Abstract
MOTIVATION Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind. Particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors is necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016 where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION Code and data uploaded to https://github.com/abbiAR/PLBAffinity.
Collapse
Affiliation(s)
- Abbi Abdel-Rehim
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
| | | | - Lou Hang
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
| | - Hao Ni
- Department of Mathematics, University College London, London WC1H 0AY, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
| | - Ross D King
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden
| |
Collapse
|
32
|
Baillif B, Cole J, McCabe P, Bender A. Deep generative models for 3D molecular structure. Curr Opin Struct Biol 2023; 80:102566. [DOI: 10.1016/j.sbi.2023.102566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2022] [Revised: 02/05/2023] [Accepted: 02/15/2023] [Indexed: 03/30/2023]
|
33
|
Lu R, Wang J, Li P, Li Y, Tan S, Pan Y, Liu H, Gao P, Xie G, Yao X. Improving drug-target affinity prediction via feature fusion and knowledge distillation. Brief Bioinform 2023; 24:7142721. [PMID: 37099690 DOI: 10.1093/bib/bbad145] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Revised: 02/15/2023] [Accepted: 03/27/2023] [Indexed: 04/28/2023] Open
Abstract
Rapid and accurate prediction of drug-target affinity can accelerate and improve the drug discovery process. Recent studies show that deep learning models may have the potential to provide fast and accurate drug-target affinity prediction. However, the existing deep learning models still have their own disadvantages that make it difficult to complete the task satisfactorily. Complex-based models rely heavily on the time-consuming docking process, and complex-free models lacks interpretability. In this study, we introduced a novel knowledge-distillation insights drug-target affinity prediction model with feature fusion inputs to make fast, accurate and explainable predictions. We benchmarked the model on public affinity prediction and virtual screening dataset. The results show that it outperformed previous state-of-the-art models and achieved comparable performance to previous complex-based models. Finally, we study the interpretability of this model through visualization and find it can provide meaningful explanations for pairwise interaction. We believe this model can further improve the drug-target affinity prediction for its higher accuracy and reliable interpretability.
Collapse
Affiliation(s)
- Ruiqiang Lu
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Jun Wang
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Pengyong Li
- School of Computer Science and Technology, Xidian University, 710126 Shaanxi, China
| | - Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
| | - Shuoyan Tan
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Yiting Pan
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078 Macau, China
| | - Peng Gao
- Ping An Healthcare Technology, 100027 Beijing, China
| | - Guotong Xie
- Ping An Healthcare Technology, 100027 Beijing, China
- Ping An Health Cloud Company Limited, 100027 Beijing, China
- Ping An International Smart City Technology Co., Ltd., 100027 Beijing, China
| | - Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, 730000 Gansu, China
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, 999078 Macau, China
| |
Collapse
|
34
|
Cavasotto CN, Di Filippo JI. The Impact of Supervised Learning Methods in Ultralarge High-Throughput Docking. J Chem Inf Model 2023; 63:2267-2280. [PMID: 37036491 DOI: 10.1021/acs.jcim.2c01471] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/11/2023]
Abstract
Structure-based virtual screening methods are, nowadays, one of the key pillars of computational drug discovery. In recent years, a series of studies have reported docking-based virtual screening campaigns of large databases ranging from hundreds to thousands of millions compounds, further identifying novel hits after experimental validation. As these larg-scale efforts are not generally accessible, machine learning-based protocols have emerged to accelerate the identification of virtual hits within an ultralarge chemical space, reaching impressive reductions in computational time. Herein, we illustrate the motivation and the problem behind the screening of large databases, providing an overview of key concepts and essential applications of machine learning-accelerated protocols, specifically concerning supervised learning methods. We also discuss where the field stands with these novel developments, highlighting possible insights for future studies.
Collapse
Affiliation(s)
- Claudio N Cavasotto
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
| | - Juan I Di Filippo
- Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
- Austral Institute for Applied Artificial Intelligence, Universidad Austral, Av. Juan Domingo Perón 1500, B1629AHJ Pilar, Argentina
| |
Collapse
|
35
|
Grisoni F. Chemical language models for de novo drug design: Challenges and opportunities. Curr Opin Struct Biol 2023; 79:102527. [PMID: 36738564 DOI: 10.1016/j.sbi.2023.102527] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2022] [Revised: 12/07/2022] [Accepted: 12/20/2022] [Indexed: 02/05/2023]
Abstract
Generative deep learning is accelerating de novo drug design, by allowing the generation of molecules with desired properties on demand. Chemical language models - which generate new molecules in the form of strings using deep learning - have been particularly successful in this endeavour. Thanks to advances in natural language processing methods and interdisciplinary collaborations, chemical language models are expected to become increasingly relevant in drug discovery. This minireview provides an overview of the current state-of-the-art of chemical language models for de novo design, and analyses current limitations, challenges, and advantages. Finally, a perspective on future opportunities is provided.
Collapse
Affiliation(s)
- Francesca Grisoni
- Eindhoven University of Technology, Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven, Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Netherlands.
| |
Collapse
|
36
|
Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023; 616:673-685. [PMID: 37100941 DOI: 10.1038/s41586-023-05905-z] [Citation(s) in RCA: 135] [Impact Index Per Article: 135.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Accepted: 03/01/2023] [Indexed: 04/28/2023]
Abstract
Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.
Collapse
Affiliation(s)
- Anastasiia V Sadybekov
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
| | - Vsevolod Katritch
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA.
- Department of Chemistry, University of Southern California, Los Angeles, CA, USA.
| |
Collapse
|
37
|
Isert C, Atz K, Schneider G. Structure-based drug design with geometric deep learning. Curr Opin Struct Biol 2023; 79:102548. [PMID: 36842415 DOI: 10.1016/j.sbi.2023.102548] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Revised: 01/16/2023] [Accepted: 01/24/2023] [Indexed: 02/26/2023]
Abstract
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Collapse
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 8093, Singapore.
| |
Collapse
|
38
|
Yang Z, Zhong W, Lv Q, Dong T, Yu-Chian Chen C. Geometric Interaction Graph Neural Network for Predicting Protein-Ligand Binding Affinities from 3D Structures (GIGN). J Phys Chem Lett 2023; 14:2020-2033. [PMID: 36794930 DOI: 10.1021/acs.jpclett.2c03906] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Predicting protein-ligand binding affinities (PLAs) is a core problem in drug discovery. Recent advances have shown great potential in applying machine learning (ML) for PLA prediction. However, most of them omit the 3D structures of complexes and physical interactions between proteins and ligands, which are considered essential to understanding the binding mechanism. This paper proposes a geometric interaction graph neural network (GIGN) that incorporates 3D structures and physical interactions for predicting protein-ligand binding affinities. Specifically, we design a heterogeneous interaction layer that unifies covalent and noncovalent interactions into the message passing phase to learn node representations more effectively. The heterogeneous interaction layer also follows fundamental biological laws, including invariance to translations and rotations of the complexes, thus avoiding expensive data augmentation strategies. GIGN achieves state-of-the-art performance on three external test sets. Moreover, by visualizing learned representations of protein-ligand complexes, we show that the predictions of GIGN are biologically meaningful.
Collapse
Affiliation(s)
- Ziduo Yang
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Weihe Zhong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Qiujie Lv
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Tiejun Dong
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
| | - Calvin Yu-Chian Chen
- Intelligent Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, Guangdong 510275, China
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
| |
Collapse
|
39
|
Predicting Potent Compounds Using a Conditional Variational Autoencoder Based upon a New Structure-Potency Fingerprint. Biomolecules 2023; 13:biom13020393. [PMID: 36830761 PMCID: PMC9953226 DOI: 10.3390/biom13020393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 02/07/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Prediction of the potency of bioactive compounds generally relies on linear or nonlinear quantitative structure-activity relationship (QSAR) models. Nonlinear models are generated using machine learning methods. We introduce a novel approach for potency prediction that depends on a newly designed molecular fingerprint (FP) representation. This structure-potency fingerprint (SPFP) combines different modules accounting for the structural features of active compounds and their potency values in a single bit string, hence unifying structure and potency representation. This encoding enables the derivation of a conditional variational autoencoder (CVAE) using SPFPs of training compounds and apply the model to predict the SPFP potency module of test compounds using only their structure module as input. The SPFP-CVAE approach correctly predicts the potency values of compounds belonging to different activity classes with an accuracy comparable to support vector regression (SVR), representing the state-of-the-art in the field. In addition, highly potent compounds are predicted with very similar accuracy as SVR and deep neural networks.
Collapse
|
40
|
Zou Y, Wang R, Du M, Wang X, Xu D. Identifying Protein-Ligand Interactions via a Novel Distance Self-Feedback Biomolecular Interaction Network. J Phys Chem B 2023; 127:899-911. [PMID: 36657025 DOI: 10.1021/acs.jpcb.2c07592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
Efficient and accurate characterizations of protein-ligand interactions are key to understanding biology at the molecular level. They are particularly useful in pharmaceutical industry applications. They are usually computationally demanding for those widely applied dynamics-based methods in identifying important residues or calculating ligand binding free energy. In this work, we proposed a graph deep learning (DL) framework, namely, the distance self-feedback biomolecular interaction network (DSBIN), in which the relationship between the complex structure and binding affinity can be established by means of a carefully designed distance self-feedback module and interaction layer. Our model can directly provide a quantitative evaluation of inhibitor binding affinities (pKd). More importantly, the DSBIN model efficiently identifies key interactions for inhibitor binding and thus intrinsically bears the interpretability. Its generalization performance was further verified using 1405 unseen structures. The predicted binding free energies' deviations were calculated to be less than 1.37 kcal/mol for more than 55% structures. Moreover, we also compared the DSBIN model with a commonly used theoretical method in calculating the substrate binding free energy, MM/GBSA. Our results show that the current DL model has generally better performance in predicting the binding free energy. For a specific complex system, mannopentaose/TmCBM27, the DSBIN predicted binding free energy is -8.21 kcal/mol, which is very close to experimentally measured -7.76 kcal/mol and MM/GBSA calculated -7.16 kcal/mol. Meanwhile, all important aromatic residues around the binding pocket can be identified by our DL model. Considering the accuracy and efficiency of the newly developed DL model, it may be very helpful in the field of drug design and molecular recognition.
Collapse
Affiliation(s)
- Yurong Zou
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan610064, PR China
| | - Ruihan Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan610064, PR China
| | - Meng Du
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan610064, PR China
| | - Xin Wang
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan610064, PR China
| | - Dingguo Xu
- MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, Sichuan610064, PR China.,Research Center for Materials Genome Engineering, Sichuan University, Chengdu, Sichuan610065, PR China
| |
Collapse
|
41
|
New avenues in artificial-intelligence-assisted drug discovery. Drug Discov Today 2023; 28:103516. [PMID: 36736583 DOI: 10.1016/j.drudis.2023.103516] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2022] [Revised: 12/08/2022] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Over the past decade, the amount of biomedical data available has grown at unprecedented rates. Increased automation technology and larger data volumes have encouraged the use of machine learning (ML) or artificial intelligence (AI) techniques for mining such data and extracting useful patterns. Because the identification of chemical entities with desired biological activity is a crucial task in drug discovery, AI technologies have the potential to accelerate this process and support decision making. In addition, the advent of deep learning (DL) has shown great promise in addressing diverse problems in drug discovery, such as de novo molecular design. Herein, we will appraise the current state-of-the-art in AI-assisted drug discovery, discussing the recent applications covering generative models for chemical structure generation, scoring functions to improve binding affinity and pose prediction, and molecular dynamics to assist in the parametrization, featurization and generalization tasks. Finally, we will discuss current hurdles and the strategies to overcome them, as well as potential future directions.
Collapse
|
42
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Collapse
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| |
Collapse
|
43
|
Simple nearest-neighbour analysis meets the accuracy of compound potency predictions using complex machine learning models. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00581-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
|
44
|
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs─pairs of molecules that are highly similar in their structure but exhibit large differences in potency─have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization but also models that are well equipped to accurately predict the potency of activity cliffs have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
Collapse
Affiliation(s)
- Derek van Tilborg
- Institute
for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZEindhoven, The Netherlands
- Centre
for Living Technologies, Alliance TU/e,
WUR, UU, UMC Utrecht, 3584CBUtrecht, The Netherlands
| | | | - Francesca Grisoni
- Institute
for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612AZEindhoven, The Netherlands
- Centre
for Living Technologies, Alliance TU/e,
WUR, UU, UMC Utrecht, 3584CBUtrecht, The Netherlands
| |
Collapse
|
45
|
Raush E, Abagyan R, Totrov M. Graph-Convolutional Neural Net Model of the Statistical Torsion Profiles for Small Organic Molecules. J Chem Inf Model 2022; 62:5896-5906. [PMID: 36456533 DOI: 10.1021/acs.jcim.2c00790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
Abstract
We present a graph-convolutional neural network (GCNN)-based method for learning and prediction of statistical torsional profiles (STP) in small organic molecules based on the experimental X-ray structure data. A specialized GCNN torsion profile model is trained using the structures in the Crystallography Open Database (COD). The GCNN-STP model captures torsional preferences over a wide range of torsion rotor chemotypes and correctly predicts a variety of effects from the vicinal atoms and moieties. GCNN-STP statistical profiles also show good agreement with quantum chemically (DFT) calculated torsion energy profiles. Furthermore, we demonstrate the application of the GCNN-STP statistical profiles for conformer generation. A web server that allows interactive profile prediction and viewing is made freely available at https://www.molsoft.com/tortool.html.
Collapse
Affiliation(s)
- Eugene Raush
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California92121, United States
| | - Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California92093, United States
| | - Maxim Totrov
- Molsoft L.L.C., 11199 Sorrento Valley Road, S209, San Diego, California92121, United States
| |
Collapse
|
46
|
Khalak Y, Tresadern G, Hahn DF, de Groot BL, Gapsys V. Chemical Space Exploration with Active Learning and Alchemical Free Energies. J Chem Theory Comput 2022; 18:6259-6270. [PMID: 36148968 PMCID: PMC9558370 DOI: 10.1021/acs.jctc.2c00752] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
![]()
Drug discovery can be thought of as a search for a needle
in a
haystack: searching through a large chemical space for the most active
compounds. Computational techniques can narrow the search space for
experimental follow up, but even they become unaffordable when evaluating
large numbers of molecules. Therefore, machine learning (ML) strategies
are being developed as computationally cheaper complementary techniques
for navigating and triaging large chemical libraries. Here, we explore
how an active learning protocol can be combined with first-principles
based alchemical free energy calculations to identify high affinity
phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure
using a set of experimentally characterized PDE2 binders. The optimized
protocol is then used prospectively on a large chemical library to
navigate toward potent inhibitors. In the active learning cycle, at
every iteration a small fraction of compounds is probed by alchemical
calculations and the obtained affinities are used to train ML models.
With successive rounds, high affinity binders are identified by explicitly
evaluating only a small subset of compounds in a large chemical library,
thus providing an efficient protocol that robustly identifies a large
fraction of true positives.
Collapse
Affiliation(s)
- Yuriy Khalak
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| | - Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
| | - David F Hahn
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Bert L de Groot
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| | - Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077 Göttingen, Germany
| |
Collapse
|
47
|
Born J, Shoshan Y, Huynh T, Cornell WD, Martin EJ, Manica M. On the Choice of Active Site Sequences for Kinase-Ligand Affinity Prediction. J Chem Inf Model 2022; 62:4295-4299. [PMID: 36098536 PMCID: PMC9516689 DOI: 10.1021/acs.jcim.2c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
![]()
Recent work showed that active site rather than full-protein-sequence
information improves predictive performance in kinase-ligand binding
affinity prediction. To refine the notion of an “active site”,
we here propose and compare multiple definitions. We report significant
evidence that our novel definition is superior to previous definitions
and better models of ATP-noncompetitive inhibitors. Moreover, we leverage
the discontiguity of the active site sequence to motivate novel protein-sequence
augmentation strategies and find that combining them further improves
performance.
Collapse
Affiliation(s)
- Jannis Born
- Accelerated Discovery, IBM Research Europe, 8803 Rüschlikon, Switzerland.,Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | | | - Tien Huynh
- IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598, United States
| | - Wendy D Cornell
- IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598, United States
| | - Eric J Martin
- Novartis Institutes for BioMedical Research, Emeryville, California 94608, United States
| | - Matteo Manica
- Accelerated Discovery, IBM Research Europe, 8803 Rüschlikon, Switzerland
| |
Collapse
|
48
|
Du H, Jiang D, Gao J, Zhang X, Jiang L, Zeng Y, Wu Z, Shen C, Xu L, Cao D, Hou T, Pan P. Proteome-Wide Profiling of the Covalent-Druggable Cysteines with a Structure-Based Deep Graph Learning Network. Research (Wash D C) 2022; 2022:9873564. [PMID: 35958111 PMCID: PMC9343084 DOI: 10.34133/2022/9873564] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2022] [Accepted: 06/27/2022] [Indexed: 11/06/2022] Open
Abstract
Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in the protein. By integrating the characterization of the binding pocket and the interactions between each cysteine and the surrounding environment, DeepCoSI achieves state-of-the-art predictive performances. The validation on two external test sets which mimic the real application scenarios shows that DeepCoSI has strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on website.
Collapse
Affiliation(s)
- Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Junbo Gao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lingxiao Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Yundian Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, China
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China
| |
Collapse
|