1
Isigkeit L, Hörmann T, Schallmayer E, Scholz K, Lillich FF, Ehrler JHM, Hufnagel B, Büchner J, Marschner JA, Pabel J, Proschak E, Merk D. Automated design of multi-target ligands by generative deep learning. Nat Commun 2024; 15:7946. [PMID: 39261471 PMCID: PMC11390726 DOI: 10.1038/s41467-024-52060-8] [Received: 01/10/2024] [Accepted: 08/23/2024] [Indexed: 09/13/2024] Open access
Abstract
Generative deep learning models enable data-driven de novo design of molecules with tailored features. Chemical language models (CLM) trained on string representations of molecules such as SMILES have been successfully employed to design new chemical entities with experimentally confirmed activity on intended targets. Here, we probe the application of CLM to generate multi-target ligands for designed polypharmacology. We capitalize on the ability of CLM to learn from small fine-tuning sets of molecules and successfully bias the model towards designing drug-like molecules with similarity to known ligands of target pairs of interest. Designs obtained from CLM after pooled fine-tuning are predicted active on both proteins of interest and comprise pharmacophore elements of ligands for both targets in one molecule. Synthesis and testing of twelve computationally favored CLM designs for six target pairs reveals modulation of at least one intended protein by all selected designs with up to double-digit nanomolar potency and confirms seven compounds as designed dual ligands. These results corroborate CLM for multi-target de novo design as a source of innovation in drug discovery.
Affiliation(s)
- Laura Isigkeit
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Tim Hörmann
- Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Espen Schallmayer
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Katharina Scholz
- Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Felix F Lillich
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, 60596, Frankfurt, Germany
- Johanna H M Ehrler
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Benedikt Hufnagel
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Jasmin Büchner
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Julian A Marschner
- Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Jörg Pabel
- Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany
- Ewgenij Proschak
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, 60596, Frankfurt, Germany
- Daniel Merk
- Goethe University Frankfurt, Institute of Pharmaceutical Chemistry, 60438, Frankfurt, Germany.
- Ludwig-Maximilians-Universität München, Department of Pharmacy, 81377, Munich, Germany.
2
Catacutan DB, Alexander J, Arnold A, Stokes JM. Machine learning in preclinical drug discovery. Nat Chem Biol 2024:10.1038/s41589-024-01679-1. [PMID: 39030362 DOI: 10.1038/s41589-024-01679-1] [Received: 04/27/2023] [Accepted: 06/13/2024] [Indexed: 07/21/2024]
Abstract
Drug-discovery and drug-development endeavors are laborious, costly and time consuming. These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%. Machine learning (ML) presents an opportunity to improve the drug-discovery process. Indeed, with the growing abundance of public and private large-scale biological and chemical datasets, ML techniques are becoming well positioned as useful tools that can augment the traditional drug-development process. In this Perspective, we discuss the integration of algorithmic methods throughout the preclinical phases of drug discovery. Specifically, we highlight an array of ML-based efforts, across diverse disease areas, to accelerate initial hit discovery, mechanism-of-action (MOA) elucidation and chemical property optimization. With advances in the application of ML across diverse therapeutic areas, we posit that fully ML-integrated drug-discovery pipelines will define the future of drug-development programs.
Affiliation(s)
- Denise B Catacutan
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jeremie Alexander
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Autumn Arnold
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada.
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada.
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada.
3
Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] Open access
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
4
Atz K, Nippa DF, Müller AT, Jost V, Anelli A, Reutlinger M, Kramer C, Martin RE, Grether U, Schneider G, Wuitschik G. Geometric deep learning-guided Suzuki reaction conditions assessment for applications in medicinal chemistry. RSC Med Chem 2024; 15:2310-2321. [PMID: 39026644 PMCID: PMC11253849 DOI: 10.1039/d4md00196f] [Received: 03/21/2024] [Accepted: 05/25/2024] [Indexed: 07/20/2024] Open access
Abstract
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) value of 0.82 (±0.07) for few-shot machine learning. On the other hand, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study positively advocates the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
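For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a binary F1-score and a ROC-AUC value are computed. It is not the authors' evaluation code; the function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall (label 1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = reaction condition worked, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]          # model confidence per reaction
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

F1 depends on the chosen decision threshold, whereas ROC-AUC only depends on how the scores rank positives against negatives, which is why the two numbers are reported separately.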
Affiliation(s)
- Kenneth Atz
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Andrea Anelli
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Michael Reutlinger
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Christian Kramer
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich Vladimir-Prelog-Weg 4 8093 Zurich Switzerland
- Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124 4070 Basel Switzerland
5
Yin X, Wang J, Ge M, Feng X, Zhang G. Designing Small Molecule PI3Kγ Inhibitors: A Review of Structure-Based Methods and Computational Approaches. J Med Chem 2024; 67:10530-10547. [PMID: 38988222 DOI: 10.1021/acs.jmedchem.4c00347] [Indexed: 07/12/2024]
Abstract
The PI3K/AKT/mTOR pathway plays critical roles in a wide array of biological processes. Phosphatidylinositol 3-kinase gamma (PI3Kγ), a class IB PI3K family member, represents a potential therapeutic opportunity for the treatment of cancer, inflammation, and autoimmunity. In this Perspective, we provide a comprehensive overview of the structure, biological function, and regulation of PI3Kγ. We also focus on the development of PI3Kγ inhibitors over the past decade and emphasize their binding modes, structure-activity relationships, and pharmacological activities. The application of computational technologies and artificial intelligence in the discovery of novel PI3Kγ inhibitors is also introduced. This review aims to provide a timely and updated overview on the strategies for targeting PI3Kγ.
Affiliation(s)
- Xiaoming Yin
- Hebei University of Science & Technology, Shijiazhuang 050018, People's Republic of China
- Hebei Research Center of Pharmaceutical and Chemical Engineering, Shijiazhuang 050018, People's Republic of China
- Jiaying Wang
- Hebei University of Science & Technology, Shijiazhuang 050018, People's Republic of China
- Hebei Research Center of Pharmaceutical and Chemical Engineering, Shijiazhuang 050018, People's Republic of China
- Minghao Ge
- Hebei University of Science & Technology, Shijiazhuang 050018, People's Republic of China
- Hebei Research Center of Pharmaceutical and Chemical Engineering, Shijiazhuang 050018, People's Republic of China
- Xue Feng
- Hebei University of Science & Technology, Shijiazhuang 050018, People's Republic of China
- Guogang Zhang
- Hebei University of Science & Technology, Shijiazhuang 050018, People's Republic of China
- Hebei Research Center of Pharmaceutical and Chemical Engineering, Shijiazhuang 050018, People's Republic of China
6
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024] Open access
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, which requires a different grammar, architecture, training dataset and, therefore, re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use Python package to facilitate model sampling, which can be found on GitHub and the Python Package Index. Scientific contribution: This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. This doesn't require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
Affiliation(s)
- Morgan Thomas
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Mazen Ahmad
- In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
- In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain.
7
Jiang X, Lu L, Li J, Jiang J, Zhang J, Zhou S, Wen H, Cai H, Luo X, Li Z, Wang J, Ju B, Bai R. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4. J Med Chem 2024; 67:10057-10075. [PMID: 38863440 DOI: 10.1021/acs.jmedchem.4c00184] [Indexed: 06/13/2024]
Abstract
Artificial intelligence (AI) de novo molecular generation provides leads with novel structures for drug discovery. However, the target affinity and synthesizability of the generated molecules present critical challenges for the successful application of AI technology. Therefore, we developed an advanced reinforcement learning model to bridge the gap between the theory of de novo molecular generation and the practical aspects of drug discovery. This model utilizes chemical reaction templates and commercially available building blocks as a starting point and employs forward reaction prediction to generate molecules, while real-time docking and drug-likeness predictions are conducted to ensure synthesizability and drug-likeness. We applied this model to design active molecules targeting the inflammation-related receptor CXCR4 and successfully prepared them according to the AI-proposed synthetic routes. Several molecules exhibited potent anti-CXCR4 and anti-inflammatory activity in subsequent in vitro and in vivo assays. The top-performing compound XVI alleviated symptoms related to inflammatory bowel disease and showed reasonable pharmacokinetic properties.
Affiliation(s)
- Xiaoying Jiang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Liuxin Lu
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Junjie Li
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Jing Jiang
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiapeng Zhang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Shengbin Zhou
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Hao Wen
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Hong Cai
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Xinyu Luo
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Zhen Li
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiahui Wang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Bin Ju
- SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Renren Bai
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
8
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
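The idea of reusing stored oracle scores can be illustrated with a small top-k experience replay buffer. This is a sketch under stated assumptions, not the Augmented Memory implementation from the linked repository: the `ReplayBuffer` class, the `toy_oracle` function, and the SMILES strings are all hypothetical, and the SMILES data augmentation step described in the abstract is omitted.

```python
import heapq
import random


class ReplayBuffer:
    """Keeps the highest-scoring (score, smiles) pairs seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []     # min-heap: lowest retained score sits at index 0
        self._seen = set()  # SMILES currently held in the buffer

    def add(self, smiles, score):
        if smiles in self._seen:
            return
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, smiles))
            self._seen.add(smiles)
        elif score > self._heap[0][0]:
            # Replace the weakest entry with the new, better one.
            _, evicted = heapq.heapreplace(self._heap, (score, smiles))
            self._seen.discard(evicted)
            self._seen.add(smiles)

    def sample(self, k, rng=random):
        """Draw up to k stored pairs without calling the oracle again."""
        return rng.sample(self._heap, min(k, len(self._heap)))

    def best(self):
        return sorted(self._heap, reverse=True)


oracle_calls = 0


def toy_oracle(smiles):
    """Hypothetical stand-in for an expensive property predictor."""
    global oracle_calls
    oracle_calls += 1
    return smiles.count("C")  # toy score: number of 'C' characters


buffer = ReplayBuffer(capacity=3)
for smi in ["CCO", "CCCC", "C", "CCCCCC", "CCN"]:
    buffer.add(smi, toy_oracle(smi))  # one oracle call per new molecule

# Four replay batches reuse the stored scores; oracle_calls stays at 5.
rng = random.Random(0)
replayed = [buffer.sample(2, rng=rng) for _ in range(4)]
print(buffer.best(), oracle_calls)
```

In a reinforcement learning loop, each replayed batch would feed an additional policy update, so the cost of one oracle call is spread over several updates.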
Affiliation(s)
- Jeff Guo
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
9
Yu H, Zheng Y, Yang X. scDM: A deep generative method for cell surface protein prediction with diffusion model. J Mol Biol 2024; 436:168610. [PMID: 38754773 DOI: 10.1016/j.jmb.2024.168610] [Received: 01/03/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
The executors of organismal functions are proteins, and the transition from RNA to protein is subject to post-transcriptional regulation; therefore, considering both RNA and surface protein expression simultaneously can provide additional evidence of biological processes. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) technology can measure both RNA and protein expression in single cells, but these experiments are expensive and time-consuming. Due to the lack of computational tools for predicting surface proteins, we used datasets obtained with CITE-seq technology to design a deep generative prediction method based on diffusion models and to derive biological insights from the prediction results. Our method, scDM, predicts protein expression values from the RNA expression values of individual cells. It uses a novel way of encoding the data into the model and generates predicted samples by introducing Gaussian noise and then gradually removing it, thereby learning the data distribution during the modelling process. Comprehensive evaluation across different datasets demonstrated that our predictions yielded satisfactory results and further demonstrated the effectiveness of incorporating information from single-cell multiomics data into diffusion models for biological studies. We also found that new directions for discovering therapeutic drug targets could be provided by jointly analysing the predictive value of surface protein expression and cancer cell drug scores.
Affiliation(s)
- Hanlei Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
- Xinbo Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
10
Fan Z, Yu J, Zhang X, Chen Y, Sun S, Zhang Y, Chen M, Xiao F, Wu W, Li X, Zheng M, Luo X, Wang D. Reducing overconfident errors in molecular property classification using Posterior Network. Patterns (N Y) 2024; 5:100991. [PMID: 39005492 PMCID: PMC11240180 DOI: 10.1016/j.patter.2024.100991] [Received: 10/16/2023] [Revised: 12/20/2023] [Accepted: 04/15/2024] [Indexed: 07/16/2024]
Abstract
Deep-learning-based classification models are increasingly used for predicting molecular properties in drug development. However, traditional classification models using the Softmax function often give overconfident mispredictions for out-of-distribution samples, highlighting a critical lack of accurate uncertainty estimation. Such limitations can result in substantial costs and should be avoided during drug development. Inspired by advances in evidential deep learning and Posterior Network, we replaced the Softmax function with a normalizing flow to enhance the uncertainty estimation ability of the model in molecular property classification. The proposed strategy was evaluated across diverse scenarios, including simulated experiments based on a synthetic dataset, ADMET predictions, and ligand-based virtual screening. The results demonstrate that compared with the vanilla model, the proposed strategy effectively alleviates the problem of giving overconfident but incorrect predictions. Our findings support the promising application of evidential deep learning in drug development and offer a valuable framework for further research.
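The overconfidence issue mentioned in this abstract follows from how the Softmax function normalizes its inputs: the outputs always sum to one, and scaling up the logits pushes the top probability towards certainty whether or not the input resembles the training data. The sketch below illustrates only this property. It is not the Posterior Network method, and both logit vectors are hypothetical values chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for an input similar to the training data.
in_dist_logits = [2.0, 0.5, -1.0]

# Hypothetical logits for an unfamiliar input: same ranking, larger magnitude.
ood_logits = [5 * z for z in in_dist_logits]

in_conf = max(softmax(in_dist_logits))
ood_conf = max(softmax(ood_logits))

# The unfamiliar input receives the higher "confidence", even though nothing
# in the computation checks whether the input is supported by training data.
print(f"in-distribution confidence: {in_conf:.3f}")
print(f"out-of-distribution confidence: {ood_conf:.3f}")
```

Because the top Softmax probability reflects only the relative size of the logits, it cannot by itself distinguish a well-supported prediction from an extrapolation, which is the gap that explicit uncertainty estimation aims to close.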
Affiliation(s)
- Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Xiang Zhang
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yijie Chen
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Shihui Sun
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yuanyuan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lingang Laboratory, Shanghai 200031, China
- Fu Xiao
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Wenyong Wu
- Lingang Laboratory, Shanghai 200031, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
| | | |
|
11
|
Retchin M, Wang Y, Takaba K, Chodera JD. DrugGym: A testbed for the economics of autonomous drug discovery. bioRxiv 2024:2024.05.28.596296. [PMID: 38854082 PMCID: PMC11160604 DOI: 10.1101/2024.05.28.596296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
Affiliation(s)
- Michael Retchin
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
| | - Yuanqing Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, NY 10004
| | - Kenichiro Takaba
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
| | - John D. Chodera
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
| |
|
12
|
Nada H, Kim S, Lee K. PT-Finder: A multi-modal neural network approach to target identification. Comput Biol Med 2024; 174:108444. [PMID: 38636325 DOI: 10.1016/j.compbiomed.2024.108444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 04/04/2024] [Accepted: 04/07/2024] [Indexed: 04/20/2024]
Abstract
Efficient target identification for bioactive compounds, including novel synthetic analogs, is crucial for accelerating the drug discovery pipeline. However, the process of target identification presents significant challenges and is often expensive, which in turn can hinder drug discovery efforts. To address these challenges, machine learning applications have arisen as a promising approach for predicting the targets of novel chemical compounds. These methods allow the exploration of ligand-target interactions, the uncovering of biochemical mechanisms, and the investigation of drug repurposing. Typically, current target identification tools rely on assessing ligand structural similarities. Herein, a multi-modal neural network model was built using a library of proteins, their respective sequences, and active inhibitors. Subsequent validations showed the model to possess an accuracy of 82% and an MPRAUC of 0.80. Leveraging the trained model, we developed PT-Finder (Protein Target Finder), a user-friendly offline application that is capable of predicting the target proteins for hundreds of compounds within a few seconds. This combination of offline operation, speed, and accuracy positions PT-Finder as a powerful tool to accelerate drug discovery workflows. PT-Finder and its source code have been made freely accessible for download at https://github.com/PT-Finder/PT-Finder.
Affiliation(s)
- Hossam Nada
- BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, 10326, Republic of Korea
| | - Sungdo Kim
- BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, 10326, Republic of Korea
| | - Kyeong Lee
- BK21 FOUR Team and Integrated Research Institute for Drug Development, College of Pharmacy, Dongguk University-Seoul, Goyang, 10326, Republic of Korea.
| |
|
13
|
Atz K, Cotos L, Isert C, Håkansson M, Focht D, Hilleke M, Nippa DF, Iff M, Ledergerber J, Schiebroek CCG, Romeo V, Hiss JA, Merk D, Schneider P, Kuhn B, Grether U, Schneider G. Prospective de novo drug design with deep interactome learning. Nat Commun 2024; 15:3408. [PMID: 38649351 PMCID: PMC11035696 DOI: 10.1038/s41467-024-47613-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
De novo drug design aims to generate molecules from scratch that possess specific chemical and pharmacological properties. We present a computational approach utilizing interactome-based deep learning for ligand- and structure-based generation of drug-like molecules. This method capitalizes on the unique strengths of both graph neural networks and chemical language models, offering an alternative to the need for application-specific reinforcement, transfer, or few-shot learning. It enables the "zero-shot" construction of compound libraries tailored to possess specific bioactivity, synthesizability, and structural novelty. In order to proactively evaluate the deep interactome learning framework for protein structure-based drug design, potential new ligands targeting the binding site of the human peroxisome proliferator-activated receptor (PPAR) subtype gamma are generated. The top-ranking designs are chemically synthesized and computationally, biophysically, and biochemically characterized. Potent PPAR partial agonists are identified, demonstrating favorable activity and the desired selectivity profiles for both nuclear receptors and off-target interactions. Crystal structure determination of the ligand-receptor complex confirms the anticipated binding mode. This successful outcome positively advocates interactome-based de novo design for application in bioorganic and medicinal chemistry, enabling the creation of innovative bioactive molecules.
Affiliation(s)
- Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Leandro Cotos
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Maria Håkansson
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Dorota Focht
- SARomics Biostructures AB, Medicon Village, SE-223 81, Lund, Sweden
| | - Mattis Hilleke
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Michael Iff
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Jann Ledergerber
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Carl C G Schiebroek
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Valentina Romeo
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Jan A Hiss
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Daniel Merk
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Butenandtstrasse 5, 81377, Munich, Germany
| | - Petra Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
| | - Bernd Kuhn
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, CH-4070, Basel, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
| |
|
14
|
Shen T, Guo J, Han Z, Zhang G, Liu Q, Si X, Wang D, Wu S, Xia J. AutoMolDesigner for Antibiotic Discovery: An AI-Based Open-Source Software for Automated Design of Small-Molecule Antibiotics. J Chem Inf Model 2024; 64:575-583. [PMID: 38265916 DOI: 10.1021/acs.jcim.3c01562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Discovery of small-molecule antibiotics with novel chemotypes serves as one of the essential strategies to address antibiotic resistance. Although a considerable number of computational tools committed to molecular design have been reported, there is a deficit in holistic and efficient tools specifically developed for small-molecule antibiotic discovery. To address this issue, we report AutoMolDesigner, a computational modeling software dedicated to small-molecule antibiotic design. It is a generalized framework comprising two functional modules, i.e., generative-deep-learning-enabled molecular generation and automated machine-learning-based antibacterial activity/property prediction, wherein individually trained models and curated datasets are out-of-the-box for whole-cell-based antibiotic screening and design. It is open-source, thus allowing for the incorporation of new features for flexible use. Unlike most software programs based on Linux and command lines, this application equipped with a Qt-based graphical user interface can be run on personal computers with multiple operating systems, making it much easier to use for experimental scientists. The software and related materials are freely available at GitHub (https://github.com/taoshen99/AutoMolDesigner) and Zenodo (https://zenodo.org/record/10097899).
Affiliation(s)
- Tao Shen
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Jiale Guo
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Zunsheng Han
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Gao Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Qingxin Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
| | - Xinxin Si
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
| | - Dongmei Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| | - Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
| |
|
15
|
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024] Open
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, and repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audio, videos, and novel chemical molecules. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via the de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules in a faster and more affordable manner. Some clinical-level assets generated from generative artificial intelligence have also been discussed in this review to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
| | - Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
| | - Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
| | - Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
| | - Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
| | - Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
| |
|
16
|
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
| | | | | | | | - Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
| |
|
17
|
Chandra R, Horne RI, Vendruscolo M. Bayesian Optimization in the Latent Space of a Variational Autoencoder for the Generation of Selective FLT3 Inhibitors. J Chem Theory Comput 2024; 20:469-476. [PMID: 38112559 PMCID: PMC10782437 DOI: 10.1021/acs.jctc.3c01224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 11/25/2023] [Accepted: 11/27/2023] [Indexed: 12/21/2023]
Abstract
The process of drug design requires the initial identification of compounds that bind their targets with high affinity and selectivity. Advances in generative modeling of small molecules based on deep learning are offering novel opportunities for making this process faster and cheaper. Here, we propose an approach to achieve this goal, where predictions of binding affinity are used in conjunction with the Junction Tree Variational Autoencoder (JTVAE) whose latent space is used to facilitate the efficient exploration of the chemical space using a Bayesian optimization strategy. The exploration identifies small molecules predicted to have both high affinity and high selectivity by using an objective function that optimizes the binding to the target while penalizing the binding to off-targets. The framework is demonstrated for FMS-like tyrosine kinase 3 (FLT3) and shown to predict small molecules with predicted affinity and selectivity comparable to those of clinically approved drugs for this target.
Affiliation(s)
- Raghav Chandra
- Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
| | - Robert I. Horne
- Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
| | - Michele Vendruscolo
- Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K.
| |
|
18
|
Bajorath J. Chemical language models for molecular design. Mol Inform 2024; 43:e202300288. [PMID: 38010610 DOI: 10.1002/minf.202300288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics, Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
- Lamarr Institute for Machine Learning and Artificial Intelligence, Rheinische Friedrich-Wilhelms-Universität Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Bonn, Germany
| |
|
19
|
Powers A, Yu HH, Suriana P, Koodli RV, Lu T, Paggi JM, Dror RO. Geometric Deep Learning for Structure-Based Ligand Design. ACS Cent Sci 2023; 9:2257-2267. [PMID: 38161364 PMCID: PMC10755842 DOI: 10.1021/acscentsci.3c00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 01/03/2024]
Abstract
A pervasive challenge in drug design is determining how to expand a ligand (a small molecule that binds to a target biomolecule) in order to improve various properties of the ligand. Adding single chemical groups, known as fragments, is important for lead optimization tasks, and adding multiple fragments is critical for fragment-based drug design. We have developed a comprehensive framework that uses machine learning and three-dimensional protein-ligand structures to address this challenge. Our method, FRAME, iteratively determines where on a ligand to add fragments, selects fragments to add, and predicts the geometry of the added fragments. On a comprehensive benchmark, FRAME consistently improves predicted affinity and selectivity relative to the initial ligand, while generating molecules with more drug-like chemical properties than docking-based methods currently in widespread use. FRAME learns to accurately describe molecular interactions despite being given no prior information on such interactions. The resulting framework for quality molecular hypothesis generation can be easily incorporated into the workflows of medicinal chemists for diverse tasks, including lead optimization, fragment-based drug discovery, and de novo drug design.
Affiliation(s)
- Alexander S. Powers
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Helen H. Yu
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Patricia Suriana
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Rohan V. Koodli
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Biomedical Informatics Program, Stanford University School of Medicine, Stanford, California 94305, United States
| | - Tianyu Lu
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
- Department of Bioengineering, Stanford University, Stanford, California 94305, United States
| | - Joseph M. Paggi
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| | - Ron O. Dror
- Department of Computer Science, Stanford University, Stanford, California 94305, United States
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, United States
- Department of Structural Biology, Stanford University School of Medicine, Stanford, California 94305, United States
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California 94305, United States
| |
|
20
|
Kosonocky CW, Feller AL, Wilke CO, Ellington AD. Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches. Patterns (N Y) 2023; 4:100865. [PMID: 38106612 PMCID: PMC10724362 DOI: 10.1016/j.patter.2023.100865] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 08/09/2023] [Accepted: 10/06/2023] [Indexed: 12/19/2023]
Abstract
Chemical similarity searches are a widely used family of in silico methods for identifying pharmaceutical leads. These methods historically relied on structure-based comparisons to compute similarity. Here, we use a chemical language model to create a vector-based chemical search. We extend previous implementations by creating a prompt engineering strategy that utilizes two different chemical string representation algorithms: one for the query and the other for the database. We explore this method by reviewing search results from nine queries with diverse targets. We find that the method identifies molecules with similar patent-derived functionality to the query, as determined by our validated LLM-assisted patent summarization pipeline. Further, many of these functionally similar molecules have different structures and scaffolds from the query, making them unlikely to be found with traditional chemical similarity searches. This method may serve as a new tool for the discovery of novel molecular structural classes that achieve target functionality.
Affiliation(s)
- Clayton W. Kosonocky
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
| | - Aaron L. Feller
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
| | - Claus O. Wilke
- Department of Integrative Biology, University of Texas at Austin, Austin, TX 78705, USA
| | - Andrew D. Ellington
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78705, USA
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX 78705, USA
| |
|
21
|
Ochiai T, Inukai T, Akiyama M, Furui K, Ohue M, Matsumori N, Inuki S, Uesugi M, Sunazuka T, Kikuchi K, Kakeya H, Sakakibara Y. Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity. Commun Chem 2023; 6:249. [PMID: 37973971 PMCID: PMC10654724 DOI: 10.1038/s42004-023-01054-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] Open
Abstract
The structural diversity of chemical libraries, which are systematic collections of compounds that have potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features; it can express the structural diversity within a compound library, supporting the exploration of a broader chemical space and the generation of novel compound structures for drug candidates. In this study, we developed a deep-learning method called NP-VAE (Natural Product-oriented Variational Autoencoder), based on a variational autoencoder, for managing hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE successfully constructed the chemical latent space from large compounds that existing methods could not handle, achieved higher reconstruction accuracy, and demonstrated stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we succeeded in comprehensively analyzing a compound library containing natural compounds and generating novel compound structures with optimized functions.
Grants
- 22H04901 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
- 17H06410 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
- 23H04885 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
- 23H04880 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
- 23H04881 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
- 23H04887 Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Affiliation(s)
- Toshiki Ochiai
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan
| | - Tensei Inukai
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan
| | - Manato Akiyama
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan
| | - Kairi Furui
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8501, Japan
| | - Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8501, Japan
| | - Nobuaki Matsumori
- Department of Chemistry, Graduate School of Science, Kyushu University, Fukuoka, Fukuoka, 819-0395, Japan
| | - Shinsuke Inuki
- Division of Medicinal Frontier Sciences, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Kyoto, 606-8501, Japan
| | - Motonari Uesugi
- Institute for Chemical Research and WPI-iCeMS, Kyoto University, Uji, Kyoto, 611-0011, Japan
| | - Toshiaki Sunazuka
- Omura Satoshi Memorial Institute and Graduate School of Infection Control Sciences, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan
| | - Kazuya Kikuchi
- Department of Applied Chemistry, Graduate School of Engineering, Osaka University, Suita, Osaka, 565-0871, Japan
- Immunology Frontier Research Centre, Osaka University, Suita, Osaka, 565-0871, Japan
| | - Hideaki Kakeya
- Division of Medicinal Frontier Sciences, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Kyoto, 606-8501, Japan
| | - Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
- Department of Data Science, Kitasato University School of Frontier Engineering, Sagamihara, Kanagawa, 252-0373, Japan.
| |
|
22
|
Liu Y, Wu F, Liu Z, Wang K, Wang F, Qu X. Can language models be used for real-world urban-delivery route optimization? Innovation (N Y) 2023; 4:100520. [PMID: 37869471 PMCID: PMC10587631 DOI: 10.1016/j.xinn.2023.100520] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/22/2023] [Accepted: 09/27/2023] [Indexed: 10/24/2023] Open
Abstract
Language models have contributed to breakthroughs in interdisciplinary research, such as protein design and molecular dynamics understanding. In this study, we reveal that beyond language, representations of other entities, such as human behaviors, that are mappable to learnable sequences can be learned by language models. One compelling example is the real-world delivery route optimization problem. We here propose a novel approach based on the language model to optimize delivery routes on the basis of drivers' historical experiences. Although a broad range of optimization-based approaches have been designed to optimize delivery routes, they do not capture the implicit knowledge of complex delivery operating environments. The model we propose integrates this knowledge in the route optimization process by learning from driving behaviors in experienced drivers. A real-world delivery route that preserves drivers' implicit behavioral patterns is first analogized to a sentence in natural language. Through unsupervised learning, we then learn the vector representations of words and infer the drivers' delivery chains on the basis of the tailored chain-reaction-based algorithm. We also provide insights into the fusion of language models and operations research methods. In our approach, language models are applied to learn drivers' delivery behaviors and infer new deliveries at the delivery zone level, while the classic traveling salesman problem (TSP) model is embedded into the hybrid framework for intra-zone optimization. Numerical experiments performed on real-world data from Amazon's delivery service demonstrate that the proposed approach outperforms pure optimization, supporting the effectiveness, efficiency, and extensibility of our model. As a versatile approach, the proposed framework can easily be extended to various disciplines in which the data follow certain grammar rules. 
We anticipate that our work will serve as a stepping stone toward the understanding and application of language models in tackling interdisciplinary research problems.
Affiliation(s)
- Yang Liu
- State Key Laboratory of Intelligent Green Vehicle and Mobility, Tsinghua University, Beijing 100084, China
| | - Fanyou Wu
- State Key Laboratory of Intelligent Green Vehicle and Mobility, Tsinghua University, Beijing 100084, China
| | - Zhiyuan Liu
- Jiangsu Key Laboratory of Urban ITS, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, School of Transportation, Southeast University, Nanjing 211189, China
| | - Kai Wang
- School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
| | - Feiyue Wang
- Institute of Automation, State Key Laboratory for Management and Control of Complex Systems, Chinese Academy of Sciences, Beijing 100190, China
| | - Xiaobo Qu
- School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China
| |
|
23
|
Wang H, Liu W, Chen J, Wang Z. Applicability Domains Based on Molecular Graph Contrastive Learning Enable Graph Attention Network Models to Accurately Predict 15 Environmental End Points. Environ Sci Technol 2023; 57:16906-16917. [PMID: 37897806 DOI: 10.1021/acs.est.3c03860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2023]
Abstract
In silico models for predicting physicochemical properties and environmental fate parameters are necessary for the sound management of chemicals. This study employed graph attention network (GAT) algorithms to construct such models on 15 end points. The results showed that the GAT models outperformed the previous state-of-the-art models, and their performance was not influenced by the presence or absence of compounds with certain structures. Molecular similarity density (ρs) was found to be a key metric characterizing data set modelability, in addition to the proportion of compounds at activity cliffs. By introducing molecular graph (MG) contrastive learning, MG-based ρs and molecular inconsistency in activities (IA) were calculated and employed for characterizing the structure-activity landscape (SAL)-based applicability domain ADSAL{ρs, IA}. The GAT models coupled with ADSAL{ρs, IA} significantly improved the prediction coefficient of determination (R²) on all the end points by an average of 14.4% and enabled all the end points to have R² > 0.9, which could hardly be achieved previously. The models were employed to screen persistent, mobile, and/or bioaccumulative chemicals from inventories consisting of about 10⁶ chemicals. Given the current state-of-the-art model performance and coverage of the various environmental end points, the constructed models with ADSAL{ρs, IA} may serve as benchmarks for future efforts to improve modeling efficacy.
Affiliation(s)
- Haobo Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Wenjia Liu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
| |
|
24
|
Ghaemi Z, Asadollahi-Baboli M. Developing reliable classification of dual IDO1/TDO inhibitors using data fusion and majority voting. J Biomol Struct Dyn 2023:1-9. [PMID: 37921776 DOI: 10.1080/07391102.2023.2278079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 10/25/2023] [Indexed: 11/04/2023]
Abstract
Indoleamine 2,3-dioxygenase 1 (IDO1) and tryptophan 2,3-dioxygenase (TDO) are promising targets for dual inhibition in the treatment of cancer and neurodegenerative diseases. Data fusion of receptor-based and ligand-based information on dual IDO1/TDO inhibitors was employed to improve active/inactive classification performance. A reliable decision-making procedure was used to identify active/inactive dual IDO1/TDO inhibitors, applying a majority voting method to pools of individual classifications instead of relying on individual models. All classification models were validated using a prediction set, cross-validation and y-scrambling tests. The classification outcomes indicate that the sensitivity, specificity, precision, accuracy, G-mean and F1 score values increase up to ∼90% using data fusion and the majority voting method. Compared to individual classification models with a single prediction point, the majority voting method gives more reliable results because it integrates the pool of individual classification models. This classification strategy may lead to more reliable identification of active/inactive dual-targeting inhibitors in cancer immunotherapy. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Zahra Ghaemi
- Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
| | - M Asadollahi-Baboli
- Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
| |
|
25
|
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
| | - Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
| | - Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
| | - Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
| | - Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
| | - Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
| | - Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
| | - Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
| | - Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | - Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | - Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
| | | | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
| | | | - Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
| | - Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
| | | | - Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
| | - Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
| | - Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
| | - Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
| | - Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
| | - Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | | | - Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
| | - Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
| | | | - Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
| | - Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
| | - Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for infection research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
| | - Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
| | - Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
| | - Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Department of Pharmacy, Saarland University, Saarbrücken, Germany.
- German Center for infection research (DZIF), Braunschweig, Germany.
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany.
| | - Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland.
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Institute of Biology, Leiden University, Leiden, The Netherlands.
| |
|
26
|
Silva-Júnior EFD. "You've got the Body I've got the Brains" - Could the current AI-based tools replace the human ingenuity for designing new drug candidates? Bioorg Med Chem 2023; 94:117475. [PMID: 37741120 DOI: 10.1016/j.bmc.2023.117475] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2023] [Revised: 08/12/2023] [Accepted: 09/12/2023] [Indexed: 09/25/2023]
Abstract
The emergence of artificial intelligence (AI) tools has transformed the landscape of drug discovery, providing unprecedented speed, efficiency, and cost-effectiveness in the search for new therapeutics. From target identification to drug formulation and delivery, AI-driven algorithms have revolutionized various aspects of medicinal chemistry, significantly accelerating the drug design process. Despite the transformative power of AI, this perspective article emphasizes the limitations of AI tools in drug discovery, which still requires the inventive skills of medicinal chemists. The article also highlights the need for a harmonious integration of AI-based tools and human expertise in drug discovery. Such a synergistic approach promises to lead to groundbreaking therapies that address unmet medical needs and benefit humankind. As the world evolves technologically, the question remains: When will AI tools effectively design and develop drugs? The answer may lie in the seamless collaboration between AI and human researchers, unlocking transformative therapies that combat diseases effectively.
Affiliation(s)
- Edeildo Ferreira da Silva-Júnior
- Institute of Chemistry and Biotechnology, Federal University of Alagoas, Lourival Melo Mota Avenue, AC. Simões Campus, 57072-970 Alagoas, Maceió, Brazil
| |
|
27
|
Khan M, Kandwal S, Fayne D. DataPype: A Fully Automated Unified Software Platform for Computer-Aided Drug Design. ACS Omega 2023; 8:39468-39480. [PMID: 37901539 PMCID: PMC10601415 DOI: 10.1021/acsomega.3c05207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 09/26/2023] [Indexed: 10/31/2023]
Abstract
With the advent of computer-aided drug design (CADD), traditional physical testing of thousands of molecules has now been replaced by target-focused drug discovery, where potentially bioactive molecules are predicted by computer software before their physical synthesis. However, despite being a significant breakthrough, CADD still faces various limitations and challenges. The increasing availability of data on small molecules has created a need to streamline the sourcing of data from different databases and automate the processing and cleaning of data into a form that can be used by multiple CADD software applications. Several standalone software packages are available to aid the drug designer, each with its own specific application, requiring specialized knowledge and expertise for optimal use. These applications require their own input and output files, making it a challenge for nonexpert users or multidisciplinary discovery teams. Here, we have developed a new software platform called DataPype, which wraps around these different software packages. It provides a unified automated workflow to search for hit compounds using specialist software. Additionally, multiple virtual screening packages can be used in the one workflow, and if different ways of looking at potential hit compounds all predict the same set of molecules, we have higher confidence that we should make or purchase and test the molecules. Importantly, DataPype can run on computer servers, speeding up the virtual screening for new compounds. Combining access to multiple CADD tools within one interface will enhance the early stage of drug discovery, increase usability, and enable the use of parallel computing.
Affiliation(s)
- Mohemmed Faraz Khan
- Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Integral University, Lucknow U.P., 226026, India
| | - Shubhangi Kandwal
- Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
| | - Darren Fayne
- Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
| |
|
28
|
Stanley M, Segler M. Fake it until you make it? Generative de novo design and virtual screening of synthesizable molecules. Curr Opin Struct Biol 2023; 82:102658. [PMID: 37473637 DOI: 10.1016/j.sbi.2023.102658] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/27/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023]
Abstract
Computational techniques, including virtual screening, de novo design, and generative models, play an increasing role in expediting DMTA cycles for modern molecular discovery. However, computationally proposed molecules must be synthetically feasible for laboratory testing. In this perspective, we offer a succinct introduction to the subject, and showcase typical workflows to integrate synthesis planning, synthesizability scoring, and molecule generation. Finally, we address limitations and opportunities for future research.
Affiliation(s)
- Megan Stanley
- Microsoft Research AI4Science, UK. https://twitter.com/@megjanestanley
| | | |
|
29
|
Bassani D, Brigo A, Andrews-Morger A. Federated Learning in Computational Toxicology: An Industrial Perspective on the Effiris Hackathon. Chem Res Toxicol 2023; 36:1503-1517. [PMID: 37584277 PMCID: PMC10523574 DOI: 10.1021/acs.chemrestox.3c00137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Indexed: 08/17/2023]
Abstract
In silico approaches have acquired a towering role in pharmaceutical research and development, allowing laboratories all around the world to design, create, and optimize novel molecular entities with unprecedented efficiency. From a toxicological perspective, computational methods have guided the choices of medicinal chemists toward compounds displaying improved safety profiles. Even if the recent advances in the field are significant, many challenges remain active in the on-target and off-target prediction fields. Machine learning methods have shown their ability to identify molecules with safety concerns. However, they strongly depend on the abundance and diversity of data used for their training. Sharing such information among pharmaceutical companies remains extremely limited due to confidentiality reasons, but in this scenario, a recent concept named "federated learning" can help overcome such concerns. Within this framework, it is possible for companies to contribute to the training of common machine learning algorithms, using, but not sharing, their proprietary data. Very recently, Lhasa Limited organized a hackathon involving several industrial partners in order to assess the performance of their federated learning platform, called "Effiris". In this paper, we share our experience as Roche in participating in such an event, evaluating the performance of the federated algorithms and comparing them with those coming from our in-house-only machine learning models. Our aim is to highlight the advantages of federated learning and its intrinsic limitations and also suggest some points for potential improvements in the method.
Affiliation(s)
- Davide Bassani
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
| | - Alessandro Brigo
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
| | - Andrea Andrews-Morger
- Pharmaceutical Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
| |
|
30
|
Lamanna G, Delre P, Marcou G, Saviano M, Varnek A, Horvath D, Mangiatordi GF. GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design. J Chem Inf Model 2023; 63:5107-5119. [PMID: 37556857 PMCID: PMC10466378 DOI: 10.1021/acs.jcim.3c00963] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/26/2023] [Indexed: 08/11/2023]
Abstract
This study introduces a new de novo design algorithm called GENERA that combines the capabilities of a deep-learning algorithm for automated drug-like analogue design, called DeLA-Drug, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to de novo design promising candidates for a specific target was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on the Pareto dominance resulting from computed PLANTS and GLIDE scores was applied to demonstrate the algorithm's ability to perform multiobjective optimizations effectively. GENERA can quickly generate focused libraries that produce better scores compared to a starting set of known ACE2 binders. This study is the first to utilize a DL-based algorithm designed for analogue generation as a mutational operator within a GA framework, representing an innovative approach to target-oriented de novo design.
Affiliation(s)
- Giuseppe Lamanna
- Chemistry Department, University of Bari “Aldo Moro”, Via E. Orabona, 4, I-70125 Bari, Italy
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Pietro Delre
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Gilles Marcou
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Michele Saviano
- CNR - Institute of Crystallography, Via Vivaldi 43, 81100 Caserta, Italy
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Dragos Horvath
- Laboratoire de Chémoinformatique UMR7140, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | | |
|
31
|
Chenthamarakshan V, Hoffman SC, Owen CD, Lukacik P, Strain-Damerell C, Fearon D, Malla TR, Tumber A, Schofield CJ, Duyvesteyn HM, Dejnirattisai W, Carrique L, Walter TS, Screaton GR, Matviiuk T, Mojsilovic A, Crain J, Walsh MA, Stuart DI, Das P. Accelerating drug target inhibitor discovery with a deep generative foundation model. Sci Adv 2023; 9:eadg7865. [PMID: 37343087 PMCID: PMC10284550 DOI: 10.1126/sciadv.adg7865] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/20/2023] [Accepted: 05/17/2023] [Indexed: 06/23/2023]
Abstract
Inhibitor discovery for emerging drug-target proteins is challenging, especially when target structure or active molecules are unknown. Here, we experimentally validate the broad utility of a deep generative framework trained at scale on protein sequences, small molecules, and their mutual interactions, unbiased toward any specific target. We performed protein sequence-conditioned sampling on the generative foundation model to design small-molecule inhibitors for two dissimilar targets: the spike protein receptor-binding domain (RBD) and the main protease from SARS-CoV-2. Despite using only the target sequence information during model inference, micromolar-level inhibition was observed in vitro for two candidates out of four synthesized for each target. The most potent spike RBD inhibitor exhibited activity against several variants in live virus neutralization assays. These results establish that a single, broadly deployable generative foundation model for accelerated inhibitor discovery is effective and efficient, even in the absence of target structure or binder information.
Affiliation(s)
| | - Samuel C. Hoffman
- IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
| | - C. David Owen
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
| | - Petra Lukacik
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
| | - Claire Strain-Damerell
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
| | - Daren Fearon
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
| | - Tika R. Malla
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
| | - Anthony Tumber
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
| | - Christopher J. Schofield
- Chemistry Research Laboratory, Department of Chemistry and the Ineos Oxford Institute for Antimicrobial Research, University of Oxford, 12 Mansfield Road, OX1 3TA Oxford, UK
| | - Helen M.E. Duyvesteyn
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
| | - Wanwisa Dejnirattisai
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
| | - Loic Carrique
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
| | - Thomas S. Walter
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
| | - Gavin R. Screaton
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
| | | | | | - Jason Crain
- IBM Research Europe, Hartree Centre, Daresbury WA4 4AD, UK
- Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK
| | - Martin A. Walsh
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Research Complex at Harwell, Harwell Science and Innovation Campus, OX11 0FA Didcot, UK
| | - David I. Stuart
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, OX11 0DE Didcot, UK
- Division of Structural Biology, University of Oxford, The Wellcome Centre for Human Genetics, Headington, Oxford, UK
| | - Payel Das
- IBM Research, Thomas J. Watson Research Center, Yorktown Heights, New York, NY, USA
| |
|
32
|
Mareş C, Udrea AM, Şuţan NA, Avram S. Bioinformatics Tools for the Analysis of Active Compounds Identified in Ranunculaceae Species. Pharmaceuticals (Basel) 2023; 16:842. [PMID: 37375790 DOI: 10.3390/ph16060842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 05/30/2023] [Accepted: 05/31/2023] [Indexed: 06/29/2023] Open
Abstract
The chemical compounds from extracts of three Ranunculaceae species, Aconitum toxicum Rchb., Anemone nemorosa L. and Helleborus odorus Waldst. & Kit. ex Willd., were isolated using HPLC purification and analyzed from a bioinformatics point of view. The classes of compounds identified based on the proportion in the rhizomes/leaves/flowers used for microwave-assisted extraction and ultrasound-assisted extraction were alkaloids and phenols. Here, quantifying pharmacokinetics, pharmacogenomics and pharmacodynamics helps us to identify the actual biologically active compounds. Our results showed that (i) pharmacokinetically, the compounds show good absorption at the intestinal level, and the alkaloids show high permeability at the level of the central nervous system; (ii) regarding pharmacogenomics, alkaloids can influence tumor sensitivity and the effectiveness of some treatments; and (iii) pharmacodynamically, the compounds of these Ranunculaceae species bind to carbonic anhydrase and aldose reductase. The results obtained showed a high affinity of the compounds in the binding solution at the level of carbonic anhydrases. Carbonic anhydrase inhibitors extracted from natural sources can represent the path to new drugs useful in the treatment of glaucoma as well as of some renal, neurological and even neoplastic diseases. The identification of natural compounds with the role of inhibitors can have a role in different types of pathologies, both those associated with studied and known receptors such as carbonic anhydrase and aldose reductase and new pathologies not yet addressed.
Affiliation(s)
- Cătălina Mareş
- Department of Anatomy, Animal Physiology and Biophysics, University of Bucharest, 91-95 Splaiul Independentei, 050095 Bucharest, Romania
| | - Ana-Maria Udrea
- Laser Department, National Institute for Laser, Plasma and Radiation Physics, Atomistilor 409, 077125 Magurele, Romania
- Research Institute of the University of Bucharest-ICUB, University of Bucharest, 91-95 Splaiul Independentei, 050095 Bucharest, Romania
| | - Nicoleta Anca Şuţan
- Department of Natural Sciences, University of Piteşti, 1 Targul din Vale Str., 110040 Pitesti, Romania
| | - Speranţa Avram
- Department of Anatomy, Animal Physiology and Biophysics, University of Bucharest, 91-95 Splaiul Independentei, 050095 Bucharest, Romania
| |
|
33
|
Janet JP, Mervin L, Engkvist O. Artificial intelligence in molecular de novo design: Integration with experiment. Curr Opin Struct Biol 2023; 80:102575. [PMID: 36966692 DOI: 10.1016/j.sbi.2023.102575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 02/09/2023] [Accepted: 02/18/2023] [Indexed: 06/04/2023]
Abstract
In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.
Affiliation(s)
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
|
34
|
Ballarotto M, Willems S, Stiller T, Nawa F, Marschner JA, Grisoni F, Merk D. De Novo Design of Nurr1 Agonists via Fragment-Augmented Generative Deep Learning in Low-Data Regime. J Med Chem 2023. [PMID: 37256819 DOI: 10.1021/acs.jmedchem.3c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Generative neural networks trained on SMILES can design innovative bioactive molecules de novo. These so-called chemical language models (CLMs) have typically been trained on tens of template molecules for fine-tuning. However, it is challenging to apply CLM to orphan targets with few known ligands. We have fine-tuned a CLM with a single potent Nurr1 agonist as template in a fragment-augmented fashion and obtained novel Nurr1 agonists using sampling frequency for design prioritization. Nanomolar potency and binding affinity of the top-ranking design and its structural novelty compared to available Nurr1 ligands highlight its value as an early chemical tool and as a lead for Nurr1 agonist development, as well as the applicability of CLM in very low-data scenarios.
Affiliation(s)
- Marco Ballarotto
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
- Department of Pharmaceutical Sciences, Università degli Studi di Perugia, 06123 Perugia, Italy
| | - Sabine Willems
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
| | - Tanja Stiller
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
| | - Felix Nawa
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
| | - Julian A Marschner
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
| | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584CB Utrecht, The Netherlands
| | - Daniel Merk
- Department of Pharmacy, Ludwig-Maximilians-Universität (LMU) München, 81377 Munich, Germany
| |
|
35
|
Isert C, Atz K, Schneider G. Structure-based drug design with geometric deep learning. Curr Opin Struct Biol 2023; 79:102548. [PMID: 36842415 DOI: 10.1016/j.sbi.2023.102548] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 10/24/2022] [Revised: 01/16/2023] [Accepted: 01/24/2023] [Indexed: 02/26/2023]
Abstract
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Affiliation(s)
- Clemens Isert
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Kenneth Atz
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 8093, Singapore.
| |
|
36
|
Tysinger EP, Rai BK, Sinitskiy AV. Can We Quickly Learn to "Translate" Bioactive Molecules with Transformer Models? J Chem Inf Model 2023; 63:1734-1744. [PMID: 36914216 DOI: 10.1021/acs.jcim.2c01618] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/16/2023]
Abstract
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model originally developed for machine translation. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL data set, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. By retrospective analysis on the performance of transformer models on ChEMBL subsets of ligands binding to COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to most active ligands, despite the models having not seen any ligands active against the corresponding protein target during training. Our work demonstrates that human experts working on hit expansion in drug design can easily and quickly employ transformer models, originally developed to translate texts from one natural language to another, to "translate" from known molecules active against a given protein target to novel molecules active against the same target.
Affiliation(s)
- Emma P Tysinger
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Brajesh K Rai
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| | - Anton V Sinitskiy
- Machine Learning and Computational Sciences, Pfizer Worldwide Research, Development, and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
| |
|