1
Gangwal A, Lavecchia A. Artificial Intelligence in Natural Product Drug Discovery: Current Applications and Future Perspectives. J Med Chem 2025; 68:3948-3969. [PMID: 39916476 DOI: 10.1021/acs.jmedchem.4c01257] [Indexed: 02/28/2025]
Abstract
Drug discovery, a multifaceted process from compound identification to regulatory approval, historically plagued by inefficiencies and time lags due to limited data utilization, now faces urgent demands for accelerated lead compound identification. Innovations in biological data and computational chemistry have spurred a shift from trial-and-error methods to holistic approaches to medicinal chemistry. Computational techniques, particularly artificial intelligence (AI), notably machine learning (ML) and deep learning (DL), have revolutionized drug development, enhancing data analysis and predictive modeling. Natural products (NPs) have long served as rich sources of biologically active compounds, with many successful drugs originating from them. Advances in information science expanded NP-related databases, enabling deeper exploration with AI. Integrating AI into NP drug discovery promises accelerated discoveries, leveraging AI's analytical prowess, including generative AI for data synthesis. This perspective illuminates AI's current landscape in NP drug discovery, addressing strengths, limitations, and future trajectories to advance this vital research domain.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001 Maharashtra, India
- Antonio Lavecchia
- "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy
2
Yoo J, Jang W, Shin WH. From part to whole: AI-driven progress in fragment-based drug discovery. Curr Opin Struct Biol 2025; 91:102995. [PMID: 39970579 DOI: 10.1016/j.sbi.2025.102995] [Received: 09/30/2024] [Revised: 12/23/2024] [Accepted: 01/14/2025] [Indexed: 02/21/2025]
Abstract
Fragment-based drug discovery is a technique that finds potent binding fragments to the binding hotspots and makes them a hit compound. The combination of fragments allows us to explore the large chemical space. Thus, it becomes an effective methodology for identifying lead compounds. Three concepts have been introduced to make the fragments into the compound: growing, merging, and linking. Recently, growing and merging techniques using AI have significantly improved the accuracy and efficiency of molecular design. In this review, recent techniques such as VAE, reinforcement learning, and SE(3)-equivariant models will be discussed. These methods enable precise molecular structure exploration and optimization. Additionally, we address techniques utilizing diffusion models, language models, and deep evolutionary learning. We also introduce linker optimization methods using reinforcement learning and deep learning-based models. This progress of fragment-based drug discovery methods with AI opens the possibility of discovering the vast chemical space with high efficiency.
Affiliation(s)
- Jinhyeok Yoo
- Department of Biomedical Informatics, Korea University College of Medicine, Seoul, 02708, Republic of Korea
- Wonkyeong Jang
- Department of Biomedical Informatics, Korea University College of Medicine, Seoul, 02708, Republic of Korea
- Woong-Hee Shin
- Department of Biomedical Informatics, Korea University College of Medicine, Seoul, 02708, Republic of Korea
3
Zhang X, Gao H, Qi Y, Li Y, Wang R. Generation of Rational Drug-like Molecular Structures Through a Multiple-Objective Reinforcement Learning Framework. Molecules 2024; 30:18. [PMID: 39795076 PMCID: PMC11721775 DOI: 10.3390/molecules30010018] [Received: 11/14/2024] [Revised: 12/04/2024] [Accepted: 12/15/2024] [Indexed: 01/13/2025] [Open Access]
Abstract
As an appealing approach for discovering novel leads, the key advantage of de novo drug design lies in its ability to explore a much broader dimension of chemical space, without being confined to the knowledge of existing compounds. So far, many generative models have been described in the literature, which have completely redefined the concept of de novo drug design. However, many of them lack practical value for real-world drug discovery. In this work, we have developed a graph-based generative model within a reinforcement learning framework, namely, METEOR (Molecular Exploration Through multiplE-Objective Reinforcement). The backend agent of METEOR is based on the well-established GCPN model. To ensure the overall quality of the generated molecular graphs, we implemented a set of rules to identify and exclude undesired substructures. Importantly, METEOR is designed to conduct multi-objective optimization, i.e., simultaneously optimizing binding affinity, drug-likeness, and synthetic accessibility of the generated molecules under the guidance of a special reward function. We demonstrate in a specific test case that without prior knowledge of true binders to the chosen target protein, METEOR generated molecules with superior properties compared to those in the ZINC 250k data set. In conclusion, we have demonstrated the potential of METEOR as a practical tool for generating rational drug-like molecules in the early phase of drug discovery.
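The multi-objective optimization described in this abstract can be illustrated with a scalarized reward. The sketch below is only an illustration of the general idea: the abstract does not give METEOR's reward function, so the weights, the affinity range, the normalization, and the names `Scores`, `clamp01`, and `reward` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    binding_affinity: float         # assumed: predicted binding energy in kcal/mol, more negative is better
    drug_likeness: float            # assumed: QED-style score in [0, 1], higher is better
    synthetic_accessibility: float  # assumed: SA-style score in [1, 10], lower is easier to synthesize


def clamp01(x: float) -> float:
    """Clip a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def reward(s: Scores,
           weights=(0.5, 0.25, 0.25),       # hypothetical weights, not from the paper
           affinity_range=(-12.0, -4.0)):   # hypothetical (best, worst) energies
    """Weighted sum of three objectives, each rescaled to [0, 1]."""
    best, worst = affinity_range
    affinity_term = clamp01((worst - s.binding_affinity) / (worst - best))
    qed_term = clamp01(s.drug_likeness)
    sa_term = clamp01((10.0 - s.synthetic_accessibility) / 9.0)
    w_aff, w_qed, w_sa = weights
    return w_aff * affinity_term + w_qed * qed_term + w_sa * sa_term


if __name__ == "__main__":
    strong_binder = Scores(-10.0, 0.8, 3.0)
    weak_binder = Scores(-6.0, 0.8, 3.0)
    print(reward(strong_binder) > reward(weak_binder))
```

A weighted sum is the simplest way to combine objectives; it lets one weak term be offset by the others, which may or may not be desirable for a given project.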
Affiliation(s)
- Yan Li
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, China
4
Li B, Tan K, Lao AR, Wang H, Zheng H, Zhang L. A comprehensive review of artificial intelligence for pharmacology research. Front Genet 2024; 15:1450529. [PMID: 39290983 PMCID: PMC11405247 DOI: 10.3389/fgene.2024.1450529] [Received: 06/17/2024] [Accepted: 08/26/2024] [Indexed: 09/19/2024] [Open Access]
Abstract
With the innovation and advancement of artificial intelligence, more and more artificial intelligence techniques are employed in drug research, biomedical frontier research, and clinical medicine practice, especially, in the field of pharmacology research. Thus, this review focuses on the applications of artificial intelligence in drug discovery, compound pharmacokinetic prediction, and clinical pharmacology. We briefly introduced the basic knowledge and development of artificial intelligence, presented a comprehensive review, and then summarized the latest studies and discussed the strengths and limitations of artificial intelligence models. Additionally, we highlighted several important studies and pointed out possible research directions.
Affiliation(s)
- Bing Li
- College of Computer Science, Sichuan University, Chengdu, China
- Kan Tan
- College of Computer Science, Sichuan University, Chengdu, China
- Angelyn R Lao
- Department of Mathematics and Statistics, De La Salle University, Manila, Philippines
- Haiying Wang
- School of Computing, Ulster University, Belfast, United Kingdom
- Huiru Zheng
- School of Computing, Ulster University, Belfast, United Kingdom
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
5
Menke J, Nahal Y, Bjerrum EJ, Kabeshov M, Kaski S, Engkvist O. Metis: a python-based user interface to collect expert feedback for generative chemistry models. J Cheminform 2024; 16:100. [PMID: 39143631 PMCID: PMC11323385 DOI: 10.1186/s13321-024-00892-3] [Received: 05/15/2024] [Accepted: 08/02/2024] [Indexed: 08/16/2024] [Open Access]
Abstract
One challenge that current de novo drug design models face is a disparity between the user's expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists' implicit knowledge, expectations and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is continuously increasing, no tool currently exists that enables the collection of standardized and chemistry-specific feedback. Metis is a Python-based open-source graphical user interface (GUI), designed to solve this and enable the collection of chemists' detailed feedback on molecular structures. The GUI enables chemists to explore and evaluate molecules, offering a user-friendly interface for annotating preferences and specifying desired or undesired structural features. By giving chemists the opportunity to provide detailed feedback, Metis allows researchers to capture the chemist's implicit knowledge and preferences more efficiently. This knowledge is crucial to align the chemist's idea with the de novo design agents. The GUI aims to enhance this collaboration between the human and the "machine" by providing an intuitive platform where chemists can interactively provide feedback on molecular structures, aiding in preference learning and refining de novo design strategies. Metis integrates with the existing de novo framework REINVENT, creating a closed-loop system where human expertise can continuously inform and refine the generative models.
Scientific contribution
We introduce a novel graphical user interface that allows chemists/researchers to give detailed feedback on substructures and properties of small molecules. This tool can be used to learn the preferences of chemists in order to align de novo drug design models with the chemist's ideas. The GUI can be customized to fit different needs and projects and enables direct integration into de novo REINVENT runs. We believe that Metis can facilitate the discussion and development of novel ways to integrate human feedback that goes beyond binary decisions of liking or disliking a molecule.
Affiliation(s)
- Janosch Menke
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 41296, Sweden.
- Yasmine Nahal
- Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences AstraZeneca R&D, Mölndal, 43183, Sweden
- Samuel Kaski
- Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Department of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- Ola Engkvist
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 41296, Sweden
- Molecular AI, Discovery Sciences AstraZeneca R&D, Mölndal, 43183, Sweden
6
Johnson H, Gusev F, Dull JT, Seo Y, Priestley RD, Isayev O, Rand BP. Discovery of Crystallizable Organic Semiconductors with Machine Learning. J Am Chem Soc 2024; 146:21583-21590. [PMID: 39051486 PMCID: PMC11311223 DOI: 10.1021/jacs.4c05245] [Received: 04/16/2024] [Revised: 07/15/2024] [Accepted: 07/16/2024] [Indexed: 07/27/2024]
Abstract
Crystalline organic semiconductors are known to have improved charge carrier mobility and exciton diffusion length in comparison to their amorphous counterparts. Certain organic molecular thin films can be transitioned from initially prepared amorphous layers to large-scale crystalline films via abrupt thermal annealing. Ideally, these films crystallize as platelets with long-range-ordered domains on the scale of tens to hundreds of microns. However, other organic molecular thin films may instead crystallize as spherulites or resist crystallization entirely. Organic molecules that have the capability of transforming into a platelet morphology feature both high melting point (Tm) and crystallization driving force (ΔGc). In this work, we employed machine learning (ML) to identify candidate organic materials with the potential to crystallize into platelets by estimating the aforementioned thermal properties. Six organic molecules identified by the ML algorithm were experimentally evaluated; three crystallized as platelets, one crystallized as a spherulite, and two resisted thin film crystallization. These results demonstrate a successful application of ML in the scope of predicting thermal properties of organic molecules and reinforce the principles of Tm and ΔGc as metrics that aid in predicting the crystallization behavior of organic thin films.
Affiliation(s)
- Holly M. Johnson
- Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Filipp Gusev
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Jordan T. Dull
- Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Yejoon Seo
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Rodney D. Priestley
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Olexandr Isayev
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Barry P. Rand
- Department of Electrical and Computer Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
7
Catacutan DB, Alexander J, Arnold A, Stokes JM. Machine learning in preclinical drug discovery. Nat Chem Biol 2024:10.1038/s41589-024-01679-1. [PMID: 39030362 DOI: 10.1038/s41589-024-01679-1] [Received: 04/27/2023] [Accepted: 06/13/2024] [Indexed: 07/21/2024]
Abstract
Drug-discovery and drug-development endeavors are laborious, costly and time consuming. These programs can take upward of 12 years and cost US $2.5 billion, with a failure rate of more than 90%. Machine learning (ML) presents an opportunity to improve the drug-discovery process. Indeed, with the growing abundance of public and private large-scale biological and chemical datasets, ML techniques are becoming well positioned as useful tools that can augment the traditional drug-development process. In this Perspective, we discuss the integration of algorithmic methods throughout the preclinical phases of drug discovery. Specifically, we highlight an array of ML-based efforts, across diverse disease areas, to accelerate initial hit discovery, mechanism-of-action (MOA) elucidation and chemical property optimization. With advances in the application of ML across diverse therapeutic areas, we posit that fully ML-integrated drug-discovery pipelines will define the future of drug-development programs.
Affiliation(s)
- Denise B Catacutan
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jeremie Alexander
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Autumn Arnold
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
- Jonathan M Stokes
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, Canada
- Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
8
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024] [Open Access]
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, which requires a different grammar, architecture, training dataset and, therefore, re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use Python package to facilitate model sampling, which can be found on GitHub and the Python Package Index.
Scientific contribution
This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. This doesn't require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
Affiliation(s)
- Morgan Thomas
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Mazen Ahmad
- In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
- In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain.
- Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain.
9
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
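The core idea in this abstract, reusing scores already paid for through oracle calls to update the model several times, can be sketched with a small replay buffer. This is a simplified illustration and not the authors' implementation (their code is at the repository linked above): the class `ReplayBuffer`, the function `replay_batches`, and the `augment` callback are hypothetical names, and the identity function stands in for the SMILES augmentation, which would need a cheminformatics toolkit.

```python
import heapq


class ReplayBuffer:
    """Keep the `capacity` highest-scoring (score, smiles) pairs seen so far."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []     # min-heap: the lowest retained score sits at index 0
        self._seen = set()  # SMILES currently held, to avoid duplicates

    def add(self, smiles: str, score: float) -> None:
        if smiles in self._seen:
            return
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, smiles))
            self._seen.add(smiles)
        elif score > self._heap[0][0]:
            # Replace the current worst entry with the better newcomer.
            _, evicted = heapq.heapreplace(self._heap, (score, smiles))
            self._seen.discard(evicted)
            self._seen.add(smiles)

    def top(self):
        """Return retained entries, best score first."""
        return sorted(self._heap, reverse=True)


def replay_batches(buffer: ReplayBuffer, augment, rounds: int):
    """Yield `rounds` training batches built from stored scores only.

    No oracle is called here: each batch pairs an augmented form of a stored
    molecule with the score it already received.
    """
    for _ in range(rounds):
        yield [(augment(smiles), score) for score, smiles in buffer.top()]


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=2)
    for smiles, score in [("CCO", 0.3), ("c1ccccc1", 0.9), ("CC", 0.1), ("CCN", 0.5)]:
        buffer.add(smiles, score)  # four oracle calls in total
    for batch in replay_batches(buffer, augment=lambda s: s, rounds=3):
        print(batch)  # three model updates could be driven by these batches
```

In this toy run, four scored molecules yield three update batches without further scoring, which is the sample-efficiency argument in miniature.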
Affiliation(s)
- Jeff Guo
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
10
Hazemann J, Kimmerlin T, Lange R, Mac Sweeney A, Bourquin G, Ritz D, Czodrowski P. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches. RSC Med Chem 2024; 15:2146-2159. [PMID: 38911172 PMCID: PMC11187573 DOI: 10.1039/d4md00106k] [Received: 02/12/2024] [Accepted: 04/20/2024] [Indexed: 06/25/2024] [Open Access]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic of coronavirus disease (COVID-19) since its emergence in December 2019. As of January 2024, there have been over 774 million reported cases and 7 million deaths worldwide. While vaccination efforts have been successful in reducing the severity of the disease and decreasing the transmission rate, the development of effective therapeutics against SARS-CoV-2 remains a critical need. The main protease (Mpro) of SARS-CoV-2 is an essential enzyme required for viral replication and has been identified as a promising target for drug development. In this study, we report the identification of novel Mpro inhibitors, using a combination of deep reinforcement learning for de novo drug design with 3D pharmacophore/shape-based alignment and privileged fragment match count scoring components, followed by hit expansions and molecular docking approaches. Our experimentally validated results show that 3 novel series exhibit potent inhibitory activity against SARS-CoV-2 Mpro, with IC50 values ranging from 1.3 μM to 2.3 μM and a high degree of selectivity. These findings represent promising starting points for the development of new antiviral therapies against COVID-19.
Affiliation(s)
- Julien Hazemann
- Physical Chemistry, Chemistry Department, Johannes Gutenberg University Duesbergweg 10-14 55128 Mainz Germany
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Thierry Kimmerlin
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Roland Lange
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Aengus Mac Sweeney
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Geoffroy Bourquin
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Daniel Ritz
- Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Paul Czodrowski
- Physical Chemistry, Chemistry Department, Johannes Gutenberg University Duesbergweg 10-14 55128 Mainz Germany
11
Lourenço MP, Hostaš J, Bellinger C, Tchagang A, Salahub DR. Reinforcement learning for in silico determination of adsorbate-substrate structures. J Comput Chem 2024; 45:1289-1302. [PMID: 38357973 DOI: 10.1002/jcc.27322] [Received: 12/11/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/16/2024]
Abstract
Reinforcement learning (RL) methods have helped to define the state of the art in the field of modern artificial intelligence, mostly after the breakthrough involving AlphaGo and the discovery of novel algorithms. In this work, we present an RL method, based on Q-learning, for the structural determination of adsorbate@substrate models in silico, where the minimization of the energy landscape resulting from adsorbate interactions with a substrate is made by actions on states (translations and rotations) chosen from an agent's policy. The proposed RL method is implemented in an early version of the reinforcement learning software for materials design and discovery (RLMaterial), developed in Python 3.x. RLMaterial interfaces with deMon2k, DFTB+, ORCA, and Quantum Espresso codes to compute the adsorbate@substrate energies. The RL method was applied for the structural determination of (i) the amino acid glycine and (ii) 2-amino-acetaldehyde, both interacting with a boron nitride (BN) monolayer, (iii) host-guest interactions between phenylboronic acid and β-cyclodextrin and (iv) ammonia on naphthalene. Density functional tight binding calculations were used to build the complex search surfaces with a reasonably low computational cost for systems (i)-(iii) and DFT for system (iv). Artificial neural network and gradient boosting regression techniques were employed to approximate the Q-matrix or Q-table for better decision making (policy) on next actions. Finally, we have developed a transfer-learning protocol within the RL framework that allows learning from one chemical system and transferring the experience to another, as well as from different DFT or DFTB levels.
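The Q-learning update that underlies the approach in this abstract can be written in a few lines. The sketch below uses a plain Q-table on a toy one-dimensional energy landscape; it is not RLMaterial code, and the landscape `energy`, the action set `ACTIONS`, the reward definition, and the values of `alpha` and `gamma` are assumptions chosen for illustration. The abstract's neural-network and gradient-boosting approximations of the Q-table are not reproduced.

```python
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical translations along one axis


def energy(x: int) -> float:
    """Toy energy landscape with its minimum at x = 3 (assumption)."""
    return float((x - 3) ** 2)


def step_reward(state: int, next_state: int) -> float:
    """Reward an energy decrease, so minimizing energy maximizes return."""
    return energy(state) - energy(next_state)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = defaultdict(float)    # unseen (state, action) pairs start at 0.0
    r = step_reward(0, 1)     # moving from x=0 to x=1 lowers the energy
    print(q_update(q, state=0, action=+1, reward=r, next_state=1))
```

In a real adsorbate-substrate search the state would encode the adsorbate's position and orientation and the energy would come from an electronic-structure code; only the update rule carries over unchanged.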
Affiliation(s)
- Maicon Pierre Lourenço
- Departamento de Química e Física-Centro de Ciências Exatas, Naturais e da Saúde-CCENS-Universidade Federal do Espírito Santo, Alegre, Brasil
- Jiří Hostaš
- Department of Chemistry, Department of Physics and Astronomy, CMS Centre for Molecular Simulation, IQST Institute for Quantum Science and Technology, Quantum Alberta, University of Calgary, Calgary, Alberta, Canada
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Colin Bellinger
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Alain Tchagang
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, Ontario, Canada
- Dennis R Salahub
- Department of Chemistry, Department of Physics and Astronomy, CMS Centre for Molecular Simulation, IQST Institute for Quantum Science and Technology, Quantum Alberta, University of Calgary, Calgary, Alberta, Canada
12
Liu D, Song T, Na K, Wang S. PED: a novel predictor-encoder-decoder model for Alzheimer drug molecular generation. Front Artif Intell 2024; 7:1374148. [PMID: 38690194 PMCID: PMC11058643 DOI: 10.3389/frai.2024.1374148] [Received: 01/21/2024] [Accepted: 04/01/2024] [Indexed: 05/02/2024] [Open Access]
Abstract
Alzheimer's disease (AD) is a gradually advancing neurodegenerative disorder characterized by a concealed onset. Acetylcholinesterase (AChE) is an efficient hydrolase that catalyzes the hydrolysis of acetylcholine (ACh), which regulates the concentration of ACh at synapses and then terminates ACh-mediated neurotransmission. Inhibitors of AChE activity currently exist, but their side effects are inevitable. In various application fields where AI has gained prominence, neural network-based models for molecular design have recently emerged and demonstrate encouraging outcomes. However, in the conditional molecular generation task, most current generation models need additional optimization algorithms to generate molecules with intended properties, which makes molecular generation inefficient. Consequently, we introduce a cognitive-conditional molecular design model, termed PED, which leverages the variational auto-encoder (VAE). Its primary function is to adeptly produce a molecular library tailored for specific properties. From this library, we can then identify molecules that inhibit AChE activity without adverse effects. These molecules serve as lead compounds, hastening AD treatment and concurrently enhancing the AI's cognitive abilities. In this study, we aim to fine-tune a VAE model pre-trained on the ZINC database using active compounds of AChE collected from BindingDB. Unlike other molecular generation models, PED can simultaneously perform both property prediction and molecule generation; consequently, it can generate molecules with intended properties without an additional optimization process. Evaluation experiments show that the proposed model performs better than other methods benchmarked on the same data sets. The results indicated that the model learns a good representation of potential chemical space and can generate molecules with intended properties well. Extensive experiments on benchmark datasets confirmed PED's efficiency and efficacy. Furthermore, we also verified the binding ability of molecules to AChE through molecular docking. The results showed that our molecular generation system for AD shows excellent cognitive capacities; the molecules within the molecular library could bind well to AChE and inhibit its activity, thus preventing the hydrolysis of ACh.
Affiliation(s)
- Dayan Liu
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| | - Kang Na
- The Ninth Department of Health Care Administration, The Second Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Shudong Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
| |
|
13
|
Balaji E V, Satarker S, Kumar BH, Pandey S, Birangal SR, Nayak UY, Pai KSR. In-silico lead identification of the pan-mutant IDH1 and IDH2 inhibitors to target glioblastoma. J Biomol Struct Dyn 2024; 42:3764-3789. [PMID: 37227789 DOI: 10.1080/07391102.2023.2215884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 05/12/2023] [Indexed: 05/27/2023]
Abstract
Glioblastoma (GBM) is an aggressive malignant brain tumor. Targeting a single intracellular pathway might not alleviate the disease; rather, it activates other molecular pathways that worsen the disease condition. Therefore, in this study, we attempted to target both isocitrate dehydrogenase 1 (IDH1) and IDH2, which are among the most commonly mutated proteins in GBM and other cancer types. Here, standard precision and extra precision (XP) docking, induced-fit docking (IFD), MM-GBSA, QikProp, and molecular dynamics (MD) simulation were performed to identify potential dual inhibitors of IDH1 and IDH2 from the Enamine database containing 59,161 ligands. Upon docking the ligands with IDH1 (PDB: 6VEI) and IDH2 (PDB: 6VFZ), the top eight ligands were selected based on the XP Glide score. These ligands produced favourable MM-GBSA scores and ADME characteristics. Finally, the top four ligands, 12953, 44825, 51295, and 53210, were subjected to MD analysis. Interestingly, 53210 showed maximum interaction with Gln 277 for 99% in IDH1 and Gln 316 for 100% in IDH2, which are the crucial amino acids for the inhibitory function of IDH1 and IDH2 to target GBM. Therefore, the present study attempts to identify novel molecules that could possess a pan-inhibitory action on both IDH1 and IDH2, which could be crucial in the management of GBM. However, further evaluation involving in vitro and in vivo studies is warranted to support the data in our current study. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Vignesh Balaji E
- Department of Pharmacology, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Sairaj Satarker
- Department of Pharmacology, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - B Harish Kumar
- Department of Pharmacology, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Samyak Pandey
- Department of Pharmacology, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Sumit Raosaheb Birangal
- Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - Usha Y Nayak
- Department of Pharmaceutics, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| | - K Sreedhara Ranganath Pai
- Department of Pharmacology, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, India
| |
|
14
|
Ghandikota SK, Jegga AG. Application of artificial intelligence and machine learning in drug repurposing. Prog Mol Biol Transl Sci 2024; 205:171-211. [PMID: 38789178 DOI: 10.1016/bs.pmbts.2024.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
The purpose of drug repurposing is to leverage previously approved drugs for a particular disease indication and apply them to another disease. It can be seen as a faster and more cost-effective approach to drug discovery and a powerful tool for achieving precision medicine. In addition, drug repurposing can be used to identify therapeutic candidates for rare diseases and phenotypic conditions with limited information on disease biology. Machine learning and artificial intelligence (AI) methodologies have enabled the construction of effective, data-driven repurposing pipelines by integrating and analyzing large-scale biomedical data. Recent technological advances, especially in heterogeneous network mining and natural language processing, have opened up exciting new opportunities and analytical strategies for drug repurposing. In this review, we first introduce the challenges in repurposing approaches and highlight some success stories, including those during the COVID-19 pandemic. Next, we review some existing computational frameworks in the literature, organized on the basis of the type of biomedical input data analyzed and the computational algorithms involved. In conclusion, we outline some exciting new directions that drug repurposing research may take, as pioneered by the generative AI revolution.
Affiliation(s)
- Sudhir K Ghandikota
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
| | - Anil G Jegga
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States.
| |
|
15
|
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample efficiency of the optimization process. We identify and characterize unique challenges in combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline RL for simple ligand- and structure-based oracle functions, with a 5- to 66-fold increase in hits generated for a fixed oracle budget and a 4- to 64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
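The idea of spending a fixed oracle budget only on candidates that a cheap surrogate ranks highly can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the RL-AL method itself: the RL generator is replaced by uniform random sampling, molecules are stand-in numbers in [0, 1], and `oracle`, `surrogate_predict` and `active_learning_search` are hypothetical names invented for this example.

```python
import random


def oracle(x: float) -> float:
    """Stand-in for an expensive scoring function (e.g. docking); higher is better."""
    return -(x - 0.7) ** 2


def surrogate_predict(x, labelled, k=3):
    """Cheap surrogate: mean oracle score of the k nearest labelled points."""
    nearest = sorted(labelled, key=lambda item: abs(item[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_search(rounds=5, pool_size=200, batch_size=5, seed=0):
    """Return (best_x, best_score, number_of_oracle_calls)."""
    rng = random.Random(seed)
    # Seed the surrogate with a few randomly chosen, oracle-labelled points.
    labelled = [(x, oracle(x)) for x in (rng.random() for _ in range(batch_size))]
    for _ in range(rounds):
        # Cheap "generated" candidates (assumption: random, not an RL policy).
        pool = [rng.random() for _ in range(pool_size)]
        # Rank the pool with the surrogate and label only the most promising few.
        pool.sort(key=lambda x: surrogate_predict(x, labelled), reverse=True)
        labelled.extend((x, oracle(x)) for x in pool[:batch_size])
    best_x, best_score = max(labelled, key=lambda item: item[1])
    return best_x, best_score, len(labelled)


if __name__ == "__main__":
    best_x, best_score, n_calls = active_learning_search()
    print(f"best x = {best_x:.3f}, score = {best_score:.5f}, oracle calls = {n_calls}")
```

With the default settings the oracle is called for 30 of the 1,005 candidates considered; the purely greedy acquisition used here is the simplest possible choice and ignores the exploration and diversity concerns the abstract raises.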
Affiliation(s)
- Michael Dodds
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jeff Guo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Thomas Löhr
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca 431 50 Gothenburg Sweden
| |
|
16
|
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024] Open
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within general machine learning optimization algorithms: transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI-based molecule generation. An additional goal of the release is to create a framework for education and future innovation in AI-based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design, and it is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code base, with full documentation, will increase transparency of AI and foster innovation, collaboration and education.
Affiliation(s)
- Hannes H Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| |
|
17
|
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Anton Morgunov
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Rafael I Brent
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| | - Victor S Batista
- Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
| |
|
18
|
Bo W, Duan Y, Zou Y, Ma Z, Yang T, Wang P, Guo T, Fu Z, Wang J, Fan L, Liu J, Wang T, Chen L. Local Scaffold Diversity-Contributed Generator for Discovering Potential NLRP3 Inhibitors. J Chem Inf Model 2024; 64:737-748. [PMID: 38258981 DOI: 10.1021/acs.jcim.3c01818] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2024]
Abstract
Deep generative models have become crucial tools in de novo drug design. In current models for multiobjective optimization in molecular generation, scaffold diversity is limited when multiple constraints are introduced. To enhance scaffold diversity, we herein propose a local scaffold diversity-contributed generator (LSDC), which can be used to generate diverse lead compounds that satisfy multiple constraints. Compared with state-of-the-art methods, molecules generated by LSDC exhibit greater diversity when applied to the generation of inhibitors targeting the NOD-like receptor (NLR) family, pyrin domain-containing protein 3 (NLRP3). We present 12 molecules, some of which feature previously unreported scaffolds, and demonstrate their reasonable docking binding modes. Subsequent modification of selected scaffolds and bioactivity evaluation led to the discovery of two potent NLRP3 inhibitors, A22 and A14, with IC50 values of 38.1 nM and 44.43 nM, respectively. Compound A14 also shows high oral bioavailability (F = 83.09% in mice). This work contributes to the discovery of novel NLRP3 inhibitors and provides a reference for integrating AI-based generation with wet experiments.
Affiliation(s)
- Weichen Bo
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Yangqin Duan
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Yurong Zou
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Ziyan Ma
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Tao Yang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Peng Wang
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Tao Guo
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Zhiyuan Fu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
| | - Linchuan Fan
- College of Automation, Chongqing University, Chongqing 40000, China
| | - Jie Liu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Taijin Wang
- Chengdu Zenitar Biomedical Technology Co., Ltd, Chengdu 610041, China
| | - Lijuan Chen
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
- Chengdu Zenitar Biomedical Technology Co., Ltd, Chengdu 610041, China
| |
|
19
|
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
| | | | | | | | - Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
| |
|
20
|
Abdallah A, Adel N, Elkerdawy AM, Tanabe S, Andres F, Pester A, Ali HH. Geom-SAC: Geometric Multi-Discrete Soft Actor Critic With Applications in De Novo Drug Design. IEEE Access 2024; 12:45519-45529. [DOI: 10.1109/access.2024.3377289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2024]
Affiliation(s)
- Amgad Abdallah
- Faculty of Informatics and Computer Science, AI Group, The British University in Egypt, El Sherouk City, Cairo, Egypt
| | - Nada Adel
- Department of Computers, Communications and Autonomous Systems Engineering, Faculty of Engineering, New Giza University, First 6th of October, Giza Governorate, Egypt
| | - A. M. Elkerdawy
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Cairo University, Cairo, Egypt
| | - Shihori Tanabe
- Division of Risk Assessment, Center for Biological Safety and Research, National Institute of Health Sciences, Kawasaki-ku, Kawasaki, Japan
| | | | - Andreas Pester
- Faculty of Informatics and Computer Science, AI Group, The British University in Egypt, El Sherouk City, Cairo, Egypt
| | - Hesham H. Ali
- College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
| |
|
21
|
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. arXiv 2023:arXiv:2309.05853v2. [PMID: 37744464 PMCID: PMC10516108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. We believe that the inherent generality of this method ensures that it will remain applicable as the exciting field of in silico molecular generation evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.
|
22
|
Li A, Bouhss A, Clément MJ, Bauvais C, Taylor JP, Bollot G, Pastré D. Using the structural diversity of RNA: protein interfaces to selectively target RNA with small molecules in cells: methods and perspectives. Front Mol Biosci 2023; 10:1298441. [PMID: 38033386 PMCID: PMC10687564 DOI: 10.3389/fmolb.2023.1298441] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/21/2023] [Accepted: 10/24/2023] [Indexed: 12/02/2023] Open
Abstract
In recent years, RNA has gained traction both as a therapeutic molecule and as a therapeutic target in several human pathologies. In this review, we consider the approach of targeting RNA using small molecules for both research and therapeutic purposes. Given the primary challenge presented by the low structural diversity of RNA, we discuss the potential for targeting RNA: protein interactions to enhance the structural and sequence specificity of drug candidates. We review available tools and inherent challenges in this approach, ranging from adapted bioinformatics tools to in vitro and cellular high-throughput screening and functional analysis. We further consider two critical steps in targeting RNA/protein interactions: first, the integration of in silico and structural analyses to improve the efficacy of molecules by identifying scaffolds with high affinity, and second, increasing the likelihood of identifying on-target compounds in cells through a combination of high-throughput approaches and functional assays. We anticipate that the development of a new class of molecules targeting RNA: protein interactions to prevent physio-pathological mechanisms could significantly expand the arsenal of effective therapeutic compounds.
Affiliation(s)
- Aixiao Li
- Synsight, Genopole Entreprises, Evry, France
| | - Ahmed Bouhss
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France
| | - Marie-Jeanne Clément
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France
| | | | - J. Paul Taylor
- Department of Cell and Molecular Biology, St. Jude Children’s Research Hospital, Memphis, TN, United States
| | | | - David Pastré
- Université Paris-Saclay, INSERM U1204, Université d’Évry, Structure-Activité des Biomolécules Normales et Pathologiques (SABNP), Evry, France
| |
|
23
|
Tautermann CS, Borghardt JM, Pfau R, Zentgraf M, Weskamp N, Sauer A. Towards holistic Compound Quality Scores: Extending ligand efficiency indices with compound pharmacokinetic characteristics. Drug Discov Today 2023; 28:103758. [PMID: 37660984 DOI: 10.1016/j.drudis.2023.103758] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/09/2023] [Revised: 08/17/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
The suitability of small molecules as oral drugs is often assessed by simple physicochemical rules, the application of ligand efficiency scores or by composite scores based on physicochemical compound properties. These rules and scores are empirical and typically lack mechanistic background, such as information on pharmacokinetics (PK). We introduce new types of Compound Quality Scores (CQS, specifically called dose scores and cmax scores), which explicitly include predicted or, when available, experimental PK parameters and combine these with on-target potency. These CQS scores are surrogates for an estimated dose and corresponding cmax and allow prioritizing of compounds within test cascades as well as before synthesis. We demonstrate the complementarity and, in most cases, superior performance relative to existing efficiency metrics by project examples.
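The conventional efficiency indices that the abstract contrasts with the new scores can be computed in a few lines. This is a minimal sketch of the widely used ligand efficiency (LE) and lipophilic ligand efficiency (LLE) definitions only; the dose and cmax scores are not reproduced because the abstract does not give their formulas. Function names and the example compound values are hypothetical.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """LE ~ 1.37 * pIC50 / heavy-atom count (kcal/mol per heavy atom, near 298 K)."""
    return 1.37 * pic50 / heavy_atoms


def lipophilic_ligand_efficiency(pic50: float, logp: float) -> float:
    """LLE (also called LipE) = pIC50 - logP."""
    return pic50 - logp


if __name__ == "__main__":
    # Hypothetical compound: IC50 = 100 nM, 25 heavy atoms, logP = 3.0.
    pic50 = pic50_from_nm(100.0)
    print(f"pIC50 = {pic50:.2f}")
    print(f"LE    = {ligand_efficiency(pic50, 25):.3f} kcal/mol per heavy atom")
    print(f"LLE   = {lipophilic_ligand_efficiency(pic50, 3.0):.2f}")
```

Both indices use potency and simple physicochemical descriptors only, which is the limitation the abstract points to: neither carries any pharmacokinetic information.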
Affiliation(s)
- Christofer S Tautermann
- Boehringer Ingelheim Pharma GmbH & Co. KG, Medicinal Chemistry, Birkendorfer Strasse 65, Biberach 88397, Germany; Department of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck 6020, Austria.
| | - Jens M Borghardt
- Boehringer Ingelheim Pharma GmbH & Co. KG, Drug Discovery Sciences, Birkendorfer Strasse 65, Biberach 88397, Germany.
| | - Roland Pfau
- Boehringer Ingelheim Pharma GmbH & Co. KG, Medicinal Chemistry, Birkendorfer Strasse 65, Biberach 88397, Germany; Boehringer Ingelheim Pharma GmbH & Co. KG, CNS Research, Birkendorfer Strasse 65, Biberach 88397, Germany.
| | - Matthias Zentgraf
- Boehringer Ingelheim Pharma GmbH & Co. KG, Discovery Research Coordination Germany, Birkendorfer Strasse 65, Biberach 88397, Germany.
| | - Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Medicinal Chemistry, Birkendorfer Strasse 65, Biberach 88397, Germany.
| | - Achim Sauer
- Boehringer Ingelheim Pharma GmbH & Co. KG, Drug Discovery Sciences, Birkendorfer Strasse 65, Biberach 88397, Germany.
| |
|
24
|
Anstine D, Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J Am Chem Soc 2023; 145:8736-8750. [PMID: 37052978 PMCID: PMC10141264 DOI: 10.1021/jacs.2c13467] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Received: 12/17/2022] [Indexed: 04/14/2023]
Abstract
Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
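The distinction drawn in the abstract between discriminative and generative models can be written compactly. The notation is an assumption introduced here for illustration: x denotes a chemical structure, y its properties, and y* the desired target properties.

```latex
\text{discriminative:}\quad p(y \mid x)
\qquad
\text{generative:}\quad p(x, y) = p(x \mid y)\,p(y)
\qquad
\text{inverse design:}\quad x^{*} \sim p(x \mid y = y^{*}) = \frac{p(x, y^{*})}{p(y^{*})}
```

Read this way, forward design evaluates p(y | x) for many candidate structures, whereas inverse design samples structures directly from the distribution conditioned on the target properties.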
Affiliation(s)
- Dylan M. Anstine
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| |
|