1
|
Chen H, Lu D, Xiao Z, Li S, Zhang W, Luan X, Zhang W, Zheng G. Comprehensive applications of the artificial intelligence technology in new drug research and development. Health Inf Sci Syst 2024; 12:41. [PMID: 39130617 PMCID: PMC11310389 DOI: 10.1007/s13755-024-00300-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 07/27/2024] [Indexed: 08/13/2024] Open
Abstract
Purpose: Target-based strategy is a prevalent means of drug research and development (R&D), since targets provide effector molecules of drug action and offer the foundation of pharmacological investigation. Recently, artificial intelligence (AI) technology has been utilized in various stages of drug R&D, where AI-assisted experimental methods show higher efficiency than purely experimental ones. There is a critical need for a comprehensive review of AI applications in drug R&D for the biopharmaceutical field. Methods: Relevant literature on AI-assisted drug R&D was collected from public databases (including Google Scholar, Web of Science, PubMed, IEEE Xplore Digital Library, Springer, and ScienceDirect) through a keyword search strategy with the following terms: [("Artificial Intelligence" OR "Knowledge Graph" OR "Machine Learning") AND ("Drug Target Identification" OR "New Drug Development")]. Results: In this review, we first introduced common strategies and novel trends of drug R&D, followed by a description of the characteristics of AI algorithms widely used in drug R&D. Subsequently, we described detailed applications of AI algorithms in target identification, lead compound identification and optimization, drug repurposing, and drug analytical platform construction. Finally, we discussed the challenges and prospects of AI-assisted methods for drug discovery. Conclusion: Collectively, this review provides a comprehensive overview of AI applications in drug R&D and presents future perspectives for the biopharmaceutical field, which may promote the development of the drug industry.
Affiliation(s)
- Hongyu Chen
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Dong Lu
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Ziyi Xiao
- Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
| | - Shensuo Li
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Wen Zhang
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Xin Luan
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Weidong Zhang
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Guangyong Zheng
- Shanghai Frontiers Science Center for Chinese Medicine Chemical Biology, Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| |
|
2
|
Wang J, Zhu F. Multi-objective molecular generation via clustered Pareto-based reinforcement learning. Neural Netw 2024; 179:106596. [PMID: 39163823 DOI: 10.1016/j.neunet.2024.106596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/16/2024] [Accepted: 08/01/2024] [Indexed: 08/22/2024]
Abstract
De novo molecular design is the process of learning knowledge from existing data to propose new chemical structures that satisfy the desired properties. By using de novo design to generate compounds in a directed manner, better solutions can be obtained in large chemical libraries with less comparison cost. But drug design needs to take multiple factors into consideration. For example, in polypharmacology, molecules that activate or inhibit multiple target proteins produce multiple pharmacological activities and are less susceptible to drug resistance. However, most existing molecular generation methods either focus only on affinity for a single target or fail to effectively balance the relationship between multiple targets, resulting in insufficient validity and desirability of the generated molecules. To address the problems, an approach called clustered Pareto-based reinforcement learning (CPRL) is proposed. In CPRL, a pre-trained model is constructed to grasp existing molecular knowledge in a supervised learning manner. In addition, the clustered Pareto optimization algorithm is presented to find the best solution between different objectives. The algorithm first extracts an update set from the sampled molecules through the designed aggregation-based molecular clustering. Then, the final reward is computed by constructing the Pareto frontier ranking of the molecules from the updated set. To explore the vast chemical space, a reinforcement learning agent is designed in CPRL that can be updated under the guidance of the final reward to balance multiple properties. Furthermore, to increase the internal diversity of the molecules, a fixed-parameter exploration model is used for sampling in conjunction with the agent. The experimental results demonstrate that CPRL is capable of balancing multiple properties of the molecule and has higher desirability and validity, reaching 0.9551 and 0.9923, respectively.
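As an illustration of the Pareto-frontier ranking step mentioned in this abstract, the following is a minimal sketch and not the CPRL implementation. It assumes that every objective is "higher is better"; the function names and the linear rank-to-reward mapping are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Non-dominated sorting: rank 0 is the Pareto front, rank 1 is the
    front that remains after removing rank 0, and so on."""
    remaining = set(range(len(scores)))
    ranks = [0] * len(scores)
    rank = 0
    while remaining:
        # A point is in the current front if no other remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_rewards(ranks: List[int]) -> List[float]:
    """Hypothetical reward shaping: 1.0 for the front, decreasing linearly with rank."""
    worst = max(ranks)
    return [1.0 - r / (worst + 1) for r in ranks]


# Example: five sampled molecules scored on two illustrative objectives.
example_scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
example_ranks = pareto_ranks(example_scores)
example_rewards = rank_rewards(example_ranks)
```

Here the first three molecules trade off the two objectives without any of them dominating another, so they share the front, while the last two fall into later fronts.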
Affiliation(s)
- Jing Wang
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
| | - Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
| |
|
3
|
Chen X, Xu S, Chu B, Guo J, Zhang H, Sun S, Song L, Feng XQ. Applying Spatiotemporal Modeling of Cell Dynamics to Accelerate Drug Development. ACS Nano 2024; 18:29311-29336. [PMID: 39420743 DOI: 10.1021/acsnano.4c12599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Cells act as physical computational programs that utilize input signals to orchestrate molecule-level protein-protein interactions (PPIs), generating and responding to forces, ultimately shaping all of the physiological and pathophysiological behaviors. Genome editing and molecule drugs targeting PPIs hold great promise for the treatments of diseases. Linking genes and molecular drugs with protein-performed cellular behaviors is a key yet challenging issue due to the wide range of spatial and temporal scales involved. Building predictive spatiotemporal modeling systems that can describe the dynamic behaviors of cells intervened by genome editing and molecular drugs at the intersection of biology, chemistry, physics, and computer science will greatly accelerate pharmaceutical advances. Here, we review the mechanical roles of cytoskeletal proteins in orchestrating cellular behaviors alongside significant advancements in biophysical modeling while also addressing the limitations in these models. Then, by integrating generative artificial intelligence (AI) with spatiotemporal multiscale biophysical modeling, we propose a computational pipeline for developing virtual cells, which can simulate and evaluate the therapeutic effects of drugs and genome editing technologies on various cell dynamic behaviors and could have broad biomedical applications. Such virtual cell modeling systems might revolutionize modern biomedical engineering by moving most of the painstaking wet-laboratory effort to computer simulations, substantially saving time and alleviating the financial burden for pharmaceutical industries.
Affiliation(s)
- Xindong Chen
- Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
- BioMap, Beijing 100144, China
| | - Shihao Xu
- Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
| | - Bizhu Chu
- School of Pharmacy, Shenzhen University, Shenzhen 518055, China
- Medical School, Shenzhen University, Shenzhen 518055, China
| | - Jing Guo
- Department of Medical Oncology, Xiamen Key Laboratory of Antitumor Drug Transformation Research, The First Affiliated Hospital of Xiamen University, Xiamen 361000, China
| | - Huikai Zhang
- Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
| | - Shuyi Sun
- Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
| | - Le Song
- BioMap, Beijing 100144, China
| | - Xi-Qiao Feng
- Institute of Biomechanics and Medical Engineering, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
| |
|
4
|
Nakata S, Mori Y, Tanaka S. Navigating Ultralarge Virtual Chemical Spaces with Product-of-Experts Chemical Language Models. J Chem Inf Model 2024; 64:7873-7884. [PMID: 39413401 DOI: 10.1021/acs.jcim.4c01214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2024]
Abstract
Ultralarge virtual chemical spaces have emerged as a valuable resource for drug discovery, providing access to billions of make-on-demand compounds with high synthetic success rates. Chemical language models can potentially accelerate the exploration of these vast spaces through direct compound generation. However, existing models are not designed to navigate specific virtual chemical spaces and often overlook synthetic accessibility. To address this gap, we introduce product-of-experts (PoE) chemical language models, a modular and scalable approach to navigating ultralarge virtual chemical spaces. This method allows for controlled compound generation within a desired chemical space by combining a prior model pretrained on the target space with expert and anti-expert models fine-tuned using external property-specific data sets. We demonstrate that the PoE chemical language model can generate compounds with desirable properties, such as those that favorably dock to dopamine receptor D2 (DRD2) and are predicted to cross the blood-brain barrier (BBB), while ensuring that the majority of generated compounds are present within the target chemical space. Our results highlight the potential of chemical language models for navigating ultralarge virtual chemical spaces, and we anticipate that this study will motivate further research in this direction. The source code and data are freely available at https://github.com/shuyana/poeclm.
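The idea of combining a prior with expert and anti-expert models can be sketched for a single next-token distribution. This is an assumption-laden illustration, not the code from the cited repository: the weighting form (prior times the expert/anti-expert ratio raised to a hypothetical strength `alpha`), the function name, and the toy three-token vocabulary are all invented here.

```python
import math
from typing import Dict


def product_of_experts(
    prior: Dict[str, float],
    expert: Dict[str, float],
    anti_expert: Dict[str, float],
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Combine next-token probabilities as prior * (expert / anti_expert) ** alpha,
    then renormalize. All input probabilities are assumed to be strictly positive."""
    logits = {
        tok: math.log(prior[tok])
        + alpha * (math.log(expert[tok]) - math.log(anti_expert[tok]))
        for tok in prior
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits.values())
    norm = sum(math.exp(v - peak) for v in logits.values())
    return {tok: math.exp(v - peak) / norm for tok, v in logits.items()}


# Toy distributions over three hypothetical SMILES tokens.
toy_prior = {"C": 0.5, "N": 0.3, "O": 0.2}
toy_expert = {"C": 0.2, "N": 0.6, "O": 0.2}
toy_anti = {"C": 0.6, "N": 0.2, "O": 0.2}
toy_combined = product_of_experts(toy_prior, toy_expert, toy_anti)
```

With `alpha = 0` the combined distribution reduces to the prior, which is one way to read the prior as the component that keeps generation inside the target chemical space while the expert ratio steers toward the desired property.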
Affiliation(s)
- Shuya Nakata
- Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan
| | - Yoshiharu Mori
- Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan
| | - Shigenori Tanaka
- Graduate School of System Informatics, Kobe University, Kobe 657-8501, Japan
| |
|
5
|
Ahmad W, Chong KT, Tayara H. GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction. J Chem Inf Model 2024; 64:7833-7843. [PMID: 39387596 DOI: 10.1021/acs.jcim.4c00792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Aqueous solubility is a critical physicochemical property of drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug's absorption capacity. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction by combining gated graph neural networks (GGNNs) and graph attention neural networks (GATs) with Smiles2Seq encoding. Our methodology involves converting chemical compounds into graph structures with nodes representing atoms and edges indicating chemical bonds. These graphs are then processed by using a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into GNN allows for capturing subtle structural dependencies, fostering improved solubility predictions. Furthermore, we utilized the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq seamlessly converts chemical notations into numeric sequences, facilitating the efficient transfer of information into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showcasing superior predictive performance compared to traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research signifies an important advancement in solubility prediction, offering potent tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across various chemical compounds, fostering innovation in various domains reliant on solubility data.
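Converting chemical notation into numeric sequences, as the abstract describes, can be illustrated with a small tokenizer. This is a generic sketch and not the authors' Smiles2Seq encoder: the regular expression, the vocabulary construction, and the zero-padding convention are assumptions chosen for the example.

```python
import re
from typing import Dict, List

# Bracket atoms, two-letter halogens, and two-digit ring closures are kept
# as single tokens; every other character is its own token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer id starting at 1 (0 is reserved for padding)."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a SMILES string as a fixed-length list of ids, truncating or zero-padding."""
    ids = [vocab[tok] for tok in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


corpus = ["CCO", "CC(=O)Cl"]  # ethanol and acetyl chloride
vocab = build_vocab(corpus)
ethanol_ids = encode("CCO", vocab, max_len=5)
```

The fixed-length integer sequences produced this way are the kind of input a sequence branch of a model could consume alongside a graph representation.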
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
| | - Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
| |
|
6
|
Cheng AH, Ser CT, Skreta M, Guzmán-Cordero A, Thiede L, Burger A, Aldossary A, Leong SX, Pablo-García S, Strieth-Kalthoff F, Aspuru-Guzik A. Spiers Memorial Lecture: How to do impactful research in artificial intelligence for chemistry and materials science. Faraday Discuss 2024. [PMID: 39400305 DOI: 10.1039/d4fd00153b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
Affiliation(s)
- Austin H Cheng
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Cher Tian Ser
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Andrés Guzmán-Cordero
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Tinbergen Institute, University of Amsterdam, Amsterdam, Netherlands
| | - Luca Thiede
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Andreas Burger
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | | | - Shi Xuan Leong
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore 63737, Singapore
| | | | | | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, Toronto, Ontario M5G 1X6, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Canada
- Department of Materials Science and Engineering, University of Toronto, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Canada
| |
|
7
|
Malusare A, Aggarwal V. Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model. arXiv 2024; arXiv:2402.08790v2. [PMID: 38410649 PMCID: PMC10896363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.
Affiliation(s)
- Aditya Malusare
- Edwardson School of Industrial Engineering and the Institute of Cancer Research, Purdue University
| | - Vaneet Aggarwal
- Edwardson School of Industrial Engineering and the Institute of Cancer Research, Purdue University
| |
|
8
|
Hoque A, Surve M, Kalyanakrishnan S, Sunoj RB. Reinforcement Learning for Improving Chemical Reaction Performance. J Am Chem Soc 2024. [PMID: 39356950 DOI: 10.1021/jacs.4c08866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Deep learning (DL) methods have gained notable prominence in predictive and generative tasks in molecular space. However, their application in chemical reactions remains grossly underutilized. Chemical reactions are intrinsically complex: typically involving multiple molecules besides bond-breaking/forming events. In reaction discovery, one aims to maximize yield and/or selectivity that depends on a number of factors, mostly centered on reacting partners and reaction conditions. Herein, we introduce RE-EXPLORE, a novel approach that integrates deep reinforcement learning (RL) with an RNN-based deep generative model to identify prospective new reactants/catalysts, whose yield/selectivity is estimated using a pretrained regressor. Three chemical databases (ChEMBL, ZINC, and COCONUT containing half a million to one million unlabeled molecules) are independently used for pretraining the generators to enrich them with valuable information from diverse chemical space. Standard RL methods are found to be insufficient, as learners tend to prioritize exploitation for immediate gains, resulting in repetitive generation of same/similar molecules. Our engineered reward function includes a Tanimoto-based uniqueness factor within the RL loop that improved the exploration of the environment and has helped accrue larger returns. Integration of a user-defined core fragment into the generated molecules facilitated learning of specific reaction types. Together, RE-EXPLORE can navigate the reaction space toward practically meaningful regions and offers notable improvements across the three distinct reaction types considered in this study. It identifies high-yielding substrates and highly enantioselective chiral catalysts. This RL-based approach has the potential to expedite reaction discovery and aid in the synthesis planning of important compounds, including drugs and pharmaceuticals.
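A Tanimoto-based uniqueness factor of the kind this abstract mentions could look roughly like the sketch below. It is not the RE-EXPLORE reward function: fingerprints are represented as plain sets of "on" bit indices, and the multiplicative shaping rule and all function names are assumptions made for illustration.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def uniqueness_factor(fingerprint: Set[int], memory: List[Set[int]]) -> float:
    """1.0 for a molecule unlike anything generated so far, 0.0 for an exact repeat."""
    if not memory:
        return 1.0
    return 1.0 - max(tanimoto(fingerprint, seen) for seen in memory)


def shaped_reward(predicted_score: float, fingerprint: Set[int], memory: List[Set[int]]) -> float:
    """Hypothetical shaping: scale the regressor's predicted yield or selectivity
    by how different the molecule is from earlier generations."""
    return predicted_score * uniqueness_factor(fingerprint, memory)


seen_so_far = [{1, 2, 3}]
repeat_reward = shaped_reward(0.8, {1, 2, 3}, seen_so_far)   # exact repeat
similar_reward = shaped_reward(0.8, {2, 3, 4}, seen_so_far)  # partial overlap
first_reward = shaped_reward(0.8, {2, 3, 4}, [])             # nothing generated yet
```

Penalizing repeats in this way pushes the learner away from regenerating the same or near-identical molecules, which is the exploitation problem the abstract attributes to standard RL.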
Affiliation(s)
- Ajnabiul Hoque
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Mihir Surve
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Shivaram Kalyanakrishnan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Center for Machine Intelligence and Data Science (CMInDS), Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| |
|
9
|
Loeffler HH, Wan S, Klähn M, Bhati AP, Coveney PV. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. J Chem Theory Comput 2024; 20. [PMID: 39225482 PMCID: PMC11428133 DOI: 10.1021/acs.jctc.4c00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
Active learning (AL) is a specific instance of sequential experimental design and uses machine learning to intelligently choose the next data point or batch of molecular structures to be evaluated. In this sense, it closely mimics the iterative design-make-test-analysis cycle of laboratory experiments to find optimized compounds for a given design task. Here, we describe an AL protocol which combines generative molecular AI, using REINVENT, and physics-based absolute binding free energy molecular dynamics simulation, using ESMACS, to discover new ligands for two different target proteins, 3CLpro and TNKS2. We have deployed our generative active learning (GAL) protocol on Frontier, the world's only exa-scale machine. We show that the protocol can find higher-scoring molecules compared to the baseline, a surrogate ML docking model for 3CLpro and compounds with experimentally determined binding affinities for TNKS2. The ligands found are also chemically diverse and occupy a different chemical space than the baseline. We vary the batch sizes that are put forward for free energy assessment in each GAL cycle to assess the impact on their efficiency on the GAL protocol and recommend their optimal values in different scenarios. Overall, we demonstrate a powerful capability of the combination of physics-based and AI methods which yields effective chemical space sampling at an unprecedented scale and is of immediate and direct relevance to modern, data-driven drug discovery.
Affiliation(s)
- Hannes H. Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
| | - Shunzhou Wan
- Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
| | - Marco Klähn
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Mölndal 431 83, Sweden
| | - Agastya P. Bhati
- Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
| | - Peter V. Coveney
- Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
- Institute for Informatics, Faculty of Science, University of Amsterdam, Amsterdam 1098XH, The Netherlands
| |
|
10
|
Yang Y, Chen G, Li J, Li J, Zhang O, Zhang X, Li L, Hao J, Wang E, Heng PA. Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS. Commun Biol 2024; 7:1074. [PMID: 39223327 PMCID: PMC11368924 DOI: 10.1038/s42003-024-06746-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024] Open
Abstract
Target-aware drug discovery has greatly accelerated the drug discovery process to design small-molecule ligands with high binding affinity to disease-related protein targets. Conditioned on targeted proteins, previous works utilize various kinds of deep generative models and have shown great potential in generating molecules with strong protein-ligand binding interactions. However, beyond binding affinity, effective drug molecules must manifest other essential properties such as high drug-likeness, which are not explicitly addressed by current target-aware generative methods. In this article, aiming to bridge the gap of multi-objective target-aware molecule generation in the field of deep learning-based drug discovery, we propose ParetoDrug, a Pareto Monte Carlo Tree Search (MCTS) generation algorithm. ParetoDrug searches molecules on the Pareto Front in chemical space using MCTS to enable synchronous optimization of multiple properties. Specifically, ParetoDrug utilizes pretrained atom-by-atom autoregressive generative models for the exploration guidance to desired molecules during MCTS searching. Besides, when selecting the next atom symbol, a scheme named ParetoPUCT is proposed to balance exploration and exploitation. Benchmark experiments and case studies demonstrate that ParetoDrug is highly effective in traversing the large and complex chemical space to discover novel compounds with satisfactory binding affinities and drug-like properties for various multi-objective target-aware drug discovery tasks.
Affiliation(s)
- Yaodong Yang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | | | - Jinpeng Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | | | | | | | | | - Jianye Hao
- Noah's Ark Lab, Huawei, Shenzhen, China.
| | | | - Pheng-Ann Heng
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| |
|
11
|
Proszewska M, Wolczyk M, Zieba M, Wielopolski P, Maziarka L, Smieja M. Multi-Label Conditional Generation From Pre-Trained Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:6185-6198. [PMID: 38530738 DOI: 10.1109/tpami.2024.3382008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
Although modern generative models achieve excellent quality in a variety of tasks, they often lack the essential ability to generate examples with requested properties, such as the age of the person in the photo or the weight of the generated molecule. To overcome these limitations we propose PluGeN (Plugin Generative Network), a simple yet effective generative technique that can be used as a plugin for pre-trained generative models. The idea behind our approach is to transform the entangled latent representation using a flow-based module into a multi-dimensional space where the values of each attribute are modeled as an independent one-dimensional distribution. In consequence, PluGeN can generate new samples with desired attributes as well as manipulate labeled attributes of existing examples. Due to the disentangling of the latent representation, we are even able to generate samples with rare or unseen combinations of attributes in the dataset, such as a young person with gray hair, men with make-up, or women with beards. In contrast to competitive approaches, PluGeN can be trained on partially labeled data. We combined PluGeN with GAN and VAE models and applied it to conditional generation and manipulation of images, chemical molecule modeling and 3D point clouds generation.
|
12
|
Singh S, Kaur N, Gehlot A. Application of artificial intelligence in drug design: A review. Comput Biol Med 2024; 179:108810. [PMID: 38991316 DOI: 10.1016/j.compbiomed.2024.108810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
Artificial intelligence (AI) is a field of computer science that involves acquiring information, developing rule bases, and mimicking human behaviour. The fundamental concept behind AI is to create intelligent computer systems that can operate with minimal human intervention or without any intervention at all. These rule-based systems are developed using various machine learning and deep learning models, enabling them to solve complex problems. AI is integrated with these models to learn, understand, and analyse provided data. The rapid advancement of AI is reshaping numerous industries, with the pharmaceutical sector experiencing a notable transformation. AI is increasingly being employed to automate, optimize, and personalize various facets of the pharmaceutical industry, particularly in pharmacological research. Traditional drug development methods are known for being time-consuming, expensive, and less efficient, often taking around a decade and costing billions of dollars. The integration of AI techniques addresses these challenges by enabling the examination of compounds with desired properties from a vast pool of input drugs. Furthermore, it plays a crucial role in drug screening by predicting toxicity, bioactivity, ADME properties (absorption, distribution, metabolism, and excretion), physicochemical properties, and more. AI enhances the drug design process by improving the efficiency and accuracy of predicting drug behaviour, interactions, and properties. These approaches further improve the precision of drug discovery processes and decrease clinical trial costs, leading to the development of more effective drugs.
Affiliation(s)
- Simrandeep Singh
  - Department of Electronics & Communication Engineering, UCRD, Chandigarh University, Gharuan, Punjab, India.
- Navjot Kaur
  - Department of Pharmacognosy, Amar Shaheed Baba Ajit Singh Jujhar Singh Memorial College of Pharmacy, Bela, Ropar, India
- Anita Gehlot
  - Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India

13
Lavecchia A. Navigating the frontier of drug-like chemical space with cutting-edge generative AI models. Drug Discov Today 2024; 29:104133. [PMID: 39103144 DOI: 10.1016/j.drudis.2024.104133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 07/20/2024] [Accepted: 07/31/2024] [Indexed: 08/07/2024]
Abstract
Deep generative models (GMs) have transformed the exploration of drug-like chemical space (CS) by generating novel molecules through complex, nontransparent processes, bypassing direct structural similarity. This review examines five key architectures for CS exploration: recurrent neural networks (RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows (NF), and Transformers. It discusses molecular representation choices, training strategies for focused CS exploration, evaluation criteria for CS coverage, and related challenges. Future directions include refining models, exploring new notations, improving benchmarks, and enhancing interpretability to better understand biologically relevant molecular properties.
Affiliation(s)
- Antonio Lavecchia
  - 'Drug Discovery' Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy.

14
Mariani R, De Vuono MC, Businaro E, Ivaldi S, Dell'Armi T, Gallo M, Ardigò D. P.O.L.A.R. Star: A New Framework Developed and Applied by One Mid-Sized Pharmaceutical Company to Drive Digital Transformation in R&D. Pharmaceut Med 2024; 38:343-353. [PMID: 39120788 PMCID: PMC11473631 DOI: 10.1007/s40290-024-00533-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/25/2024] [Indexed: 08/10/2024]
Abstract
Digital transformation has become a cornerstone of innovation in pharmaceutical research and development (R&D). Pharmaceutical companies now have an imperative to embrace transformation, including mid-sized and small-sized companies despite resource limitations that do not allow economies of scale compared with larger organizations. This article describes the journey undertaken by Chiesi to develop an efficient framework to drive digital transformation along its R&D value chain with the objective of building and refreshing a clear roadmap and relevant priorities, together with identifying and enabling new digital capabilities and skills within R&D, defining tools and processes that will guide Chiesi activities in the space up to mid-long term. This work has led so far to five main achievements, which align with the steps in the framework: a strategically aligned roadmap with key focus areas for digital transformation and a dedicated team to lead the effort; a common language for data across the R&D value chain; an internal mindset that's open to innovation and participation in key external networks and consortia; a set of quick-win use cases for the new framework; and a defined set of Key Performance Indicators (KPIs) and monitoring tools for digital transformation. The work presented here demonstrates that R&D digital transformation should represent an ongoing process to enable cross-functional collaboration and integration within complex corporate environments that face an ever-growing volume of diverse data, to efficiently support business needs, and to ensure a positive impact on patient care.
Affiliation(s)
- Riccardo Mariani
  - Chiesi Farmaceutici Spa, Largo Francesco Belloli 11/A, 43122, Parma, Italy.
- Elena Businaro
  - Chiesi Farmaceutici Spa, Largo Francesco Belloli 11/A, 43122, Parma, Italy
- Silvia Ivaldi
  - Chiesi Farmaceutici Spa, Largo Francesco Belloli 11/A, 43122, Parma, Italy
- Diego Ardigò
  - Chiesi Farmaceutici Spa, Largo Francesco Belloli 11/A, 43122, Parma, Italy

15
Tom G, Schmid SP, Baird SG, Cao Y, Darvish K, Hao H, Lo S, Pablo-García S, Rajaonson EM, Skreta M, Yoshikawa N, Corapi S, Akkoc GD, Strieth-Kalthoff F, Seifrid M, Aspuru-Guzik A. Self-Driving Laboratories for Chemistry and Materials Science. Chem Rev 2024; 124:9633-9732. [PMID: 39137296 PMCID: PMC11363023 DOI: 10.1021/acs.chemrev.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Self-driving laboratories (SDLs) promise an accelerated application of the scientific method. Through the automation of experimental workflows, along with autonomous experimental planning, SDLs hold the potential to greatly accelerate research in chemistry and materials discovery. This review provides an in-depth analysis of the state-of-the-art in SDL technology, its applications across various scientific disciplines, and the potential implications for research and industry. This review additionally provides an overview of the enabling technologies for SDLs, including their hardware, software, and integration with laboratory infrastructure. Most importantly, this review explores the diverse range of scientific domains where SDLs have made significant contributions, from drug discovery and materials science to genomics and chemistry. We provide a comprehensive review of existing real-world examples of SDLs, their different levels of automation, and the challenges and limitations associated with each domain.
Affiliation(s)
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Stefan P. Schmid
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland
- Sterling G. Baird
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Yang Cao
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Kourosh Darvish
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Han Hao
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Stanley Lo
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Ella M. Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Marta Skreta
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Naruki Yoshikawa
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Samantha Corapi
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Gun Deniz Akkoc
  - Forschungszentrum Jülich GmbH, Helmholtz Institute for Renewable Energy Erlangen-Nürnberg, Cauerstr. 1, 91058 Erlangen, Germany
  - Department of Chemical and Biological Engineering, Friedrich-Alexander Universität Erlangen-Nürnberg, Egerlandstr. 3, 91058 Erlangen, Germany
- Felix Strieth-Kalthoff
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - School of Mathematics and Natural Sciences, University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany
- Martin Seifrid
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Department of Materials Science and Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States of America
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
  - Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
  - Department of Materials Science & Engineering, University of Toronto, Toronto, Ontario M5S 3E4, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G 1M1, Canada

16
Sultan A, Sieg J, Mathea M, Volkamer A. Transformers for Molecular Property Prediction: Lessons Learned from the Past Five Years. J Chem Inf Model 2024; 64:6259-6280. [PMID: 39136669 DOI: 10.1021/acs.jcim.4c00747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
Affiliation(s)
- Afnan Sultan
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany
- Andrea Volkamer
  - Data Driven Drug Design, Center for Bioinformatics, Saarland University, Saarbrücken 66123, Germany

17
Tibo A, He J, Janet JP, Nittinger E, Engkvist O. Exhaustive local chemical space exploration using a transformer model. Nat Commun 2024; 15:7315. [PMID: 39183239 PMCID: PMC11345417 DOI: 10.1038/s41467-024-51672-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 08/12/2024] [Indexed: 08/27/2024] Open
Abstract
How many near-neighbors does a molecule have? This fundamental question in chemistry is crucial for molecular optimization problems under the similarity principle assumption. Generative models can sample molecules from a vast chemical space but lack explicit knowledge about molecular similarity. Therefore, these models need guidance from reinforcement learning to sample a relevant similar chemical space. However, they still lack a mechanism to measure the coverage of a specific region of the chemical space. To overcome these limitations, a source-target molecular transformer model, regularized via a similarity kernel function, is proposed. Trained on a large dataset of ≥200 billion molecular pairs, the model enforces a direct relationship between generating a target molecule and its similarity to a source molecule. Results indicate that the regularization term significantly improves the correlation between generation probability and molecular similarity, enabling exhaustive exploration of molecule near-neighborhoods.
Affiliation(s)
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Data Science and AI, Computer Science and Engineering, Chalmers, Gothenburg, Sweden

18
Suriyaamporn P, Pamornpathomkul B, Patrojanasophon P, Ngawhirunpat T, Rojanarata T, Opanasopit P. The Artificial Intelligence-Powered New Era in Pharmaceutical Research and Development: A Review. AAPS PharmSciTech 2024; 25:188. [PMID: 39147952 DOI: 10.1208/s12249-024-02901-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Accepted: 07/22/2024] [Indexed: 08/17/2024] Open
Abstract
Currently, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are gaining increased interest in many fields, particularly in pharmaceutical research and development, where they assist in decision-making in complex situations. Numerous research studies and advancements have demonstrated how these computational technologies are used in various aspects of pharmaceutical research and development, including drug discovery, personalized medicine, drug formulation, optimization, predictions, drug interactions, pharmacokinetics/pharmacodynamics, quality control/quality assurance, and manufacturing processes. Using advanced modeling techniques, these computational technologies can enhance efficiency and accuracy, handle complex data, and facilitate novel discoveries within minutes. Furthermore, these technologies offer several advantages over conventional statistics. They allow for pattern recognition from complex datasets, and the models, typically developed from data-driven algorithms, can predict a given outcome (model output) from a set of features (model inputs). Additionally, this review discusses emerging trends and provides perspectives on the application of AI with quality by design (QbD) and the future role of AI in this field. Ethical and regulatory considerations associated with integrating AI into pharmaceutical technology were also examined. This review aims to offer insights to researchers, professionals, and others on the current state of AI applications in pharmaceutical research and development and their potential role in the future of research and the era of pharmaceutical Industry 4.0 and 5.0.
Affiliation(s)
- Phuvamin Suriyaamporn
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Boonnada Pamornpathomkul
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Prasopchai Patrojanasophon
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Tanasait Ngawhirunpat
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Theerasak Rojanarata
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Praneet Opanasopit
  - Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand.

19
Renz P, Luukkonen S, Klambauer G. Diverse Hits in De Novo Molecule Design: Diversity-Based Comparison of Goal-Directed Generators. J Chem Inf Model 2024; 64:5756-5761. [PMID: 39029090 PMCID: PMC11323242 DOI: 10.1021/acs.jcim.4c00519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 07/10/2024] [Accepted: 07/11/2024] [Indexed: 07/21/2024]
Abstract
Since the rise of generative AI models, many goal-directed molecule generators have been proposed as tools for discovering novel drug candidates. However, molecule generators often produce highly similar molecules and tend to overemphasize conformity to an imperfect scoring function rather than capturing the true underlying properties sought. We rectify these two shortcomings by offering diversity-based evaluations using the #Circles metric and considering constraints on scoring function calls or computation time. Our findings highlight the superior performance of SMILES-based autoregressive models in generating diverse sets of desired molecules compared to graph-based models or genetic algorithms.
Affiliation(s)
- Philipp Renz
  - Johannes Kepler University Linz, Altenbergerstraße 69, Linz, AT 4040, Austria
- Sohvi Luukkonen
  - Johannes Kepler University Linz, ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Altenbergerstraße 69, Linz, AT 4040, Austria
- Günter Klambauer
  - Johannes Kepler University Linz, ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Altenbergerstraße 69, Linz, AT 4040, Austria

20
Matsukiyo Y, Tengeiji A, Li C, Yamanishi Y. Transcriptionally Conditional Recurrent Neural Network for De Novo Drug Design. J Chem Inf Model 2024; 64:5844-5852. [PMID: 39049516 DOI: 10.1021/acs.jcim.4c00531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Computational molecular generation methods that generate chemical structures from gene expression profiles have been actively developed for de novo drug design. However, most omics-based methods involve complex models consisting of multiple neural networks, which require pretraining. In this study, we propose a straightforward molecular generation method called GxRNN (gene expression profile-based recurrent neural network), employing a single recurrent neural network (RNN) that necessitates no pretraining for omics-based drug design. Specifically, our method utilizes the desired gene expression profile as input for the RNN, conditioning it to generate molecules likely to induce a similar profile. In a case study involving ten target proteins, GxRNN exhibited superior structural reproducibility of known ligands, surpassing several existing methods. This advancement positions our proposed method as a promising tool for facilitating de novo drug design.
Affiliation(s)
- Yuki Matsukiyo
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan
- Atsushi Tengeiji
  - Modality Research Laboratories I, Daiichi Sankyo Co., Ltd., 1-2-58 Hiromachi, Shinagawa, Tokyo 140-8710, Japan
- Chen Li
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi 464-8601, Japan

21
Bou A, Thomas M, Dittert S, Navarro C, Majewski M, Wang Y, Patel S, Tresadern G, Ahmad M, Moens V, Sherman W, Sciabola S, De Fabritiis G. ACEGEN: Reinforcement Learning of Generative Chemical Agents for Drug Discovery. J Chem Inf Model 2024; 64:5900-5911. [PMID: 39092857 DOI: 10.1021/acs.jcim.4c00895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.
Affiliation(s)
- Albert Bou
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain
- Morgan Thomas
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Sebastian Dittert
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Carles Navarro
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain
- Ye Wang
  - Biogen Research and Development, 225 Binney Street, Cambridge, Massachusetts 02142, United States
- Shivam Patel
  - Psivant Therapeutics, 451 D Street, Boston, Massachusetts 02210, United States
- Gary Tresadern
  - In Silico Discovery, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, B-2340 Beerse, Belgium
- Mazen Ahmad
  - In Silico Discovery, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, B-2340 Beerse, Belgium
- Vincent Moens
  - PyTorch Team, Meta, 11-21 Canal Reach, London, N1C 4DB, United Kingdom
- Woody Sherman
  - Psivant Therapeutics, 451 D Street, Boston, Massachusetts 02210, United States
- Simone Sciabola
  - Biogen Research and Development, 225 Binney Street, Cambridge, Massachusetts 02142, United States
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain

22
He J, Tibo A, Janet JP, Nittinger E, Tyrchan C, Czechtizky W, Engkvist O. Evaluation of reinforcement learning in transformer-based molecular design. J Cheminform 2024; 16:95. [PMID: 39118113 PMCID: PMC11312936 DOI: 10.1186/s13321-024-00887-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 07/21/2024] [Indexed: 08/10/2024] Open
Abstract
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating molecules similar to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered as a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase, steering the model towards chemical space with user-specific desirable properties. The evaluation of two distinct tasks, molecular optimization and scaffold discovery, suggests that reinforcement learning could guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps and learning rates is investigated. Scientific contribution: Our study investigates the effect of reinforcement learning on a transformer-based generative model initially trained for generating molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimisation of starting molecules. This approach allows for more flexibility in optimizing user-specific property profiles and helps find more ideas of interest.
Affiliation(s)
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Christian Tyrchan
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Werngard Czechtizky
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden

23
Hu X, Liu G, Yao Q, Zhao Y, Zhang H. Hamiltonian diversity: effectively measuring molecular diversity by shortest Hamiltonian circuits. J Cheminform 2024; 16:94. [PMID: 39113120 PMCID: PMC11308660 DOI: 10.1186/s13321-024-00883-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 07/11/2024] [Indexed: 08/10/2024] Open
Abstract
In recent years, significant advancements have been made in molecular generation algorithms aimed at facilitating drug development, and molecular diversity holds paramount importance within the realm of molecular generation. Nonetheless, the effective quantification of molecular diversity remains an elusive challenge, as extant metrics exemplified by Richness and Internal Diversity fall short in concurrently encapsulating the two main aspects of such diversity: quantity and dissimilarity. To address this quandary, we propose Hamiltonian diversity, a novel molecular diversity metric predicated upon the shortest Hamiltonian circuit. This metric embodies both aspects of molecular diversity in principle, and we implement its calculation with high efficiency and accuracy. Furthermore, through empirical experiments we demonstrate the high consistency of Hamiltonian diversity with real-world chemical diversity, and substantiate its effects in promoting diversity of molecular generation algorithms. Our implementation of Hamiltonian diversity in Python is available at: https://github.com/HXYfighter/HamDiv. Scientific contribution: We propose a more rational molecular diversity metric for the community of cheminformatics and drug development. This metric can be applied to evaluating existing molecular generation methods and enhancing drug design algorithms.
Affiliation(s)
- Xiuyuan Hu
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
  - Microsoft Research AI for Science, Beijing, China
- Guoqing Liu
  - Microsoft Research AI for Science, Beijing, China
- Quanming Yao
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
- Yang Zhao
  - Department of Electronic Engineering, Tsinghua University, Beijing, China
- Hao Zhang
  - Department of Electronic Engineering, Tsinghua University, Beijing, China.

24
Duo L, Liu Y, Ren J, Tang B, Hirst JD. Artificial intelligence for small molecule anticancer drug discovery. Expert Opin Drug Discov 2024; 19:933-948. [PMID: 39074493 DOI: 10.1080/17460441.2024.2367014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 06/07/2024] [Indexed: 07/31/2024]
Abstract
INTRODUCTION The transition from conventional cytotoxic chemotherapy to targeted cancer therapy with small-molecule anticancer drugs has enhanced treatment outcomes. This approach, which now dominates cancer treatment, has its advantages. Despite the regulatory approval of several targeted molecules for clinical use, challenges such as low response rates and drug resistance still persist. Conventional drug discovery methods are costly and time-consuming, necessitating more efficient approaches. The rise of artificial intelligence (AI) and access to large-scale datasets have revolutionized the field of small-molecule cancer drug discovery. Machine learning (ML), particularly deep learning (DL) techniques, enables the rapid identification and development of novel anticancer agents by analyzing vast amounts of genomic, proteomic, and imaging data to uncover hidden patterns and relationships. AREA COVERED In this review, the authors explore the important landmarks in the history of AI-driven drug discovery. They also highlight various applications in small-molecule cancer drug discovery, outline the challenges faced, and provide insights for future research. EXPERT OPINION The advent of big data has allowed AI to penetrate and enable innovations in almost every stage of medicine discovery, transforming the landscape of oncology research through the development of state-of-the-art algorithms and models. Despite challenges in data quality, model interpretability, and technical limitations, advancements promise breakthroughs in personalized and precision oncology, revolutionizing future cancer management.
Affiliation(s)
- Lihui Duo
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Yu Liu
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Jianfeng Ren
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Bencan Tang
  - Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
- Jonathan D Hirst
  - School of Chemistry, University of Nottingham, University Park, Nottingham, UK

25
Sinha K, Parwez S, Mv S, Yadav A, Siddiqi MI, Banerjee D. Machine learning and biological evaluation-based identification of a potential MMP-9 inhibitor, effective against ovarian cancer cells SKOV3. J Biomol Struct Dyn 2024; 42:6823-6841. [PMID: 37504963 DOI: 10.1080/07391102.2023.2240416] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/12/2023] [Accepted: 07/08/2023] [Indexed: 07/29/2023]
Abstract
MMP-9, also known as gelatinase B, is a zinc-metalloproteinase family protein that plays a key role in the degradation of the extracellular matrix (ECM). The normal function of MMP-9 includes the breakdown of ECM, a process that aids in normal physiological processes such as embryonic development, angiogenesis, etc. Interruptions in these processes due to the over-expression or downregulation of MMP-9 are reported to cause some pathological conditions like neurodegenerative diseases and cancer. In the present study, an integrated approach for ML-based virtual screening of the Maybridge library was carried out and their biological activity was tested in an attempt to identify novel small molecule scaffolds that can inhibit the activity of MMP-9. The top hits were identified and selected for target-based activity against MMP-9 protein using the kit (Biovision K844). Further, MTT assay was performed in various cancer cell lines such as breast (MCF-7, MDA-MB-231), colorectal (HCT119, DL-D-1), cervical (HeLa), lung (A549) and ovarian cancer (SKOV3). Interestingly, one compound viz., RJF02215 exhibited anti-cancer activity selectively in SKOV3. Wound healing assay and colony formation assay performed on SKOV3 cell line in the presence of RJF02215 confirmed that the compound had a significant inhibitory effect on this cell line. Thus, we have identified a novel molecule that can inhibit MMP-9 activity in vitro and inhibits the proliferation of SKOV3 cells. Novel molecules based on the structure of RJF02215 may become a good value addition for the treatment of ovarian cancer by exhibiting selective MMP-9 activity. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Khushboo Sinha
  - Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Shahid Parwez
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Shahana Mv
  - Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Ananya Yadav
  - Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
- Mohammad Imran Siddiqi
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Dibyendu Banerjee
  - Cancer Biology Division, CSIR-Central Drug Research Institute, Lucknow, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India

26
Fallani A, Medrano Sandonas L, Tkatchenko A. Inverse mapping of quantum properties to structures for chemical space of small organic molecules. Nat Commun 2024; 15:6061. [PMID: 39025883 PMCID: PMC11258234 DOI: 10.1038/s41467-024-50401-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/01/2024] [Indexed: 07/20/2024] Open
Abstract
Computer-driven molecular design combines the principles of chemistry, physics, and artificial intelligence to identify chemical compounds with tailored properties. While quantum-mechanical (QM) methods, coupled with machine learning, already offer a direct mapping from 3D molecular structures to their properties, effective methodologies for the inverse mapping in chemical space remain elusive. We address this challenge by demonstrating the possibility of parametrizing a chemical space with a finite set of QM properties. Our proof-of-concept implementation achieves an approximate property-to-structure mapping, the QIM model (which stands for "Quantum Inverse Mapping"), by forcing a variational auto-encoder with a property encoder to obtain a common internal representation for both structures and properties. After validating this mapping for small drug-like molecules, we illustrate its capabilities with an explainability study as well as by the generation of de novo molecular structures with targeted properties and transition pathways between conformational isomers. Our findings thus provide a proof-of-principle demonstration aiming to enable the inverse property-to-structure design in diverse chemical spaces.
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
  - Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg

27
Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
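As a rough illustration of what a Pareto-based multiproperty evaluation involves, the sketch below filters a set of candidates down to the non-dominated ones. It is not the MOMO implementation: the function names, the two example objectives, and the scores are all hypothetical, and every objective is assumed to be maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical candidates scored on two made-up objectives,
# e.g. (drug-likeness, similarity to the lead compound).
candidates = [(0.90, 0.40), (0.70, 0.70), (0.60, 0.60), (0.30, 0.95), (0.90, 0.30)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

In an evolutionary search, a filter of this kind would typically decide which candidates survive to the next generation, so that no single property is optimized at the expense of the others.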
Affiliation(s)
- Xin Xia
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Chunhou Zheng
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xingyi Zhang
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Qingwen Wu
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Yansen Su
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China

28
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024] Open
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, which requires a different grammar, architecture, training dataset and therefore, re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use python package to facilitate model sampling which can be found on GitHub and the Python Package Index. Scientific contribution: This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. This doesn't require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
Affiliation(s)
- Morgan Thomas
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain
- Mazen Ahmad
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain

29
Yang L, Guo Q, Zhang L. AI-assisted chemistry research: a comprehensive analysis of evolutionary paths and hotspots through knowledge graphs. Chem Commun (Camb) 2024; 60:6977-6987. [PMID: 38910536 DOI: 10.1039/d4cc01892c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Artificial intelligence (AI) offers transformative potential for chemical research through its ability to optimize reactions and processes, enhance energy efficiency, and reduce waste. AI-assisted chemical research (AI + chem) has become a global hotspot. To better understand the current research status of "AI + chem", this study conducted a scientific bibliometric investigation using CiteSpace. The Web of Science Core Collection was utilized to retrieve original articles related to "AI + chem" published from 2000 to 2024. The obtained data allowed for the visualization of the knowledge background, current research status, and latest knowledge structure of "AI + chem". "AI + chem" has entered a stage of explosive growth, and the number of papers will maintain long-term high-speed growth. This article systematically analyzes the latest progress in "AI + chem" and objectively predicts future trends, including molecular design, reaction prediction, materials design, drug design, and quantum chemistry. The outcomes of this study will provide readers with a comprehensive understanding of the overall landscape of "AI + chem".
Affiliation(s)
- Lin Yang
  - School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Qingle Guo
  - School of Intellectual Property, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China
- Lijing Zhang
  - School of Chemistry, Dalian University of Technology, Dalian 116024, Liaoning, P. R. China

30
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023] Open
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
  - Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia

31
Jiang X, Lu L, Li J, Jiang J, Zhang J, Zhou S, Wen H, Cai H, Luo X, Li Z, Wang J, Ju B, Bai R. Synthetically Feasible De Novo Molecular Design of Leads Based on a Reinforcement Learning Model: AI-Assisted Discovery of an Anti-IBD Lead Targeting CXCR4. J Med Chem 2024; 67:10057-10075. [PMID: 38863440 DOI: 10.1021/acs.jmedchem.4c00184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2024]
Abstract
Artificial intelligence (AI) de novo molecular generation provides leads with novel structures for drug discovery. However, the target affinity and synthesizability of the generated molecules present critical challenges for the successful application of AI technology. Therefore, we developed an advanced reinforcement learning model to bridge the gap between the theory of de novo molecular generation and the practical aspects of drug discovery. This model utilizes chemical reaction templates and commercially available building blocks as a starting point and employs forward reaction prediction to generate molecules, while real-time docking and drug-likeness predictions are conducted to ensure synthesizability and drug-likeness. We applied this model to design active molecules targeting the inflammation-related receptor CXCR4 and successfully prepared them according to the AI-proposed synthetic routes. Several molecules exhibited potent anti-CXCR4 and anti-inflammatory activity in subsequent in vitro and in vivo assays. The top-performing compound XVI alleviated symptoms related to inflammatory bowel disease and showed reasonable pharmacokinetic properties.
Affiliation(s)
- Xiaoying Jiang
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Liuxin Lu
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Junjie Li
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Jing Jiang
  - SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiapeng Zhang
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Shengbin Zhou
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, PR China
- Hao Wen
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Hong Cai
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Xinyu Luo
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Zhen Li
  - SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Jiahui Wang
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China
- Bin Ju
  - SanOmics AI Co. Ltd., Hangzhou 311103, PR China
- Renren Bai
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, PR China
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines; Engineering Laboratory of Development and Application of Traditional Chinese Medicines; Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, PR China

32
Guo J, Schwaller P. Augmented Memory: Sample-Efficient Generative Molecular Design with Reinforcement Learning. JACS Au 2024; 4:2160-2172. [PMID: 38938817 PMCID: PMC11200228 DOI: 10.1021/jacsau.4c00066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 06/29/2024]
Abstract
Sample efficiency is a fundamental challenge in de novo molecular design. Ideally, molecular generative models should learn to satisfy a desired objective under minimal calls to oracles (computational property predictors). This problem becomes more apparent when using oracles that can provide increased predictive accuracy but impose significant computational cost. Consequently, designing molecules that are optimized for such oracles cannot be achieved under a practical computational budget. Molecular generative models based on simplified molecular-input line-entry system (SMILES) have shown remarkable sample efficiency when coupled with reinforcement learning, as demonstrated in the practical molecular optimization (PMO) benchmark. Here, we first show that experience replay drastically improves the performance of multiple previously proposed algorithms. Next, we propose a novel algorithm called Augmented Memory that combines data augmentation with experience replay. We show that scores obtained from oracle calls can be reused to update the model multiple times. We compare Augmented Memory to previously proposed algorithms and show significantly enhanced sample efficiency in an exploitation task, a drug discovery case study requiring both exploration and exploitation, and a materials design case study optimizing explicitly for quantum-mechanical properties. Our method achieves a new state-of-the-art in sample-efficient de novo molecular design, outperforming all of the previously reported methods. The code is available at https://github.com/schwallergroup/augmented_memory.
Affiliation(s)
- Jeff Guo
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland

33
Hazemann J, Kimmerlin T, Lange R, Mac Sweeney A, Bourquin G, Ritz D, Czodrowski P. Identification of SARS-CoV-2 Mpro inhibitors through deep reinforcement learning for de novo drug design and computational chemistry approaches. RSC Med Chem 2024; 15:2146-2159. [PMID: 38911172 PMCID: PMC11187573 DOI: 10.1039/d4md00106k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 04/20/2024] [Indexed: 06/25/2024] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic of coronavirus disease (COVID-19) since its emergence in December 2019. As of January 2024, there have been over 774 million reported cases and 7 million deaths worldwide. While vaccination efforts have been successful in reducing the severity of the disease and decreasing the transmission rate, the development of effective therapeutics against SARS-CoV-2 remains a critical need. The main protease (Mpro) of SARS-CoV-2 is an essential enzyme required for viral replication and has been identified as a promising target for drug development. In this study, we report the identification of novel Mpro inhibitors, using a combination of deep reinforcement learning for de novo drug design with 3D pharmacophore/shape-based alignment and privileged fragment match count scoring components followed by hit expansions and molecular docking approaches. Our experimentally validated results show that 3 novel series exhibit potent inhibitory activity against SARS-CoV-2 Mpro, with IC50 values ranging from 1.3 μM to 2.3 μM and a high degree of selectivity. These findings represent promising starting points for the development of new antiviral therapies against COVID-19.
Affiliation(s)
- Julien Hazemann
  - Physical Chemistry, Chemistry Department, Johannes Gutenberg University Duesbergweg 10-14 55128 Mainz Germany
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Thierry Kimmerlin
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Roland Lange
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Aengus Mac Sweeney
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Geoffroy Bourquin
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Daniel Ritz
  - Drug Discovery Chemistry, Idorsia Pharmaceuticals Ltd. Hegenheimermattweg 91 4123 Allschwil Switzerland
- Paul Czodrowski
  - Physical Chemistry, Chemistry Department, Johannes Gutenberg University Duesbergweg 10-14 55128 Mainz Germany

34
Yoo S, Kim J. Adapt-cMolGPT: A Conditional Generative Pre-Trained Transformer with Adapter-Based Fine-Tuning for Target-Specific Molecular Generation. Int J Mol Sci 2024; 25:6641. [PMID: 38928346 PMCID: PMC11203498 DOI: 10.3390/ijms25126641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/09/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
Small-molecule drug design aims to generate compounds that target specific proteins, playing a crucial role in the early stages of drug discovery. Recently, research has emerged that utilizes the GPT model, which has achieved significant success in various fields, to generate molecular compounds. However, due to the persistent challenge of small datasets in the pharmaceutical field, there has been some degradation in the performance of generating target-specific compounds. To address this issue, we propose an enhanced target-specific drug generation model, Adapt-cMolGPT, which modifies molecular representation and optimizes the fine-tuning process. In particular, we introduce a new fine-tuning method that incorporates an adapter module into a pre-trained base model and alternates weight updates by sections. We evaluated the proposed model through multiple experiments and demonstrated performance improvements compared to previous models. In the experimental results, Adapt-cMolGPT generated a greater number of novel and valid compounds compared to other models, with these generated compounds exhibiting properties similar to those of real molecular data. These results indicate that our proposed method is highly effective in designing drugs targeting specific proteins.
Affiliation(s)
- Soyoung Yoo
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
- Junghyun Kim
  - Department of Artificial Intelligence, Sejong University, Seoul 05006, Republic of Korea
  - Deep Learning Architecture Research Center, Sejong University, Seoul 05006, Republic of Korea

35
Xu X, Xu C, He W, Wei L, Li H, Zhou J, Zhang R, Wang Y, Xiong Y, Gao X. HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer. Bioinformatics 2024; 40:btae364. [PMID: 38867692 PMCID: PMC11256930 DOI: 10.1093/bioinformatics/btae364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 05/08/2024] [Accepted: 06/10/2024] [Indexed: 06/14/2024] Open
Abstract
MOTIVATION Macrocyclic peptides hold great promise as therapeutics targeting intracellular proteins. This stems from their remarkable ability to bind flat protein surfaces with high affinity and specificity while potentially traversing the cell membrane. Research has already explored their use in developing inhibitors for intracellular proteins, such as KRAS, a well-known driver in various cancers. However, computational approaches for de novo macrocyclic peptide design remain largely unexplored. RESULTS Here, we introduce HELM-GPT, a novel method that combines the strength of the hierarchical editing language for macromolecules (HELM) representation and generative pre-trained transformer (GPT) for de novo macrocyclic peptide design. Through reinforcement learning (RL), our experiments demonstrate that HELM-GPT has the ability to generate valid macrocyclic peptides and optimize their properties. Furthermore, we introduce a contrastive preference loss during the RL process, which further enhances the optimization performance. Finally, to co-optimize peptide permeability and KRAS binding affinity, we propose a step-by-step optimization strategy, demonstrating its effectiveness in generating molecules fulfilling both criteria. In conclusion, the HELM-GPT method can be used to identify novel macrocyclic peptides to target intracellular proteins. AVAILABILITY AND IMPLEMENTATION The code and data of HELM-GPT are freely available on GitHub (https://github.com/charlesxu90/helm-gpt).
Affiliation(s)
- Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | - Chencheng Xu
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | - Wenjia He
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | - Lesong Wei
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | - Haoyang Li
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | - Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
| | | | - Yu Wang
- Syneron Technology, Guangzhou 510000, China
| | | | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Science and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Makkah, Kingdom of Saudi Arabia
|
36
|
Ai C, Yang H, Liu X, Dong R, Ding Y, Guo F. MTMol-GPT: De novo multi-target molecular generation with transformer-based generative adversarial imitation learning. PLoS Comput Biol 2024; 20:e1012229. [PMID: 38924082 PMCID: PMC11233020 DOI: 10.1371/journal.pcbi.1012229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 07/09/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024] Open
Abstract
De novo drug design, which aims to generate new drugs with specific pharmacological properties, is crucial in advancing drug discovery. Recently, deep generative models have achieved inspiring progress in generating drug-like compounds. However, these models prioritize single-target drug generation for pharmacological intervention, neglecting the complicated inherent mechanisms of diseases, which are influenced by multiple factors. Consequently, developing novel multi-target drugs that simultaneously act on several specific targets can enhance anti-tumor efficacy and address issues related to resistance mechanisms. To address this issue, and inspired by Generative Pre-trained Transformer (GPT) models, we propose an upgraded GPT model with generative adversarial imitation learning for multi-target molecular generation, called MTMol-GPT. The multi-target molecular generator employs a dual-discriminator model using the Inverse Reinforcement Learning (IRL) method for concurrent multi-target molecular generation. Extensive results show that MTMol-GPT generates various valid, novel, and effective multi-target molecules for various complex diseases, demonstrating robustness and generalization capability. In addition, molecular docking and pharmacophore mapping experiments demonstrate the drug-likeness and effectiveness of the generated molecules, which could potentially improve neuropsychiatric interventions. Furthermore, our model's generalizability is exemplified by a case study focusing on multi-target drug design for breast cancer. As a broadly applicable solution for multiple targets, MTMol-GPT provides new insight into future directions to enhance potential complex disease therapeutics by generating high-quality multi-target molecules in drug discovery.
Affiliation(s)
- Chengwei Ai
- School of computer science and engineering, Central South University, Changsha, China
| | - Hongpeng Yang
- Department of computer science and engineering, University of South Carolina, Columbia, South Carolina, United States of America
| | - Xiaoyi Liu
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China
- Ministry of Education, Engineering Research Center for Pharmaceutics of Chinese Materia Medica and New Drug Development, Beijing, China
| | - Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Fei Guo
- School of computer science and engineering, Central South University, Changsha, China
|
37
|
Gangwal A, Lavecchia A. Unleashing the power of generative AI in drug discovery. Drug Discov Today 2024; 29:103992. [PMID: 38663579 DOI: 10.1016/j.drudis.2024.103992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/22/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery by enhancing precision, reducing timelines and costs, and enabling AI-driven computer-aided drug design. This review focuses on recent advancements in deep generative models (DGMs) for de novo drug design, exploring diverse algorithms and their profound impact. It critically analyses the challenges that are intricately interwoven into these technologies, proposing strategies to unlock their full potential. It features case studies of both successes and failures in advancing drugs to clinical trials with AI assistance. Last, it outlines a forward-looking plan for optimizing DGMs in de novo drug design, thereby fostering faster and more cost-effective drug development.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule 424001, Maharashtra, India
| | - Antonio Lavecchia
- "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy.
|
38
|
Krishnan SR, Bung N, Srinivasan R, Roy A. Target-specific novel molecules with their recipe: Incorporating synthesizability in the design process. J Mol Graph Model 2024; 129:108734. [PMID: 38442440 DOI: 10.1016/j.jmgm.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 03/07/2024]
Abstract
Application of artificial intelligence (AI) in drug discovery has led to several success stories in recent times. While traditional methods mostly relied upon screening large chemical libraries for early-stage drug design, de novo design can help identify novel target-specific molecules by sampling from a much larger chemical space. Although this has increased the possibility of finding diverse and novel molecules from previously unexplored chemical space, it has also posed a great challenge for medicinal chemists to synthesize at least some of the de novo designed novel molecules for experimental validation. To address this challenge, in this work, we propose a novel forward synthesis-based generative AI method, which is used to explore the synthesizable chemical space. The method uses a structure-based drug design framework, where the target protein structure and a target-specific seed fragment from co-crystal structures can be the initial inputs. A random fragment from a purchasable fragment library can also be the input if a target-specific fragment is unavailable. Then template-based forward synthesis route prediction and molecule generation are performed in parallel using the Monte Carlo Tree Search (MCTS) method, where the subsequent fragments for molecule growth can again be obtained from a purchasable fragment library. The rewards for each iteration of MCTS are computed using a drug-target affinity (DTA) model based on the docking pose of the generated reaction intermediates at the binding site of the target protein of interest. With the help of the proposed method, it is now possible to overcome one of the major obstacles facing AI-based drug design approaches, because the method can design novel target-specific synthesizable molecules.
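The tree-search step described in this abstract can be illustrated with a minimal Python sketch of child selection and reward backpropagation. It is not the authors' implementation: the abstract does not state the selection rule, so the UCT formula, the exploration constant `c`, the `Node` class, and the fragment labels are all assumptions made for illustration. In the paper the reward would come from the DTA model; here it is simply a number passed in.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    fragment: str                  # hypothetical label for the fragment added at this step
    visits: int = 0                # how many rollouts passed through this node
    total_reward: float = 0.0      # sum of rewards backed up through this node
    children: List["Node"] = field(default_factory=list)


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """UCT value of a child: mean reward plus an exploration bonus (assumed rule)."""
    if child.visits == 0:
        return math.inf            # always try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT value."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch))


def backpropagate(path: List[Node], reward: float) -> None:
    """Add one rollout's reward to every node on the path from root to leaf."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

With equal visit counts the child with the higher mean reward is selected, while any unvisited child takes priority over both.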
Affiliation(s)
| | - Navneet Bung
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
| | - Rajgopal Srinivasan
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India
| | - Arijit Roy
- TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, 500081, India.
|
39
|
Alberga D, Lamanna G, Graziano G, Delre P, Lomuscio MC, Corriero N, Ligresti A, Siliqi D, Saviano M, Contino M, Stefanachi A, Mangiatordi GF. DeLA-DrugSelf: Empowering multi-objective de novo design through SELFIES molecular representation. Comput Biol Med 2024; 175:108486. [PMID: 38653065 DOI: 10.1016/j.compbiomed.2024.108486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/08/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
In this paper, we introduce DeLA-DrugSelf, an upgraded version of DeLA-Drug [J. Chem. Inf. Model. 62 (2022) 1411-1424], which incorporates essential advancements for automated multi-objective de novo design. Unlike its predecessor, which relies on SMILES notation for molecular representation, DeLA-DrugSelf employs a novel and robust molecular representation string named SELFIES (SELF-referencing Embedded String). The generation process in DeLA-DrugSelf not only involves substitutions to the initial string representing the starting query molecule but also incorporates insertions and deletions. This enhancement makes DeLA-DrugSelf significantly more adept at executing data-driven scaffold decoration and lead optimization strategies. Remarkably, DeLA-DrugSelf explicitly addresses the SELFIES-related collapse issue, considering only collapse-free compounds during generation. These compounds undergo a rigorous quality metrics evaluation, highlighting substantial advancements in terms of drug-likeness, uniqueness, and novelty compared to the molecules generated by the previous version of the algorithm. To evaluate the potential of DeLA-DrugSelf as a mutational operator within a genetic algorithm framework for multi-objective optimization, we employed a fitness function based on Pareto dominance. Our objectives focused on target-oriented properties aimed at optimizing known cannabinoid receptor 2 (CB2R) ligands. The results obtained indicate that DeLA-DrugSelf, available as a user-friendly web platform (https://www.ba.ic.cnr.it/softwareic/delaself/), can effectively contribute to the data-driven optimization of starting bioactive molecules based on user-defined parameters.
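The Pareto-dominance idea behind the fitness function mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the DeLA-DrugSelf code: the two-objective score tuples are hypothetical, and the choice to maximise every objective is an assumption made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is maximised."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (objective_1, objective_2) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = pareto_front(candidates)  # indices of the non-dominated candidates
```

In this toy set the third candidate is dominated by the second, so it is the only one left out of the front. A genetic algorithm could then favour front members when choosing parents for the next generation.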
Affiliation(s)
- Domenico Alberga
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
| | - Giuseppe Lamanna
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
| | - Giovanni Graziano
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
| | - Pietro Delre
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
| | | | - Nicola Corriero
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
| | - Alessia Ligresti
- CNR - Institute of Biomolecular Chemistry, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
| | - Dritan Siliqi
- CNR - Institute of Crystallography, Via Amendola 122/o, 70126, Bari, Italy
| | - Michele Saviano
- CNR - Institute of Crystallography, Via Vivaldi 43, 81100, Caserta, Italy
| | - Marialessandra Contino
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
| | - Angela Stefanachi
- Department of Pharmacy - Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125, Bari, Italy
|
40
|
Thomas M, O'Boyle NM, Bender A, De Graaf C. MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design. J Cheminform 2024; 16:64. [PMID: 38816825 PMCID: PMC11141043 DOI: 10.1186/s13321-024-00861-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 05/15/2024] [Indexed: 06/01/2024] Open
Abstract
Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2a ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index. Scientific Contribution: MolScore is an open-source platform to facilitate generative molecular design and evaluation thereof for application in drug design. This platform takes important steps towards unifying existing benchmarks, provides a platform to share new benchmarks, and improves customisation, flexibility and usability for practitioners over existing solutions.
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
| | - Noel M O'Boyle
- Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
| | - Chris De Graaf
- Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
|
41
|
Kim H, Choi H, Kang D, Lee WB, Na J. Materials discovery with extreme properties via reinforcement learning-guided combinatorial chemistry. Chem Sci 2024; 15:7908-7925. [PMID: 38817562 PMCID: PMC11134411 DOI: 10.1039/d3sc05281h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 04/23/2024] [Indexed: 06/01/2024] Open
Abstract
The goal of most materials discovery is to discover materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, which is a rule-based molecular designer driven by trained policy for selecting subsequent molecular fragments to get a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1315 of all target-hitting molecules and 7629 of five target-hitting molecules out of 100 000 trials, whereas the probability distribution-learning models failed. Moreover, it has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking molecules and HIV inhibitors.
Affiliation(s)
- Hyunseung Kim
- School of Chemical and Biological Engineering, Seoul National University Republic of Korea
| | - Haeyeon Choi
- Department of Chemical Engineering and Materials Science, Ewha Womans University Republic of Korea
- Graduate Program in System Health Science and Engineering, Ewha Womans University Republic of Korea
| | - Dongju Kang
- School of Chemical and Biological Engineering, Seoul National University Republic of Korea
| | - Won Bo Lee
- School of Chemical and Biological Engineering, Seoul National University Republic of Korea
| | - Jonggeol Na
- Department of Chemical Engineering and Materials Science, Ewha Womans University Republic of Korea
- Graduate Program in System Health Science and Engineering, Ewha Womans University Republic of Korea
|
42
|
Nomura KI, Mishra A, Sang T, Kalia RK, Nakano A, Vashishta P. Molecular Autonomous Pathfinder Using Deep Reinforcement Learning. J Phys Chem Lett 2024; 15:5288-5294. [PMID: 38722699 PMCID: PMC11103691 DOI: 10.1021/acs.jpclett.4c00438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/21/2024] [Accepted: 04/22/2024] [Indexed: 05/22/2024]
Abstract
Diffusion in solids is a slow process that dictates rate-limiting processes in key chemical reactions. Unlike crystalline solids that offer well-defined diffusion pathways, the lack of similar structural motifs in amorphous or glassy materials poses great challenges in bridging the slow diffusion process and material failures. To tackle this problem, we propose an AI-guided long-term atomistic simulation approach: molecular autonomous pathfinder (MAP) framework based on deep reinforcement learning (DRL), where the RL agent is trained to uncover energy efficient diffusion pathways. We employ a Deep Q-Network architecture with distributed prioritized replay buffer, enabling fully online agent training with accelerated experience sampling by an ensemble of asynchronous agents. After training, the agents provide atomistic configurations of diffusion pathways with their energy profile. We use a piecewise nudged elastic band to refine the energy profile of the obtained pathway and the corresponding diffusion time on the basis of transition-state theory. With the MAP framework, we demonstrate atomistic diffusion mechanisms in amorphous silica with time scales comparable to experiments.
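The link between an energy barrier and a diffusion time under transition-state theory can be illustrated with a short Python sketch. The abstract does not give the formula the authors use; the sketch assumes the standard Arrhenius-type form, rate = attempt frequency × exp(−barrier / (k_B T)), with the mean waiting time taken as the inverse rate. The function name and the default attempt frequency of 1e13 s⁻¹ are illustrative assumptions, not values from the paper.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def tst_waiting_time(barrier_ev: float, temperature_k: float,
                     attempt_hz: float = 1e13) -> float:
    """Mean time (s) to cross a barrier, assuming an Arrhenius-type TST rate.

    attempt_hz is an assumed attempt frequency, not a value from the paper.
    """
    rate = attempt_hz * math.exp(-barrier_ev / (K_B_EV * temperature_k))
    return 1.0 / rate


# Hypothetical example: a 1.0 eV barrier evaluated at two temperatures.
t_cold = tst_waiting_time(1.0, 300.0)
t_hot = tst_waiting_time(1.0, 600.0)
```

Because the barrier sits in the exponent, a modest change in barrier height or temperature shifts the predicted time by orders of magnitude. This sensitivity is one reason a refined energy profile matters for the time estimate.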
Affiliation(s)
- Ken-ichi Nomura
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
| | - Ankit Mishra
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
| | - Tian Sang
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
| | - Rajiv K. Kalia
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
| | - Aiichiro Nakano
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
| | - Priya Vashishta
- Collaboratory for Advanced Computing and Simulations, University of Southern California, Los Angeles, California 90089, United States
|
43
|
Chandraghatgi R, Ji HF, Rosen GL, Sokhansanj BA. Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening. J Chem Inf Model 2024; 64:3826-3840. [PMID: 38696451 PMCID: PMC11197033 DOI: 10.1021/acs.jcim.4c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/04/2024]
Abstract
Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. 
Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.
Affiliation(s)
- Rohan Chandraghatgi
- Department of Biology, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Hai-Feng Ji
- Department of Chemistry, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Gail L. Rosen
- Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
| | - Bahrad A. Sokhansanj
- Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
|
44
|
Xia W, Xiao J, Bian H, Zhang J, Zhang JZH, Zhang H. Deep Learning-Based construction of a Drug-Like compound database and its application in virtual screening of HsDHODH inhibitors. Methods 2024; 225:44-51. [PMID: 38518843 DOI: 10.1016/j.ymeth.2024.03.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 01/24/2024] [Accepted: 03/13/2024] [Indexed: 03/24/2024] Open
Abstract
The process of virtual screening relies heavily on databases, but it is disadvantageous to conduct virtual screening based on commercial databases that contain patent-protected compounds and compounds with high toxicity and side effects. Therefore, this paper utilizes generative recurrent neural networks (RNN) containing long short-term memory (LSTM) cells to learn the properties of drug compounds in DrugBank, aiming to obtain a new virtual screening compound database with drug-like properties. Ultimately, a compound database consisting of 26,316 compounds is obtained by this method. To evaluate the potential of this compound database, a series of tests are performed, including chemical space, ADME properties, compound fragmentation, and synthesizability analysis. The results show that the database has good drug-like properties and relatively new backbones, and its potential in virtual screening is further tested. Finally, a series of seedling compounds with completely new backbones are obtained through docking and binding free energy calculations.
Affiliation(s)
- Wei Xia
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Jin Xiao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China
| | - Hengwei Bian
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China.
| | - Jiajun Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - John Z H Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University at Shanghai, 200062, China; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China; Department of Chemistry, New York University, NY, NY10003, USA; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China.
| | - Haiping Zhang
- Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
|
45
|
Shields JD, Howells R, Lamont G, Leilei Y, Madin A, Reimann CE, Rezaei H, Reuillon T, Smith B, Thomson C, Zheng Y, Ziegler RE. AiZynth impact on medicinal chemistry practice at AstraZeneca. RSC Med Chem 2024; 15:1085-1095. [PMID: 38665822 PMCID: PMC11042116 DOI: 10.1039/d3md00651d] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/20/2023] [Accepted: 02/15/2024] [Indexed: 04/28/2024] Open
Abstract
AstraZeneca chemists have been using the AI retrosynthesis tool AiZynth for three years. In this article, we present seven examples of how medicinal chemists using AiZynth positively impacted their drug discovery programmes. These programmes run the gamut from early-stage hit confirmation to late-stage route optimisation efforts. We also discuss the different use cases for which AI retrosynthesis tools are best suited.
Collapse
Affiliation(s)
- Jason D Shields
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA
| | - Rachel Howells
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
| | - Gillian Lamont
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
| | - Yin Leilei
- Pharmaron Beijing Co., Ltd. 6 Taihe Road BDA Beijing 100176 P.R. China
| | - Andrew Madin
- Discovery Sciences, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
| | | | - Hadi Rezaei
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA
| | - Tristan Reuillon
- Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca Pepparedsleden 1 43183 Mölndal Sweden
| | - Bryony Smith
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
| | - Clare Thomson
- Early Oncology R&D, AstraZeneca 1 Francis Crick Avenue Cambridge CB2 0AA UK
| | - Yuting Zheng
- Pharmaron Beijing Co., Ltd. 6 Taihe Road BDA Beijing 100176 P.R. China
| | - Robert E Ziegler
- Early Oncology R&D, AstraZeneca 35 Gatehouse Drive Waltham MA 02451 USA
|
46
|
Liu D, Song T, Na K, Wang S. PED: a novel predictor-encoder-decoder model for Alzheimer drug molecular generation. Front Artif Intell 2024; 7:1374148. [PMID: 38690194 PMCID: PMC11058643 DOI: 10.3389/frai.2024.1374148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 04/01/2024] [Indexed: 05/02/2024] Open
Abstract
Alzheimer's disease (AD) is a gradually advancing neurodegenerative disorder characterized by a concealed onset. Acetylcholinesterase (AChE) is an efficient hydrolase that catalyzes the hydrolysis of acetylcholine (ACh), which regulates the concentration of ACh at synapses and then terminates ACh-mediated neurotransmission. Inhibitors of AChE activity currently exist, but their side effects are inevitable. In the various application fields where AI has gained prominence, neural network-based models for molecular design have recently emerged and demonstrate encouraging outcomes. However, in the conditional molecular generation task, most current generation models need additional optimization algorithms to generate molecules with intended properties, which makes molecular generation inefficient. Consequently, we introduce a cognitive-conditional molecular design model, termed PED, which leverages the variational auto-encoder (VAE). Its primary function is to adeptly produce a molecular library tailored for specific properties. From this library, we can then identify molecules that inhibit AChE activity without adverse effects. These molecules serve as lead compounds, hastening AD treatment and concurrently enhancing the AI's cognitive abilities. In this study, we aim to fine-tune a VAE model pre-trained on the ZINC database using active compounds of AChE collected from BindingDB. Unlike other molecular generation models, PED can simultaneously perform both property prediction and molecule generation; consequently, it can generate molecules with intended properties without an additional optimization process. Evaluation experiments show that the proposed model performs better than other methods benchmarked on the same data sets. The results indicate that the model learns a good representation of the potential chemical space and can generate molecules with intended properties.
Extensive experiments on benchmark datasets confirmed PED's efficiency and efficacy. Furthermore, we also verified the binding ability of molecules to AChE through molecular docking. The results showed that our molecular generation system for AD has excellent cognitive capacities: the molecules within the molecular library could bind well to AChE and inhibit its activity, thus preventing the hydrolysis of ACh.
Affiliation(s)
- Dayan Liu
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Kang Na
  - The Ninth Department of Health Care Administration, The Second Medical Center, Chinese PLA General Hospital, Beijing, China
- Shudong Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
|
47
|
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which is time-consuming and costly. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models have achieved great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence-based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation, such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
|
48
|
Mao J, Wang J, Zeb A, Cho KH, Jin H, Kim J, Lee O, Wang Y, No KT. Transformer-Based Molecular Generative Model for Antiviral Drug Design. J Chem Inf Model 2024; 64:2733-2745. [PMID: 37366644 PMCID: PMC11005037 DOI: 10.1021/acs.jcim.3c00536] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/07/2023] [Indexed: 06/28/2023]
Abstract
The Simplified Molecular Input Line Entry System (SMILES) is oriented to the atomic-level representation of molecules and is not friendly in terms of human readability and editability, whereas IUPAC nomenclature is the closest to natural language and is very friendly for human-oriented reading and molecular editing. We can therefore manipulate IUPAC names to generate corresponding new molecules and then produce the programming-friendly SMILES form. In addition, antiviral drug design, especially analogue-based drug design, is better suited to editing and designing directly at the functional-group level of IUPAC than at the atomic level of SMILES, since designing analogues involves altering the R group only, which is closer to the knowledge-based molecular design of a chemist. Herein, we present a novel data-driven self-supervised pretraining generative model called "TransAntivirus" that makes select-and-replace edits and converts organic molecules into ones with the desired properties for the design of antiviral candidate analogues. The results indicated that TransAntivirus is significantly superior to the control models in terms of novelty, validity, uniqueness, and diversity. TransAntivirus showed excellent performance in the design and optimization of nucleoside and non-nucleoside analogues, as assessed by chemical space analysis and property prediction analysis. Furthermore, to validate the applicability of TransAntivirus in the design of antiviral drugs, we conducted two case studies on the design of nucleoside analogues and non-nucleoside analogues and screened four candidate lead compounds against coronavirus disease (COVID-19). Finally, we recommend this framework for accelerating antiviral drug discovery.
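Note on the metrics named in this abstract: validity, uniqueness, and novelty are commonly computed as simple ratios over the generated set. The sketch below uses one widely used set of definitions, which may differ from those in the cited paper. The function `generation_metrics`, the sample strings, and in particular `toy_is_valid` (a balanced-parentheses check, not a chemistry parser) are hypothetical placeholders; a real pipeline would validate and canonicalize structures with a cheminformatics toolkit.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty under common definitions.

    validity   = valid strings / all generated strings
    uniqueness = distinct valid strings / valid strings
    novelty    = distinct valid strings absent from training / distinct valid strings
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(s: str) -> bool:
    """Placeholder validity check (assumption): non-empty, balanced parentheses.

    This is NOT a real SMILES validator; it only stands in for one.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(s) and depth == 0


# Hypothetical generated and training strings, for illustration only.
generated = ["CCO", "CCO", "CC(=O)O", "CC(C", "c1ccccc1"]
training = ["CCO"]
metrics = generation_metrics(generated, training, toy_is_valid)
print(metrics)
```

Comparing raw strings treats two different spellings of the same molecule as distinct, so uniqueness and novelty are only meaningful after canonicalization. Diversity is omitted because it needs a molecular similarity measure.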
Affiliation(s)
- Jiashun Mao
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Amir Zeb
  - Faculty of Natural and Basic Sciences, University of Turbat, Balochistan 92600, Pakistan
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul 06978, Republic of Korea
- Haiyan Jin
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jongwan Kim
  - Department of Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
  - Bioinformatics and Molecular Design Research Center (BMDRC), Incheon 21983, Republic of Korea
- Onju Lee
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Yunyun Wang
  - School of Pharmacy and Jiangsu Province Key Laboratory for Inflammation and Molecular Drug Target, Nantong University, Nantong 226001, Jiangsu, P. R. China
- Kyoung Tai No
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
|
49
|
Tian T, Li S, Fang M, Zhao D, Zeng J. MolSHAP: Interpreting Quantitative Structure-Activity Relationships Using Shapley Values of R-Groups. J Chem Inf Model 2024; 64:2236-2249. [PMID: 37584270 DOI: 10.1021/acs.jcim.3c00465] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/17/2023]
Abstract
Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques, such as feature attribution, can provide interpretations for machine learning-based methods, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power than other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrate that MolSHAP can provide a useful tool for accurately interpreting quantitative structure-activity relationships and rationally optimizing compound activities in drug discovery.
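Note on attributing activity to R-groups: the abstract does not spell out MolSHAP's estimation procedure, so the sketch below shows only the generic exact Shapley computation, with each R-group position treated as a "player". The names `shapley_values` and `toy_activity`, the positions R1 to R3, and all numeric contributions are hypothetical. The toy model stands in for any activity regression model evaluated with a subset of R-groups present and the rest replaced by a baseline substituent.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of the other players.

    Cost grows exponentially with the number of players, so this is only
    practical for a handful of R-group positions.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_activity(present: FrozenSet[str]) -> float:
    """Hypothetical activity model: baseline plus per-group and pairwise terms."""
    score = 5.0  # activity of the scaffold with baseline substituents
    if "R1" in present:
        score += 1.0
    if "R2" in present:
        score += 0.5
    if "R1" in present and "R3" in present:
        score += 0.6  # interaction term, shared between R1 and R3
    return score


phi = shapley_values(["R1", "R2", "R3"], toy_activity)
print(phi)
```

In this toy model the 0.6 interaction term is split equally between R1 and R3, and the attributions sum to the difference between the fully substituted and baseline activities, which is the efficiency property of Shapley values.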
Affiliation(s)
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Meng Fang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
|
50
|
Vogt M. Chemoinformatic approaches for navigating large chemical spaces. Expert Opin Drug Discov 2024; 19:403-414. [PMID: 38300511 DOI: 10.1080/17460441.2024.2313475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 01/30/2024] [Indexed: 02/02/2024]
Abstract
INTRODUCTION Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs requires different chemoinformatic approaches for navigation. AREAS COVERED An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs requires not only methods that scale with size but also smart approaches that focus on better, not necessarily larger, molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal, as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
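Note on similarity neighborhoods: the abstract does not name a specific metric. As an assumption, the sketch below uses the Tanimoto coefficient on binary fingerprints, a common choice in chemoinformatics, to collect the neighborhood of a query molecule. Fingerprints are represented as sets of on-bit indices; the bit values, the molecule names, and the helper `neighborhood` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighborhood(
    query: Set[int],
    library: Dict[str, Set[int]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return library members with similarity >= threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical fingerprints given as sets of on-bit indices.
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}
query = {1, 4, 7}
hits = neighborhood(query, library, threshold=0.5)
print(hits)
```

This linear scan visits every library member, which is feasible only for enumerated collections; the combinatorial spaces of billions to trillions of molecules described in the abstract are the reason specialized search algorithms are needed.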
Affiliation(s)
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
|