1
Wang J, Zhu F. Multi-objective molecular generation via clustered Pareto-based reinforcement learning. Neural Netw 2024; 179:106596. [PMID: 39163823 DOI: 10.1016/j.neunet.2024.106596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 06/16/2024] [Accepted: 08/01/2024] [Indexed: 08/22/2024]
Abstract
De novo molecular design is the process of learning knowledge from existing data to propose new chemical structures that satisfy the desired properties. By using de novo design to generate compounds in a directed manner, better solutions can be obtained in large chemical libraries with less comparison cost. But drug design needs to take multiple factors into consideration. For example, in polypharmacology, molecules that activate or inhibit multiple target proteins produce multiple pharmacological activities and are less susceptible to drug resistance. However, most existing molecular generation methods either focus only on affinity for a single target or fail to effectively balance the relationship between multiple targets, resulting in insufficient validity and desirability of the generated molecules. To address the problems, an approach called clustered Pareto-based reinforcement learning (CPRL) is proposed. In CPRL, a pre-trained model is constructed to grasp existing molecular knowledge in a supervised learning manner. In addition, the clustered Pareto optimization algorithm is presented to find the best solution between different objectives. The algorithm first extracts an update set from the sampled molecules through the designed aggregation-based molecular clustering. Then, the final reward is computed by constructing the Pareto frontier ranking of the molecules from the updated set. To explore the vast chemical space, a reinforcement learning agent is designed in CPRL that can be updated under the guidance of the final reward to balance multiple properties. Furthermore, to increase the internal diversity of the molecules, a fixed-parameter exploration model is used for sampling in conjunction with the agent. The experimental results demonstrate that CPRL is capable of balancing multiple properties of the molecule and has higher desirability and validity, reaching 0.9551 and 0.9923, respectively.
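The Pareto frontier ranking mentioned in this abstract can be illustrated with a minimal sketch. It assumes every objective is maximized and uses plain non-dominated sorting; the function names and the `1 / (1 + rank)` reward mapping are hypothetical illustrations, not the clustering or reward scheme used in CPRL.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank 0 is the non-dominated front, rank 1 the next front, and so on."""
    ranks = [-1] * len(scores)
    remaining = set(range(len(scores)))
    rank = 0
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_rewards(ranks: List[int]) -> List[float]:
    """Hypothetical reward mapping: molecules on earlier fronts receive larger rewards."""
    return [1.0 / (1 + r) for r in ranks]


# Hypothetical (property_1, property_2) scores for five sampled molecules.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
ranks = pareto_ranks(scores)
rewards = rank_rewards(ranks)
```

The first three molecules trade off against each other and share the first front, while the last two are dominated and fall onto later fronts.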
Affiliation(s)
- Jing Wang
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
- Fei Zhu
  - School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.
2
Suzuki T, Ma D, Yasuo N, Sekijima M. Mothra: Multiobjective de novo Molecular Generation Using Monte Carlo Tree Search. J Chem Inf Model 2024; 64:7291-7302. [PMID: 39317969 PMCID: PMC11481094 DOI: 10.1021/acs.jcim.4c00759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
In the field of drug discovery, identifying compounds that satisfy multiple criteria, such as target protein affinity, pharmacokinetics, and membrane permeability, is challenging because of the vast chemical space. Until now, multiobjective optimization via generative models has often involved linear combinations of different reward functions. Linear combinations solve multiobjective optimization problems by turning multiobjective optimization into a single-objective task and causing problems with weighting for each objective. Herein, we propose a scalable multiobjective molecular generative model developed using deep learning techniques. This model integrates the capabilities of recurrent neural networks for molecular generation and Pareto multiobjective Monte Carlo tree search to determine the optimal search direction. Through this integration, our model can generate compounds using enhanced evaluation functions that include important aspects like target protein affinity, drug similarity, and toxicity. The proposed model addresses the limitations of previous linear combination methods, and its effectiveness is demonstrated via extensive experimentation. The improvements achieved in the evaluation metrics underscore the potential utility of our approach toward drug discovery applications. In addition, we provide the source code for our model such that researchers can easily access and use our framework in their own investigations. The source code and pretrained model for Mothra, developed in this study, along with the Docker image for the Pareto front explorer and compound picker, designed to streamline the selection and visualization of optimal chemical compounds, are released under the GNU General Public License v3.0 and available at https://github.com/sekijima-lab/Mothra.
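The weighting problem that this abstract attributes to linear combinations can be shown with a small sketch. The three candidates and their two scores are hypothetical, both objectives are assumed to be maximized, and the code illustrates only the scalarization issue, not Mothra's tree search.

```python
from typing import Dict, Sequence, Set, Tuple


def weighted_sum(score: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse several objectives into one number with fixed weights."""
    return sum(w * s for w, s in zip(weights, score))


def best_by_weighted_sum(candidates: Dict[str, Tuple[float, float]],
                         weights: Sequence[float]) -> str:
    """Return the candidate with the highest weighted-sum score."""
    return max(candidates, key=lambda name: weighted_sum(candidates[name], weights))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Names of candidates that no other candidate dominates."""
    return {
        name for name, score in candidates.items()
        if not any(dominates(other, score)
                   for other_name, other in candidates.items() if other_name != name)
    }


# Hypothetical scores: A and B are extreme on one objective, C is a balanced compromise.
candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.45, 0.45)}

# Sweep the weight on the first objective from 0.0 to 1.0.
winners = {best_by_weighted_sum(candidates, (w / 10, 1 - w / 10)) for w in range(11)}
front = pareto_front(candidates)
```

In this example the balanced candidate C lies on the Pareto front but is never selected by any of the swept weightings, because it sits in a non-convex part of the front.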
Affiliation(s)
- Takamasa Suzuki
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Dian Ma
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Nobuaki Yasuo
  - Tokyo Tech Academy for Convergence of Materials and Informatics (TAC-MI), Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Masakazu Sekijima
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
3
Chakraborty C, Bhattacharya M, Lee SS, Wen ZH, Lo YH. The changing scenario of drug discovery using AI to deep learning: Recent advancement, success stories, collaborations, and challenges. Mol Ther Nucleic Acids 2024; 35:102295. [PMID: 39257717 PMCID: PMC11386122 DOI: 10.1016/j.omtn.2024.102295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Due to the transformation of artificial intelligence (AI) tools and technologies, AI-driven drug discovery has come to the forefront. It reduces time and expenditure. Because of these advantages, pharmaceutical companies are concentrating on AI-driven drug discovery. Several drug molecules have been discovered using AI-based techniques and tools, and several newly AI-discovered drug molecules have already entered clinical trials. In this review, we first present the data and their resources in the pharmaceutical sector for AI-driven drug discovery and illustrate some significant algorithms and techniques for AI and ML that are used in this field. We give an overview of deep neural network (NN) models and compare them with artificial NNs. Then, we illustrate recent advances in the landscape of drug discovery from AI to deep learning, such as the identification of drug targets, prediction of their structure, estimation of drug-target interaction, estimation of drug-target binding affinity, de novo drug design, prediction of drug toxicity, estimation of absorption, distribution, metabolism, excretion, and toxicity, and estimation of drug-drug interaction. Moreover, we highlight success stories of AI-driven drug discovery and discuss several collaborations and the challenges in this area. The discussions in the article will enrich the pharmaceutical industry.
Affiliation(s)
- Chiranjib Chakraborty
  - Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India
- Manojit Bhattacharya
  - Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha 756020, India
- Sang-Soo Lee
  - Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do 24252, Republic of Korea
- Zhi-Hong Wen
  - Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Yi-Hao Lo
  - Department of Family Medicine, Zuoying Armed Forces General Hospital, Kaohsiung 813204, Taiwan
  - Shu-Zen Junior College of Medicine and Management, Kaohsiung 821004, Taiwan
  - Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 804201, Taiwan
4
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
5
Ekins S, Lane TR, Urbina F, Puhl AC. In silico ADME/tox comes of age: twenty years later. Xenobiotica 2024; 54:352-358. [PMID: 37539466 PMCID: PMC10850432 DOI: 10.1080/00498254.2023.2245049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/22/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/05/2023]
Abstract
In the early 2000s pharmaceutical drug discovery was beginning to use computational approaches for absorption, distribution, metabolism, excretion and toxicity (ADME/Tox, also known as ADMET) prediction. This emphasis on prediction was an effort to reduce the risk of later stage failures from ADME/Tox. Much has been written in the intervening twenty plus years and significant expenditure has occurred in companies developing these in silico capabilities which can be gleaned from publications. It is therefore an appropriate time to briefly reflect on what was proposed then and what the reality is today. 20 years ago, we tended to optimise bioactivity and perhaps one ADME/Tox property at a time. Previously pharmaceutical companies needed a whole infrastructure for models - in silico and in vitro experts, IT, champions on a project team, educators and management support. Now we are in the age of generative de novo design where bioactivity and many ADME/Tox properties can be optimised and large language model technologies are available. There are also some challenges such as the focus on very large molecules which may be outside of current ADME/Tox models. We provide an opportunity to look forward with the increasing public data for ADME/Tox as well as expanded types of algorithms available.
Affiliation(s)
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Ana C. Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
6
Abubakar ML, Kapoor N, Sharma A, Gambhir L, Jasuja ND, Sharma G. Artificial Intelligence in Drug Identification and Validation: A Scoping Review. Drug Res (Stuttg) 2024; 74:208-219. [PMID: 38830370 DOI: 10.1055/a-2306-8311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2024]
Abstract
The end-to-end process in the discovery of drugs involves therapeutic candidate identification, validation of identified targets, identification of hit compound series, lead identification and optimization, characterization, and formulation and development. The process is lengthy, expensive, tedious, and inefficient, with a large attrition rate for novel drug discovery. Today, the pharmaceutical industry is focused on improving the drug discovery process. Finding and selecting acceptable drug candidates effectively can significantly impact the price and profitability of new medications. Aside from the cost, there is a need to reduce the end-to-end process time by limiting the number of experiments at various stages. To achieve this, artificial intelligence (AI) has been utilized at various stages of drug discovery. The present study aims to identify recent work that has developed AI-based models at various stages of drug discovery, identify the stages that need more attention, present the taxonomy of AI methods in drug discovery, and provide research opportunities. The study identified all publications from January 2016 to September 1, 2023 that were cited in electronic databases including Scopus, NCBI PubMed, MEDLINE, Anthropology Plus, Embase, APA PsycInfo, SOCIndex, and CINAHL. Data were extracted using a standardized form, and possible research prospects are presented based on the analysis of the extracted data.
Affiliation(s)
- Neha Kapoor
  - School of Applied Sciences, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
- Asha Sharma
  - Department of Zoology, Swargiya P. N. K. S. Govt. PG College, Dausa, Rajasthan, India
- Lokesh Gambhir
  - School of Basic and Applied Sciences, Shri Guru Ram Rai University, Dehradun, Uttarakhand, India
- Gaurav Sharma
  - School of Applied Sciences, Suresh Gyan Vihar University, Jaipur, Rajasthan, India
7
Gangwal A, Lavecchia A. Unleashing the power of generative AI in drug discovery. Drug Discov Today 2024; 29:103992. [PMID: 38663579 DOI: 10.1016/j.drudis.2024.103992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/22/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Artificial intelligence (AI) is revolutionizing drug discovery by enhancing precision, reducing timelines and costs, and enabling AI-driven computer-aided drug design. This review focuses on recent advancements in deep generative models (DGMs) for de novo drug design, exploring diverse algorithms and their profound impact. It critically analyses the challenges that are intricately interwoven into these technologies, proposing strategies to unlock their full potential. It features case studies of both successes and failures in advancing drugs to clinical trials with AI assistance. Last, it outlines a forward-looking plan for optimizing DGMs in de novo drug design, thereby fostering faster and more cost-effective drug development.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule 424001, Maharashtra, India
- Antonio Lavecchia
  - "Drug Discovery" Laboratory, Department of Pharmacy, University of Naples Federico II, I-80131 Naples, Italy.
8
Thomas M, O'Boyle NM, Bender A, De Graaf C. MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design. J Cheminform 2024; 16:64. [PMID: 38816825 PMCID: PMC11141043 DOI: 10.1186/s13321-024-00861-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 05/15/2024] [Indexed: 06/01/2024] Open
Abstract
Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2a ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index. Scientific Contribution: MolScore is an open-source platform to facilitate generative molecular design and evaluation thereof for application in drug design. This platform takes important steps towards unifying existing benchmarks, providing a platform to share new benchmarks, and improving customisation, flexibility and usability for practitioners over existing solutions.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.
- Noel M O'Boyle
  - Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Chris De Graaf
  - Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK
9
Chandraghatgi R, Ji HF, Rosen GL, Sokhansanj BA. Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening. J Chem Inf Model 2024; 64:3826-3840. [PMID: 38696451 PMCID: PMC11197033 DOI: 10.1021/acs.jcim.4c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/04/2024]
Abstract
Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.
Affiliation(s)
- Rohan Chandraghatgi
  - Department of Biology, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Hai-Feng Ji
  - Department of Chemistry, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Gail L. Rosen
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Bahrad A. Sokhansanj
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
10
Doostmohammadi A, Jooya H, Ghorbanian K, Gohari S, Dadashpour M. Potentials and future perspectives of multi-target drugs in cancer treatment: the next generation anti-cancer agents. Cell Commun Signal 2024; 22:228. [PMID: 38622735 PMCID: PMC11020265 DOI: 10.1186/s12964-024-01607-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 04/05/2024] [Indexed: 04/17/2024] Open
Abstract
Cancer is a major public health problem worldwide with more than an estimated 19.3 million new cases in 2020. The occurrence rises dramatically with age, and the overall risk accumulation is combined with the tendency for cellular repair mechanisms to be less effective in older individuals. Conventional cancer treatments, such as radiotherapy, surgery, and chemotherapy, have been used for decades to combat cancer. However, the emergence of novel fields of cancer research has led to the exploration of innovative treatment approaches focused on immunotherapy, epigenetic therapy, targeted therapy, multi-omics, and also multi-target therapy. The underlying hypothesis is that drugs designed to act against individual targets usually cannot combat multigenic diseases like cancer. Multi-target therapies, either in combination or in sequential order, have been recommended to combat acquired and intrinsic resistance to anti-cancer treatments. Several studies have focused on multi-targeting treatments because of their advantages, which include overcoming clonal heterogeneity, a lower risk of multi-drug resistance (MDR), and decreased drug toxicity and thereby fewer side effects. In this study, we discuss multi-target drugs, their benefits in improving cancer treatments, and recent advances in the field of multi-targeted drugs. We also examine studies that performed clinical trials using multi-target therapeutic agents for cancer treatment.
Affiliation(s)
- Ali Doostmohammadi
  - Nervous System Stem Cells Research Center, Semnan University of Medical Sciences, Semnan, Iran
  - Student Research Committee, Semnan University of Medical Sciences, Semnan, Iran
- Hossein Jooya
  - Biochemistry Group, Department of Chemistry, Faculty of Science, Ferdowsi University of Mashhad, Mashhad, Iran
- Kimia Ghorbanian
  - Student Research Committee, Semnan University of Medical Sciences, Semnan, Iran
- Sargol Gohari
  - Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- Mehdi Dadashpour
  - Department of Medical Biotechnology, Faculty of Medicine, Semnan University of Medical Sciences, Semnan, Iran.
  - Cancer Research Center, Semnan University of Medical Sciences, Semnan, Iran.
11
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
12
Mao J, Wang J, Zeb A, Cho KH, Jin H, Kim J, Lee O, Wang Y, No KT. Transformer-Based Molecular Generative Model for Antiviral Drug Design. J Chem Inf Model 2024; 64:2733-2745. [PMID: 37366644 PMCID: PMC11005037 DOI: 10.1021/acs.jcim.3c00536] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/07/2023] [Indexed: 06/28/2023]
Abstract
The Simplified Molecular Input Line Entry System (SMILES) is oriented to the atomic-level representation of molecules and is not friendly in terms of human readability and editability. IUPAC nomenclature, by contrast, is the closest to natural language and is very friendly in terms of human-oriented readability and molecular editing, so we can manipulate IUPAC names to generate corresponding new molecules and produce programming-friendly molecular forms in SMILES. In addition, antiviral drug design, especially analogue-based drug design, is better suited to editing and designing directly at the functional group level of IUPAC than at the atomic level of SMILES, since designing analogues involves altering the R group only, which is closer to the knowledge-based molecular design of a chemist. Herein, we present a novel data-driven self-supervised pretraining generative model called "TransAntivirus" that makes select-and-replace edits and converts organic molecules toward the desired properties for the design of antiviral candidate analogues. The results indicated that TransAntivirus is significantly superior to the control models in terms of novelty, validity, uniqueness, and diversity. TransAntivirus showed excellent performance in the design and optimization of nucleoside and non-nucleoside analogues by chemical space analysis and property prediction analysis. Furthermore, to validate the applicability of TransAntivirus in the design of antiviral drugs, we conducted two case studies on the design of nucleoside analogues and non-nucleoside analogues and screened four candidate lead compounds against coronavirus disease (COVID-19). Finally, we recommend this framework for accelerating antiviral drug discovery.
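The validity, uniqueness, and novelty metrics named in this abstract are not defined in it; the sketch below follows one common convention from generative-model benchmarks and is an assumption, not the authors' evaluation code. The `toy_is_valid` check is a placeholder (it only tests for balanced parentheses); a real evaluation would parse and canonicalize each string with a cheminformatics toolkit before comparing.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(generated: Iterable[str],
                       training: Iterable[str],
                       is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty under one common convention.

    validity   = valid strings / all generated strings
    uniqueness = distinct valid strings / valid strings
    novelty    = distinct valid strings absent from training / distinct valid strings
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Placeholder validity check, NOT real chemistry: non-empty with balanced parentheses."""
    return bool(smiles) and smiles.count("(") == smiles.count(")")


# Hypothetical generated strings and a one-molecule training set.
generated = ["CCO", "CCO", "CC(C)O", "CC(", "c1ccccc1"]
training = {"CCO"}
metrics = generation_metrics(generated, training, toy_is_valid)
```

Passing the validity check in as a function keeps the metric arithmetic independent of whichever toolkit performs the actual parsing.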
Affiliation(s)
- Jiashun Mao
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jianmin Wang
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Amir Zeb
  - Faculty of Natural and Basic Sciences, University of Turbat, Balochistan 92600, Pakistan
- Kwang-Hwi Cho
  - School of Systems Biomedical Science, Soongsil University, Seoul 06978, Republic of Korea
- Haiyan Jin
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Jongwan Kim
  - Department of Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
  - Bioinformatics and Molecular Design Research Center (BMDRC), Incheon 21983, Republic of Korea
- Onju Lee
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Yunyun Wang
  - School of Pharmacy and Jiangsu Province Key Laboratory for Inflammation and Molecular Drug Target, Nantong University, Nantong 226001, Jiangsu, P. R. China
- Kyoung Tai No
  - The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
13
Jones J, Clark RD, Lawless MS, Miller DW, Waldman M. The AI-driven Drug Design (AIDD) platform: an interactive multi-parameter optimization system integrating molecular evolution with physiologically based pharmacokinetic simulations. J Comput Aided Mol Des 2024; 38:14. [PMID: 38499823 DOI: 10.1007/s10822-024-00552-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 02/13/2024] [Indexed: 03/20/2024]
Abstract
Computer-aided drug design has advanced rapidly in recent years, and multiple instances of in silico designed molecules advancing to the clinic have demonstrated the contribution of this field to medicine. Properly designed and implemented platforms can drastically reduce drug development timelines and costs. While such efforts were initially focused primarily on target affinity/activity, it is now appreciated that other parameters are equally important in the successful development of a drug and its progression to the clinic, including pharmacokinetic properties as well as absorption, distribution, metabolic, excretion and toxicological (ADMET) properties. In the last decade, several programs have been developed that incorporate these properties into the drug design and optimization process and to varying degrees, allowing for multi-parameter optimization. Here, we introduce the Artificial Intelligence-driven Drug Design (AIDD) platform, which automates the drug design process by integrating high-throughput physiologically-based pharmacokinetic simulations (powered by GastroPlus) and ADMET predictions (powered by ADMET Predictor) with an advanced evolutionary algorithm that is quite different than current generative models. AIDD uses these and other estimates in iteratively performing multi-objective optimizations to produce novel molecules that are active and lead-like. Here we describe the AIDD workflow and details of the methodologies involved therein. We use a dataset of triazolopyrimidine inhibitors of the dihydroorotate dehydrogenase from Plasmodium falciparum to illustrate how AIDD generates novel sets of molecules.
Affiliation(s)
- Jeremy Jones
  - Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA, 93534‑7059, USA
- Robert D Clark
  - The Indiana University Luddy School of Informatics, Computing and Engineering, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
- Michael S Lawless
  - Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA, 93534‑7059, USA
- David W Miller
  - Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA, 93534‑7059, USA
- Marvin Waldman
  - Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA, 93534‑7059, USA
14
|
Li J, Chen X, Liu R, Liu X, Shu M. Engineering novel scaffolds for specific HDAC11 inhibitors against metabolic diseases exploiting deep learning, virtual screening, and molecular dynamics simulations. Int J Biol Macromol 2024; 262:129810. [PMID: 38340912 DOI: 10.1016/j.ijbiomac.2024.129810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 02/12/2024]
Abstract
The prevalence of metabolic diseases is increasing at a frightening rate year by year. The burgeoning development of deep learning enables drug design to be more efficient, selective, and structurally novel. The critical relevance of histone deacetylase 11 (HDAC11) to the pathogenesis of several metabolic diseases makes it a promising drug target for curbing metabolic disorders. The present study aims to design new specific HDAC11 inhibitors for the treatment of metabolic diseases. Deep learning was used to learn the properties of existing HDAC11 inhibitors and yield a novel compound library containing 23,122 molecules. Subsequently, the compound library was screened by ADMET properties, Lipinski and Veber rules, traditional machine learning classification models, and molecular docking, and 10 compounds were selected as candidate HDAC11 inhibitors. The stability of the 10 new molecules was further evaluated in molecular dynamics simulations using RMSD, RMSF, MM/GBSA, free energy landscape mapping, and PCA. As a result, ten compounds, Cpd_17556, Cpd_2184, Cpd_8907, Cpd_7771, Cpd_14959, Cpd_7108, Cpd_12383, Cpd_13153, Cpd_14500, and Cpd_21811, were characterized as good HDAC11 inhibitors and are expected to be promising drug candidates for metabolic disorders, although further in vitro, in vivo, and clinical studies are needed to demonstrate this.
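The rule-based stage of a screening cascade like the one in this abstract can be illustrated with a small filter. This is a minimal sketch and not the authors' pipeline: it assumes descriptors were already computed by some cheminformatics toolkit, it uses the commonly cited Lipinski thresholds (molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10) and Veber thresholds (rotatable bonds ≤ 10, TPSA ≤ 140 Å²), and it requires every rule to pass, which may differ from the cutoffs the study applied. The class, function, and compound names and all descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one molecule (values below are made up)."""
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, in square angstroms


def passes_lipinski(d: Descriptors) -> bool:
    """Lipinski rule of five, applied strictly (no violations tolerated)."""
    return (d.mol_weight <= 500 and d.logp <= 5
            and d.h_donors <= 5 and d.h_acceptors <= 10)


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria for oral bioavailability."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


# Hypothetical library entries, not compounds from the study.
library = {
    "cpd_a": Descriptors(342.4, 2.8, 2, 5, 6, 78.5),
    "cpd_b": Descriptors(612.7, 6.1, 4, 11, 14, 162.0),
    "cpd_c": Descriptors(455.5, 4.2, 1, 7, 12, 95.0),
}

kept = [name for name, d in library.items()
        if passes_lipinski(d) and passes_veber(d)]
```

In this toy library only `cpd_a` survives: `cpd_b` breaks several Lipinski limits, and `cpd_c` passes Lipinski but has too many rotatable bonds for Veber. Cheap filters of this kind are usually run before docking or dynamics because they discard candidates at almost no computational cost.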
Affiliation(s)
- Jiali Li
  - School of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing 400054, China; Key Laboratory of Screening and activity evaluation of targeted drugs, Chongqing 400054, China
- XiaoDie Chen
  - School of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing 400054, China; Key Laboratory of Screening and activity evaluation of targeted drugs, Chongqing 400054, China
- Rong Liu
  - School of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing 400054, China; Key Laboratory of Screening and activity evaluation of targeted drugs, Chongqing 400054, China
- Xingyu Liu
  - School of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing 400054, China; Key Laboratory of Screening and activity evaluation of targeted drugs, Chongqing 400054, China
- Mao Shu
  - School of Pharmacy and Bioengineering, Chongqing University of Technology, Chongqing 400054, China; Key Laboratory of Screening and activity evaluation of targeted drugs, Chongqing 400054, China
15
|
Melancon K, Pliushcheuskaya P, Meiler J, Künze G. Targeting ion channels with ultra-large library screening for hit discovery. Front Mol Neurosci 2024; 16:1336004. [PMID: 38249296 PMCID: PMC10796734 DOI: 10.3389/fnmol.2023.1336004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024]
Abstract
Ion channels play a crucial role in a variety of physiological and pathological processes, making them attractive targets for drug development in diseases such as diabetes, epilepsy, hypertension, cancer, and chronic pain. Despite the importance of ion channels in drug discovery, the vastness of chemical space and the complexity of ion channels pose significant challenges for identifying drug candidates. The use of in silico methods in drug discovery has dramatically reduced the time and cost of drug development and has the potential to revolutionize the field of medicine. Recent advances in computer hardware and software have enabled the screening of ultra-large compound libraries. Integration of different methods at various scales and dimensions is becoming an inevitable trend in drug development. In this review, we provide an overview of current state-of-the-art computational chemistry methodologies for ultra-large compound library screening and their application to ion channel drug discovery research. We discuss the advantages and limitations of various in silico techniques, including virtual screening, molecular mechanics/dynamics simulations, and machine learning-based approaches. We also highlight several successful applications of computational chemistry methodologies in ion channel drug discovery and provide insights into future directions and challenges in this field.
Affiliation(s)
- Kortney Melancon
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
- Georg Künze
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
16
|
Angelo JS, Guedes IA, Barbosa HJC, Dardenne LE. Multi- and many-objective optimization: present and future in de novo drug design. Front Chem 2023; 11:1288626. [PMID: 38192501 PMCID: PMC10773868 DOI: 10.3389/fchem.2023.1288626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 11/27/2023] [Indexed: 01/10/2024]
Abstract
de novo Drug Design (dnDD) aims to create new molecules that satisfy multiple conflicting objectives. Since several desired properties can be considered in the optimization process, dnDD is naturally categorized as a many-objective optimization problem (ManyOOP), where more than three objectives must be simultaneously optimized. However, a large number of objectives typically pose several challenges that affect the choice and the design of optimization methodologies. Herein, we cover the application of multi- and many-objective optimization methods, particularly those based on Evolutionary Computation and Machine Learning techniques, to enlighten their potential application in dnDD. Additionally, we comprehensively analyze how molecular properties used in the optimization process are applied as either objectives or constraints to the problem. Finally, we discuss future research in many-objective optimization for dnDD, highlighting two important possible impacts: i) its integration with the development of multi-target approaches to accelerate the discovery of innovative and more efficacious drug therapies and ii) its role as a catalyst for new developments in more fundamental and general methodological frameworks in the field.
Affiliation(s)
- Laurent E. Dardenne
  - Coordenação de Modelagem Computacional, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
17
|
Shimizu Y, Ohta M, Ishida S, Terayama K, Osawa M, Honma T, Ikeda K. AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data. J Cheminform 2023; 15:120. [PMID: 38093324 PMCID: PMC10716930 DOI: 10.1186/s13321-023-00791-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/02/2023] [Indexed: 12/17/2023]
Abstract
Developing compounds with novel structures is important for the production of new drugs. From an intellectual perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. The generation of a large number of compounds has been made possible because of the recent advances in artificial intelligence (AI). However, confirming the patent status of these generated molecules has been a challenge because there are no free and easy-to-use tools that can be used to determine the novelty of the generated compounds in terms of patents in a timely manner; additionally, there are no appropriate reference databases for pharmaceutical patents in the world. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds using international patent classification. An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source for generative AI to learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed by increasing and decreasing the number of generated patented compounds through incorporation of patent status (i.e., patented or not) into learning. The use of patent status enabled generation of novel molecules with high drug-likeness. The generation using generative AI with patent information would help efficiently propose novel compounds in terms of pharmaceutical patents. Scientific contribution: In this study, a new molecule-generation method that takes into account the patent status of molecules, which has rarely been considered but is an important feature in drug discovery, was developed. 
The method enables the generation of novel molecules based on pharmaceutical patents with high drug-likeness and will help in the efficient development of effective drug compounds.
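The exact structure search described in this abstract, keyed on InChIKey in a relational database, can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch under stated assumptions, not the authors' system: the table name, column names, patent identifiers, and helper functions are hypothetical, and the InChIKeys are assumed to have been computed beforehand by a cheminformatics toolkit.

```python
import sqlite3

# In-memory database standing in for the reference database of patented compounds.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE patented_compounds (
        inchikey  TEXT NOT NULL,   -- 27-character hashed structure identifier
        patent_id TEXT NOT NULL,   -- hypothetical patent identifier
        PRIMARY KEY (inchikey, patent_id)
    )
    """
)

# Example rows; the patent identifiers are placeholders, not real patents.
rows = [
    ("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "EXAMPLE-PATENT-001"),
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "EXAMPLE-PATENT-002"),
    ("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "EXAMPLE-PATENT-003"),
]
conn.executemany("INSERT INTO patented_compounds VALUES (?, ?)", rows)


def patents_for(connection: sqlite3.Connection, inchikey: str) -> list:
    """Return all patent identifiers whose compound matches the key exactly."""
    cursor = connection.execute(
        "SELECT patent_id FROM patented_compounds "
        "WHERE inchikey = ? ORDER BY patent_id",
        (inchikey,),
    )
    return [row[0] for row in cursor]


def is_patented(connection: sqlite3.Connection, inchikey: str) -> bool:
    """Exact-match novelty check for a generated molecule."""
    return bool(patents_for(connection, inchikey))


# A generated molecule whose key is absent from the reference table.
generated_key = "HEFNNWSXXWATRW-UHFFFAOYSA-N"
novel = not is_patented(conn, generated_key)
```

Because the InChIKey is a fixed-length string, an equality lookup can use the primary-key index and needs no substructure or similarity search. The trade-off is that only identical structures are matched, so close analogs of a patented compound are not flagged by this check.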
Affiliation(s)
- Yugo Shimizu
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
- Masateru Ohta
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Shoichi Ishida
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Kei Terayama
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Masanori Osawa
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
- Teruki Honma
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Kazuyoshi Ikeda
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
  - Division of Physics for Life Functions, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan
18
|
Šícho M, Luukkonen S, van den Maagdenberg HW, Schoenmaker L, Béquignon OJM, van Westen GJP. DrugEx: Deep Learning Models and Tools for Exploration of Drug-Like Chemical Space. J Chem Inf Model 2023; 63:3629-3636. [PMID: 37272707 PMCID: PMC10306259 DOI: 10.1021/acs.jcim.3c00434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Indexed: 06/06/2023]
Abstract
The discovery of novel molecules with desirable properties is a classic challenge in medicinal chemistry. With the recent advancements of machine learning, there has been a surge of de novo drug design tools. However, few resources exist that are user-friendly as well as easily customizable. In this application note, we present the new versatile open-source software package DrugEx for multiobjective reinforcement learning. This package contains the consolidated and redesigned scripts from the prior DrugEx papers including multiple generator architectures, a variety of scoring tools, and multiobjective optimization methods. It has a flexible application programming interface and can readily be used via the command line interface or the graphical user interface GenUI. The DrugEx package is publicly available at https://github.com/CDDLeiden/DrugEx.
Affiliation(s)
- Martin Šícho
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC, Leiden, The Netherlands
  - CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Sohvi Luukkonen
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC, Leiden, The Netherlands
- Linde Schoenmaker
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC, Leiden, The Netherlands
- Olivier J. M. Béquignon
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC, Leiden, The Netherlands
19
|
Liu X, Ye K, van Vlijmen HWT, IJzerman AP, van Westen GJP. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J Cheminform 2023; 15:24. [PMID: 36803659 PMCID: PMC9940339 DOI: 10.1186/s13321-023-00694-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/09/2022] [Accepted: 02/06/2023] [Indexed: 02/22/2023]
Abstract
Rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified due to the large drug-like chemical space available to search for novel drug-like molecules. With the rapid growth of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives and does not allow users to input any prior information (i.e. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. Here, a Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules a novel positional encoding for each atom and bond based on an adjacency matrix was proposed, extending the architecture of the Transformer. The graph Transformer model contains growing and connecting procedures for molecule generation starting from a given scaffold based on fragments. Moreover, the generator was trained under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results show that 100% of the generated molecules are valid and most of them had a high predicted affinity value towards A2AAR with given scaffolds.
Affiliation(s)
- Xuhan Liu
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Kai Ye
  - School of Electrics and Information Engineering, Xi’an Jiaotong University, 28 Xianning W Rd, Xi’an, China
- Herman W. T. van Vlijmen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
  - Janssen Pharmaceutica NV, Turnhoutseweg 30, B-2340 Beerse, Belgium
- Adriaan P. IJzerman
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
20
|
Schoenmaker L, Béquignon OJM, Jespers W, van Westen GJP. UnCorrupt SMILES: a novel approach to de novo design. J Cheminform 2023; 15:22. [PMID: 36788579 PMCID: PMC9926805 DOI: 10.1186/s13321-023-00696-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/13/2022] [Accepted: 02/06/2023] [Indexed: 02/16/2023]
Abstract
Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. Therefore, here this type of model was trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study has found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60-90% of invalid generator outputs and fixes 35-80% of them. However, a higher error detection and performance was obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60-95% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators based on novelty and similarity. Additionally, the SMILES corrector can be used to expand the amount of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. 
The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.
Affiliation(s)
- Linde Schoenmaker
  - Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Olivier J. M. Béquignon
  - Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Willem Jespers
  - Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Gerard J. P. van Westen
  - Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
21
|
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. Patterns (N Y) 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
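The contrast this abstract draws between scalarization and Pareto optimization can be made concrete with a small example. The following is a minimal sketch, assuming every objective is to be maximized and that each molecule is already reduced to a tuple of predicted scores. The function names, the equal weights, and the score values are illustrative assumptions and are not taken from the review.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Peel off successive Pareto fronts; returns lists of indices, best front first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def scalarize(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum scalarization: collapses all objectives into one number."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical (objective 1, objective 2) scores for five candidate molecules.
scores: List[Tuple[float, float]] = [
    (0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1),
]

fronts = non_dominated_sort(scores)
best_scalar = max(range(len(scores)),
                  key=lambda i: scalarize(scores[i], (0.5, 0.5)))
```

With these numbers the first front holds three mutually non-dominated candidates, each a different trade-off, while the equal-weight sum returns only one of them. The front index can then serve as a rank-based reward in reinforcement learning or as a selection criterion for retraining or propagation. This naive sort is quadratic per front and is meant only for small batches.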
Affiliation(s)
- Jenna C Fromer
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
  - Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
22
|
Abstract
Citations are an essential aspect of research communication and have become the basis of many evaluation metrics in the academic world. Some see citation counts as a mark of scientific impact or even quality, but in reality the reasons for citing other work are manifold, which makes the interpretation more complicated than a single citation count can reflect. Two years ago, the Journal of Cheminformatics proposed the CiTO Pilot for the adoption of a practice of annotating citations with their citation intentions. Basically, when you cite a journal article or dataset (or any other source), you also explain why specifically you cite that source. In particular, agreement, disagreement, and the reuse of methods and data are of interest. This article explores what happened after the launch of the pilot. We summarize how authors in the Journal of Cheminformatics used the pilot, show how citation annotations are distributed with Wikidata and visualized with Scholia, discuss adoption outside BMC, and finally present some thoughts on what needs to happen next.
23
|
Yoshizawa T, Ishida S, Sato T, Ohta M, Honma T, Terayama K. Selective Inhibitor Design for Kinase Homologs Using Multiobjective Monte Carlo Tree Search. J Chem Inf Model 2022; 62:5351-5360. [DOI: 10.1021/acs.jcim.2c00787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Tatsuya Yoshizawa
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
- Shoichi Ishida
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
- Tomohiro Sato
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Masateru Ohta
  - HPC- and AI-driven Drug Development Platform Division, Center for Computational Science, RIKEN, Yokohama 230-0045, Japan
- Teruki Honma
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Kei Terayama
  - Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
24
|
Thomas M, O’Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 2022; 14:68. [PMID: 36192789 PMCID: PMC9531503 DOI: 10.1186/s13321-022-00646-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022]
Abstract
A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (version 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to overcome observed failure modes that take advantage of certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioning via reinforcement learning, compared to the current state-of-the-art. This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
- Noel M. O’Boyle
  - Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
- Chris de Graaf
  - Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
25
|
Wang X, Li F, Chen J, Teng Y, Ji C, Wu H. Critical features identification for chemical chronic toxicity based on mechanistic forecast models. Environ Pollut 2022; 307:119584. [PMID: 35688391 DOI: 10.1016/j.envpol.2022.119584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 05/03/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
With billions of tons of pollutants entering the ocean each year, aquatic toxicity is becoming a crucial endpoint for evaluating the adverse effects of chemicals on ecosystems. Notably, a huge number of toxic chemicals can cause adverse effects at environmentally relevant doses. However, data on the chronic aquatic toxicity of chemicals are much scarcer, especially at the population level. Rotifers are highly sensitive to toxicants even at chronic low doses, and their communities are usually considered effective indicators for assessing the status of aquatic ecosystems. Therefore, no observed effect concentrations (NOECs) for the population abundance of rotifers were selected as endpoints to develop machine learning models for the prediction of chemical aquatic chronic toxicity. In this study, forty-eight binary models were built by combining eight types of chemical descriptors with six machine learning algorithms. The best binary model was the 1D & 2D molecular descriptors - random trees (RT) model, with high balanced accuracy (BA) (0.83 for the training set and 0.83 for the validation set) and Matthews correlation coefficient (MCC) (0.72 for the training set and 0.67 for the validation set). Moreover, the optimal model identified the primary factors (SpMAD_Dzp, AMW, MATS2v) and three high-alert substructures [c1cc(Cl)cc1, CNCO, CCOP(=S)(OCC)O] influencing chronic aquatic toxicity. These results showed that compounds with low molecular volume, high polarity and molecular weight could contribute to adverse effects on rotifers, facilitating a deeper understanding of chronic toxicity mechanisms. In addition, the forecast models performed better than the common models embedded in the ECOSAR software. This study provided insights into structural features responsible for the toxicity of different groups of chemicals and thereby allowed for the rational design of green and safer alternatives.
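The two metrics this abstract reports for its binary classifiers, balanced accuracy (BA) and the Matthews correlation coefficient (MCC), both follow directly from the confusion matrix. The sketch below uses their standard definitions; the function names and the confusion-matrix counts are hypothetical and are not data from the study.

```python
import math


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical validation-set counts for a toxic / non-toxic classifier.
counts = dict(tp=40, fp=10, tn=45, fn=5)
ba = balanced_accuracy(**counts)
mcc = matthews_corrcoef(**counts)
```

Both metrics are commonly preferred over plain accuracy when the two classes are unbalanced. BA averages the per-class recall, while MCC uses all four cells of the confusion matrix and ranges from -1 to 1.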
Collapse
Affiliation(s)
- Xiaoqing Wang
- CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; University of Chinese Academy of Sciences, Beijing, 100049, PR China
| | - Fei Li
- CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China.
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Linggong Road 2, Dalian, 116024, China
- Yuefa Teng
- CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; University of Chinese Academy of Sciences, Beijing, 100049, PR China
- Chenglong Ji
- CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China
- Huifeng Wu
- CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China
|
26
|
Goldman B, Kearnes S, Kramer T, Riley P, Walters WP. Defining Levels of Automated Chemical Design. J Med Chem 2022; 65:7073-7087. [PMID: 35511951 PMCID: PMC9150065 DOI: 10.1021/acs.jmedchem.2c00334] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/02/2022] [Indexed: 01/07/2023]
Abstract
One application area of computational methods in drug discovery is the automated design of small molecules. Despite the large number of publications describing methods and their application in both retrospective and prospective studies, there is a lack of agreement on terminology and key attributes to distinguish these various systems. We introduce Automated Chemical Design (ACD) Levels to clearly define the level of autonomy along the axes of ideation and decision making. To fully illustrate this framework, we provide literature exemplars and place some notable methods and applications into the levels. The ACD framework provides a common language for describing automated small molecule design systems and enables medicinal chemists to better understand and evaluate such systems.
Affiliation(s)
- Brian Goldman
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
- Steven Kearnes
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
- Trevor Kramer
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
- Patrick Riley
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
- W. Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
|
27
|
Martinelli DD. Generative machine learning for de novo drug discovery: A systematic review. Comput Biol Med 2022; 145:105403. [PMID: 35339849 DOI: 10.1016/j.compbiomed.2022.105403] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 01/14/2022] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 02/08/2023]
Abstract
Recent research on artificial intelligence indicates that machine learning algorithms can auto-generate novel drug-like molecules. Generative models have revolutionized de novo drug discovery, rendering the explorative process more efficient. Several model frameworks and input formats have been proposed to enhance the performance of intelligent algorithms in generative molecular design. In this systematic literature review of experimental articles and reviews over the last five years, machine learning models, challenges associated with computational molecule design along with proposed solutions, and molecular encoding methods are discussed. A query-based search of the PubMed, ScienceDirect, Springer, Wiley Online Library, arXiv, MDPI, bioRxiv, and IEEE Xplore databases yielded 87 studies. Twelve additional studies were identified via citation searching. Of the articles in which machine learning was implemented, six prominent algorithms were identified: long short-term memory recurrent neural networks (LSTM-RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), adversarial autoencoders (AAEs), evolutionary algorithms, and gated recurrent unit recurrent neural networks (GRU-RNNs). Furthermore, eight central challenges were designated: homogeneity of generated molecular libraries, deficient synthesizability, limited assay data, model interpretability, incapacity for multi-property optimization, incomparability, restricted molecule size, and uncertainty in model evaluation. Molecules were encoded as strings (occasionally augmented using randomization), as 2D graphs, or as 3D graphs. Statistical analysis and visualization are performed to illustrate how approaches to machine learning in de novo drug design have evolved over the past five years. Finally, future opportunities and reservations are discussed.
|
28
|
Deng J, Yang Z, Ojima I, Samaras D, Wang F. Artificial intelligence in drug discovery: applications and techniques. Brief Bioinform 2021; 23:6420092. [PMID: 34734228 DOI: 10.1093/bib/bbab430] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/02/2021] [Revised: 08/02/2021] [Accepted: 09/18/2021] [Indexed: 12/23/2022]
Abstract
Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade. Various AI techniques have been used in many drug discovery applications, such as virtual screening and drug design. In this survey, we first give an overview of drug discovery and discuss related applications, which can be reduced to two major tasks, i.e. molecular property prediction and molecule generation. We then present common data resources, molecule representations and benchmark platforms. As a major part of the survey, AI techniques are dissected into model architectures and learning paradigms. To reflect the technical development of AI in drug discovery over the years, the surveyed works are organized chronologically. We expect that this survey provides a comprehensive review of AI in drug discovery. We also provide a GitHub repository with a collection of papers (and code, if applicable) as a learning resource, which is regularly updated.
Affiliation(s)
- Jianyuan Deng
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA
- Zhibo Yang
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
- Iwao Ojima
- Department of Chemistry, Stony Brook University, Stony Brook, NY 11790, USA
- Dimitris Samaras
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
- Fusheng Wang
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY 11790, USA; Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
|