1
Wu T, Zhou M, Zou J, Chen Q, Qian F, Kurths J, Liu R, Tang Y. AI-guided few-shot inverse design of HDP-mimicking polymers against drug-resistant bacteria. Nat Commun 2024; 15:6288. [PMID: 39060236 PMCID: PMC11282099 DOI: 10.1038/s41467-024-50533-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 07/11/2024] [Indexed: 07/28/2024] Open
Abstract
Host defense peptide (HDP)-mimicking polymers are promising therapeutic alternatives to antibiotics and have large-scale untapped potential. Artificial intelligence (AI) exhibits promising performance on large-scale chemical-content design; however, existing AI methods struggle with the scarcity of data in each family of HDP-mimicking polymers (<10^2), much smaller than public polymer datasets (>10^5), and with multiple constraints on properties and structures when exploring high-dimensional polymer space. Herein, we develop a universal AI-guided few-shot inverse design framework by designing multi-modal representations to enrich polymer information for predictions and creating a graph grammar distillation for chemical space restriction to improve the efficiency of multi-constrained polymer generation with reinforcement learning. Taking HDP-mimicking β-amino acid polymers as an example, we successfully simulate predictions of over 10^5 polymers and identify 83 optimal polymers. Furthermore, we synthesize an optimal polymer DM0.8iPen0.2 and find that this polymer exhibits broad-spectrum and potent antibacterial activity against multiple clinically isolated antibiotic-resistant pathogens, validating the effectiveness of the AI-guided design strategy.
Affiliation(s)
- Tianyu Wu
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Min Zhou
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Jingcheng Zou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Qi Chen
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Feng Qian
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
- Jürgen Kurths
  - Potsdam Institute for Climate Impact Research (PIK), Potsdam, 14473, Germany
  - Institut für Physik, Humboldt-Universität zu Berlin, Berlin, 10115, Germany
  - The Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, 200433, China
- Runhui Liu
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, 200237, China
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Frontiers Science Center for Materiobiology and Dynamic Chemistry, Key Laboratory for Ultrafine Materials of Ministry of Education, Research Center for Biomedical Materials of Ministry of Education, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Yang Tang
  - Key Laboratory of Smart Manufacturing in Energy Chemical Process, East China University of Science and Technology, Shanghai, 200237, China
2
Thomas M, Ahmad M, Tresadern G, de Fabritiis G. PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models. J Cheminform 2024; 16:77. [PMID: 38965600 PMCID: PMC11225391 DOI: 10.1186/s13321-024-00866-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 06/04/2024] [Indexed: 07/06/2024] Open
Abstract
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation; however, scaffold decoration and fragment linking applications are sometimes desirable, and these require a different grammar, architecture and training dataset, and therefore re-training of a new model. In this work, we describe a simple procedure to conduct constrained molecule generation with a SMILES-based generative model to extend applicability to scaffold decoration and fragment linking by providing SMILES prompts, without the need for re-training. In combination with reinforcement learning, we show that pre-trained, decoder-only models adapt to these applications quickly and can further optimize molecule generation towards a specified objective. We compare the performance of this approach to a variety of orthogonal approaches and show that performance is comparable or better. For convenience, we provide an easy-to-use Python package to facilitate model sampling, which can be found on GitHub and the Python Package Index.
Scientific contribution: This novel method extends an autoregressive chemical language model to scaffold decoration and fragment linking scenarios. This doesn't require re-training, the use of a bespoke grammar, or curation of a custom dataset, as commonly required by other approaches.
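The prompting idea in this abstract can be illustrated with a toy autoregressive sampler. The sketch below is not the PromptSMILES package or its API. It assumes a character-level bigram model trained on a handful of SMILES strings, and the names `train_bigram` and `sample_with_prompt` are hypothetical. It shows only that a fixed prefix can seed generation without retraining; the strings it produces are not guaranteed to be valid SMILES.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical sentinel tokens, not SMILES characters


def train_bigram(smiles_list):
    """Count character-to-character transitions in the training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for smiles in smiles_list:
        tokens = [START] + list(smiles) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_with_prompt(counts, prompt="", max_len=40, seed=0):
    """Continue generation from a fixed prompt; the model is not retrained."""
    rng = random.Random(seed)
    tokens = [START] + list(prompt)
    while len(tokens) <= max_len:
        followers = counts.get(tokens[-1])
        if not followers:  # unseen token: nothing to sample from
            break
        choices = list(followers)
        weights = [followers[c] for c in choices]
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == END:
            break
        tokens.append(nxt)
    return "".join(tokens[1:])


# Toy training set restricted to single-character atoms so that
# character-level tokenization is adequate (an assumption of this sketch).
training = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCN", "c1ccccc1CCO"]
model = train_bigram(training)

scaffold_prompt = "c1ccccc1"  # the scaffold is supplied as a prefix
decorated = sample_with_prompt(model, prompt=scaffold_prompt)
```

Because the prompt is simply placed ahead of the sampled tokens, every output begins with the scaffold string. A real chemical language model would replace the bigram table, and fragment linking would need additional handling of attachment points that this sketch omits.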
Affiliation(s)
- Morgan Thomas
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain
- Mazen Ahmad
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
  - In Silico Discovery, Janssen Pharmaceutica N. V., Turnhoutseweg 30, 2340, Beerse, Belgium
- Gianni de Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aguiader 88, 08003, Barcelona, Spain
  - Acellera Labs, C Dr. Trueta 183, 08005, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain
3
Gangwal A, Ansari A, Ahmad I, Azad AK, Wan Sulaiman WMA. Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review. Comput Biol Med 2024; 179:108734. [PMID: 38964243 DOI: 10.1016/j.compbiomed.2024.108734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 06/01/2024] [Accepted: 06/08/2024] [Indexed: 07/06/2024]
Abstract
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces a few limitations that need to be addressed to harness its full potential in drug discovery. Some notable limitations are insufficient, unlabeled, and non-uniform data, the resemblance of some AI-generated molecules to existing molecules, the unavailability of adequate benchmarks, intellectual property rights (IPR)-related hurdles in data sharing, poor understanding of biology, a focus on proxy data and ligands, and the lack of holistic methods to represent input (molecular structures) without pre-processing of input molecules (feature engineering). The major component in AI infrastructure is input data, as most of the successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of the data used to train and test AI algorithms, besides a few other factors. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. Current literature suggests a few methods that, to a certain extent, handle low data effectively for better output from AI models in the context of drug discovery. These are transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables sharing of proprietary data on a common platform (without compromising data privacy) to train ML models. In this review, we compare and discuss these methods, their recent applications, and their limitations when modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods to handle inadequate data.
Affiliation(s)
- Amit Gangwal
  - Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Azim Ansari
  - Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India
- Iqrar Ahmad
  - Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India
- Abul Kalam Azad
  - Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia
4
Zhang Q, Zuo L, Ren Y, Wang S, Wang W, Ma L, Zhang J, Xia B. FMCA-DTI: a fragment-oriented method based on a multihead cross attention mechanism to improve drug-target interaction prediction. Bioinformatics 2024; 40:btae347. [PMID: 38810106 PMCID: PMC11256963 DOI: 10.1093/bioinformatics/btae347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/23/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION: Identifying drug-target interactions (DTI) is crucial in drug discovery. Fragments are less complex and can accurately characterize local features, which is important in DTI prediction. Recently, deep learning (DL)-based methods have predicted DTI more efficiently. However, two challenges remain in existing DL-based methods: (i) some methods directly encode drugs and proteins into integers, ignoring the substructure representation; (ii) some methods learn the features of the drugs and proteins separately instead of considering their interactions.
RESULTS: In this article, we propose a fragment-oriented method based on a multihead cross attention mechanism for predicting DTI, named FMCA-DTI. FMCA-DTI obtains multiple types of fragments of drugs and proteins by branch chain mining and category fragment mining. Importantly, FMCA-DTI utilizes the shared-weight-based multihead cross attention mechanism to learn the complex interaction features between different fragments. Experiments on three benchmark datasets show that FMCA-DTI achieves significantly improved performance compared with four state-of-the-art baselines.
AVAILABILITY AND IMPLEMENTATION: The code for this workflow is available at: https://github.com/jacky102022/FMCA-DTI.
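As a rough illustration of cross attention between two sets of fragments, the sketch below implements plain scaled dot-product attention in which one set supplies the queries and the other supplies the keys and values. It is not the FMCA-DTI implementation: it uses a single head, no learned projection matrices, and hypothetical two-dimensional fragment embeddings. Calling the same function in both directions only loosely mirrors the shared-weight idea.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output vector per query.
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical embeddings: three drug fragments and two protein fragments.
drug_fragments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
protein_fragments = [[1.0, 0.0], [0.0, 2.0]]

# Drug fragments attend over protein fragments, and vice versa.
drug_ctx, drug_w = cross_attention(drug_fragments, protein_fragments, protein_fragments)
prot_ctx, prot_w = cross_attention(protein_fragments, drug_fragments, drug_fragments)
```

Each row of `drug_w` is a probability distribution over protein fragments, so it can be read as how strongly a given drug fragment attends to each protein fragment in this toy setting.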
Affiliation(s)
- Qi Zhang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Le Zuo
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Ying Ren
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Siyuan Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Wenfa Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Lerong Ma
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Jing Zhang
  - Medical College of Yan'an University, Yan'an University, Yan'an 716000, China
  - Medical Research and Experimental Center, The Second Affiliated Hospital of Xi'an Medical University, Xi'an 710021, China
- Bisheng Xia
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
5
Chandraghatgi R, Ji HF, Rosen GL, Sokhansanj BA. Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening. J Chem Inf Model 2024; 64:3826-3840. [PMID: 38696451 PMCID: PMC11197033 DOI: 10.1021/acs.jcim.4c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/04/2024]
Abstract
Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. 
Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.
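The first optimization stage described in this abstract, assembling fragments with a genetic algorithm, can be sketched in miniature. The example below is not FDSL-DD: the fragment names, the per-fragment scores standing in for prescreening attributes, and the fitness function are all hypothetical, and the second stage of iterative refinement is omitted. It shows only selection, one-point crossover, mutation, and elitism over fixed-length fragment lists.

```python
import random

# Hypothetical fragment scores standing in for prescreening attributes.
FRAGMENT_SCORES = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 0.7, "E": 0.1}
FRAGMENTS = list(FRAGMENT_SCORES)


def fitness(candidate):
    """Toy objective: sum of fragment scores, penalizing repeated fragments."""
    repeats = len(candidate) - len(set(candidate))
    return sum(FRAGMENT_SCORES[f] for f in candidate) - 0.5 * repeats


def evolve(pop_size=20, length=3, generations=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(FRAGMENTS) for _ in range(length)] for _ in range(pop_size)]
    history = [max(fitness(c) for c in population)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]       # truncation selection
        children = [parents[0][:]]                  # elitism: keep the best unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = mum[:cut] + dad[cut:]           # one-point crossover
            if rng.random() < 0.2:                  # point mutation
                child[rng.randrange(length)] = rng.choice(FRAGMENTS)
            children.append(child)
        population = children
        history.append(max(fitness(c) for c in population))
    return max(population, key=fitness), history


best, history = evolve()
```

Because the best candidate is copied into each new generation, the best fitness recorded in `history` never decreases. Restricting the fragment pool to prescreened fragments is what shrinks the search space in this toy version.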
Affiliation(s)
- Rohan Chandraghatgi
  - Department of Biology, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Hai-Feng Ji
  - Department of Chemistry, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Gail L. Rosen
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Bahrad A. Sokhansanj
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
6
Wang S, Liang D, Wang J, Dong K, Zhang Y, Liang H, Xu X, Song T. FraHMT: A Fragment-Oriented Heterogeneous Graph Molecular Generation Model for Target Proteins. J Chem Inf Model 2024; 64:3718-3732. [PMID: 38644797 DOI: 10.1021/acs.jcim.4c00252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
The molecular generation task is a pivotal step in computational chemistry and drug discovery, aiming to computationally generate molecular structures with specific properties. In contrast to previous models that focused primarily on SMILES strings or molecular graphs, our model places special emphasis on the substructure information of molecules, enabling it to learn richer chemical rules and structural features from fragments and chemical reaction information. To accomplish this, we fragment the molecules to construct heterogeneous graph representations based on atom and fragment information. Our model then maps the heterogeneous graph data into a latent vector space using an encoder and employs an autoregressive generative model as a decoder for molecular generation. Additionally, we perform transfer learning on the model using a small set of ligand molecules known to be active against the target protein to generate molecules that bind better to the target protein. Experimental results demonstrate that our model is highly competitive with state-of-the-art models. It can generate valid and diverse molecules with favorable physicochemical properties and drug-likeness. Importantly, it produces novel molecules with high docking scores against the target proteins.
Affiliation(s)
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Dingming Liang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Jianmin Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
  - The Interdisciplinary Graduate Program in Integrative Biotechnology, Yonsei University, Incheon 21983, Republic of Korea
- Kaiyu Dong
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Yunjing Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Huicong Liang
  - Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, Qingdao 266580, China
- Ximing Xu
  - Marine Biomedical Institute of Qingdao, School of Medicine and Pharmacy, Ocean University of China, Qingdao 266580, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
  - Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Madrid 28031, Spain
7
Bhowmik D, Zhang P, Fox Z, Irle S, Gounley J. Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms. Patterns (N Y) 2024; 5:100947. [PMID: 38645768 PMCID: PMC11026973 DOI: 10.1016/j.patter.2024.100947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/14/2023] [Accepted: 02/08/2024] [Indexed: 04/23/2024]
Abstract
This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
Affiliation(s)
- Debsindhu Bhowmik
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Pei Zhang
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Zachary Fox
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Stephan Irle
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- John Gounley
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
8
Ghiandoni GM, Flanagan SR, Bodkin MJ, Nizi MG, Galera-Prat A, Brai A, Chen B, Wallace JEA, Hristozov D, Webster J, Manfroni G, Lehtiö L, Tabarrini O, Gillet VJ. Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors. Mol Inform 2024; 43:e202300183. [PMID: 38258328 DOI: 10.1002/minf.202300183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 01/24/2024]
Abstract
De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.
Affiliation(s)
- Gian Marco Ghiandoni
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
- Stuart R Flanagan
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- Michael J Bodkin
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- Maria Giulia Nizi
  - Department of Pharmaceutical Sciences, University of Perugia, 06123, Perugia, Italy
- Albert Galera-Prat
  - Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu, FI-90014, Finland
- Annalaura Brai
  - Department of Biotechnology, Chemistry and Pharmacy, University of Siena, I-53100, Siena, Italy
- Beining Chen
  - Department of Chemistry, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, UK
- James E A Wallace
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- Dimitar Hristozov
  - Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK
- James Webster
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
- Giuseppe Manfroni
  - Department of Pharmaceutical Sciences, University of Perugia, 06123, Perugia, Italy
- Lari Lehtiö
  - Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu, FI-90014, Finland
- Oriana Tabarrini
  - Department of Pharmaceutical Sciences, University of Perugia, 06123, Perugia, Italy
- Valerie J Gillet
  - Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK
9
Dodds M, Guo J, Löhr T, Tibo A, Engkvist O, Janet JP. Sample efficient reinforcement learning with active learning for molecular design. Chem Sci 2024; 15:4146-4160. [PMID: 38487235 PMCID: PMC10935729 DOI: 10.1039/d3sc04653b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 02/07/2024] [Indexed: 03/17/2024] Open
Abstract
Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5- to 66-fold increase in hits generated for a fixed oracle budget and a 4- to 64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity.
This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
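The sample-efficiency argument can be made concrete with a minimal active-learning loop in which a cheap surrogate decides which candidates are sent to an expensive oracle. The sketch below is not the RL-AL system from the paper: there is no RL agent, the oracle is a hypothetical toy function over integers, the surrogate is a 1-nearest-neighbour lookup, and the acquisition rule is a simple greedy ranking. It shows only how a fixed oracle budget is spent in rounds.

```python
def expensive_oracle(x):
    """Stand-in for a costly scoring function (hypothetical toy objective)."""
    return -((x - 70) ** 2)


def surrogate_predict(x, labelled):
    """1-nearest-neighbour surrogate: reuse the label of the closest scored point."""
    nearest = min(labelled, key=lambda known: abs(known - x))
    return labelled[nearest]


def active_learning(pool, initial, rounds=5, batch=3):
    labelled = {x: expensive_oracle(x) for x in initial}
    oracle_calls = len(labelled)
    for _ in range(rounds):
        unlabelled = [x for x in pool if x not in labelled]
        # Greedy acquisition: rank by surrogate prediction, breaking ties
        # in favour of candidates far from any already-scored point.
        ranked = sorted(
            unlabelled,
            key=lambda x: (
                surrogate_predict(x, labelled),
                min(abs(x - k) for k in labelled),
            ),
            reverse=True,
        )
        for x in ranked[:batch]:
            labelled[x] = expensive_oracle(x)  # spend oracle budget only here
            oracle_calls += 1
    return labelled, oracle_calls


pool = list(range(100))
labelled, calls = active_learning(pool, initial=[0, 50, 99])
best_x = max(labelled, key=labelled.get)
```

The oracle is called only for the initial points and the selected batch in each round, so the total cost is fixed by `rounds` and `batch` rather than by the size of the pool.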
Affiliation(s)
- Michael Dodds
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Jeff Guo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Thomas Löhr
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Alessandro Tibo
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Jon Paul Janet
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
10
Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024; 27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]
Abstract
Chemical libraries and compound data sets are among the main inputs to start the drug discovery process at universities, research institutes, and the pharmaceutical industry. The approach used in the design of compound libraries, the chemical information they contain, and the representation of structures play a fundamental role in studies in chemoinformatics, food informatics, in silico pharmacokinetics, computational toxicology, bioinformatics, and molecular modeling that generate computational hits for further optimization as drug candidates. Growth in drug discovery and development processes in chemical, biotechnological, and pharmaceutical companies began a few years ago with the integration of computational tools and artificial intelligence methodologies. This integration is anticipated to increase the number of drugs approved by regulatory agencies in the near future.
Affiliation(s)
- Dionisio A Olmedo
  - Centro de Investigaciones Farmacognósticas de la Flora Panameña (CIFLORPAN), Facultad de Farmacia, Universidad de Panamá, Ciudad de Panamá, Apartado, 0824-00178, Panamá
  - Sistema Nacional de Investigación (SNI), Secretaria Nacional de Ciencia, Tecnología e Innovación (SENACYT), Ciudad del Saber, Clayton, Panamá
- Armando A Durant-Archibold
  - Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Apartado, 0843-01103, Panamá
  - Departamento de Bioquímica, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- José Luis López-Pérez
  - CESIFAR, Departamento de Farmacología, Facultad de Medicina, Universidad de Panamá, Ciudad de Panamá, Panamá
  - Departamento de Ciencias Farmacéuticas, Facultad de Farmacia, Universidad de Salamanca, Avda. Campo Charro s/n, 37071 Salamanca, España
- José Luis Medina-Franco
  - DIFACQUIM Grupo de Investigación, Departamento de Farmacia, Escuela de Química, Universidad Nacional Autónoma de México, Ciudad de México, Apartado, 04510, México
11
Qin R, Zhang H, Huang W, Shao Z, Lei J. Deep learning-based design and screening of benzimidazole-pyrazine derivatives as adenosine A2B receptor antagonists. J Biomol Struct Dyn 2023:1-17. [PMID: 38133953 DOI: 10.1080/07391102.2023.2295974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 12/11/2023] [Indexed: 12/24/2023]
Abstract
The adenosine A2B receptor (A2BAR) is considered a novel potential target for the immunotherapy of cancer, and A2BAR antagonists have an inhibitory effect on tumor growth, proliferation, and metastasis. In our previous studies, we identified a class of benzimidazole-pyrazine scaffolds whose derivatives exhibited the antagonistic effect but lacked subtype selectivity towards A2BAR. In this work, we developed a scaffold-based protocol that incorporates a deep generative model and multilayer virtual screening to design benzimidazole-pyrazine derivatives as potential selective A2BAR antagonists. By utilizing a generative model with reported A2BAR antagonists as the training set, we built a scaffold-focused library of benzimidazole-pyrazine derivatives and applied a virtual screening protocol to discover potential A2BAR antagonists. Finally, five molecules with different Bemis-Murcko scaffolds were identified and exhibited higher binding free energies than the reference molecule 12o. Further computational analysis revealed that the 3-benzyl derivative ABA-1266 presented high selectivity toward A2BAR and showed favorable druggability, supporting the future development of potent, selective A2BAR antagonists.
Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rui Qin
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Hao Zhang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
| | - Weifeng Huang
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
| | - Zhenglin Shao
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
| | - Jinping Lei
- School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, China
| |
|
12
|
Xu M, Chen H. Tree-Invent: A Novel Multipurpose Molecular Generative Model Constrained with a Topological Tree. J Chem Inf Model 2023; 63:7067-7082. [PMID: 37962855 DOI: 10.1021/acs.jcim.3c01626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
De novo molecular design plays an important role in drug discovery. Here, a novel generative model, Tree-Invent, was proposed to integrate topological constraints in the generation of a molecular graph. In this model, a molecular graph is represented as a topological tree in which a ring system, a nonring atom, and a chemical bond are regarded as the ring node, single node, and edge, respectively. The molecule generation is driven by three independent submodels for carrying out operations of node addition, ring generation, and node connection. One unique feature of the generative model is that the topological tree structure can be specified as a constraint for structure generation, which provides more precise control of structure generation. Combined with reinforcement learning, the Tree-Invent model could efficiently explore targeted chemical space. Moreover, the Tree-Invent model is flexible enough to be used in versatile molecule design settings such as scaffold decoration, scaffold hopping, and linker generation.
Affiliation(s)
- Mingyuan Xu
- Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou, Guangdong 510005, China
| | - Hongming Chen
- Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou, Guangdong 510005, China
| |
|
13
|
Diao Y, Liu D, Ge H, Zhang R, Jiang K, Bao R, Zhu X, Bi H, Liao W, Chen Z, Zhang K, Wang R, Zhu L, Zhao Z, Hu Q, Li H. Macrocyclization of linear molecules by deep learning to facilitate macrocyclic drug candidates discovery. Nat Commun 2023; 14:4552. [PMID: 37507402 PMCID: PMC10382584 DOI: 10.1038/s41467-023-40219-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 07/18/2023] [Indexed: 07/30/2023] Open
Abstract
Interest in macrocycles as potential therapeutic agents has increased rapidly. Macrocyclization of bioactive acyclic molecules provides a potential avenue to yield novel chemical scaffolds, which can contribute to the improvement of the biological activity and physicochemical properties of these molecules. In this study, we propose a computational macrocyclization method based on the Transformer architecture (which we name Macformer). Leveraging deep learning, Macformer explores the vast chemical space of macrocyclic analogues of a given acyclic molecule by adding diverse linkers compatible with the acyclic molecule. Macformer can efficiently learn the implicit relationships between acyclic and macrocyclic structures represented as SMILES strings and generate plenty of macrocycles with chemical diversity and structural novelty. In data augmentation scenarios using both internal ChEMBL and external ZINC test datasets, Macformer displays excellent performance and generalisability. We showcase the utility of Macformer, combined with molecular docking simulations and wet-lab experimental validation, by applying it to the prospective design of macrocyclic JAK2 inhibitors.
Affiliation(s)
- Yanyan Diao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Dandan Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Huan Ge
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Rongrong Zhang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Kexin Jiang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Runhui Bao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Xiaoqian Zhu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Hongjie Bi
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Wenjie Liao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Ziqi Chen
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Kai Zhang
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai, 200062, China
| | - Rui Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Lili Zhu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Zhenjiang Zhao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China
| | - Qiaoyu Hu
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai, 200062, China
| | - Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai, 200237, China.
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai, 200062, China.
- Lingang Laboratory, Shanghai, 200031, China.
| |
|
14
|
Wills S, Sanchez-Garcia R, Dudgeon T, Roughley SD, Merritt A, Hubbard RE, Davidson J, von Delft F, Deane CM. Fragment Merging Using a Graph Database Samples Different Catalogue Space than Similarity Search. J Chem Inf Model 2023. [PMID: 37229647 DOI: 10.1021/acs.jcim.3c00276] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/27/2023]
Abstract
Fragment merging is a promising approach to progressing fragments directly to on-scale potency: each designed compound incorporates the structural motifs of overlapping fragments in a way that ensures compounds recapitulate multiple high-quality interactions. Searching commercial catalogues provides one useful way to quickly and cheaply identify such merges and circumvents the challenge of synthetic accessibility, provided they can be readily identified. Here, we demonstrate that the Fragment Network, a graph database that provides a novel way to explore the chemical space surrounding fragment hits, is well-suited to this challenge. We use an iteration of the database containing >120 million catalogue compounds to find fragment merges for four crystallographic screening campaigns and contrast the results with a traditional fingerprint-based similarity search. The two approaches identify complementary sets of merges that recapitulate the observed fragment-protein interactions but lie in different regions of chemical space. We further show our methodology is an effective route to achieving on-scale potency by retrospective analyses for two different targets; in analyses of public COVID Moonshot and Mycobacterium tuberculosis EthR inhibitors, potential inhibitors with micromolar IC50 values were identified. This work demonstrates the use of the Fragment Network to increase the yield of fragment merges beyond that of a classical catalogue search.
Affiliation(s)
- Stephanie Wills
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - Ruben Sanchez-Garcia
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
| | - Tim Dudgeon
- Informatics Matters, Ltd., Perch Coworking, Franklins House, Bicester OX26 6JU, United Kingdom
| | - Stephen D Roughley
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - Andy Merritt
- LifeArc, Lynton House, 7-12 Tavistock Square, London WC1H 9LT, United Kingdom
| | - Roderick E Hubbard
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - James Davidson
- Vernalis (R&D) Limited, Granta Park, Great Abington, Cambridge CB21 6GB, United Kingdom
| | - Frank von Delft
- Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Diamond Light Source, Didcot OX11 0DE, United Kingdom
- Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, United Kingdom
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Charlotte M Deane
- Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| |
|
15
|
Ji C, Zheng Y, Wang R, Cai Y, Wu H. Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization. IEEE Trans Neural Netw Learn Syst 2023; 34:2323-2337. [PMID: 34520363 DOI: 10.1109/tnnls.2021.3106392] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Molecular optimization, which transforms a given input molecule X into another Y with desired properties, is essential in molecular drug discovery. The traditional approaches either suffer from sample-inefficient learning or ignore information that can be captured with the supervised learning of optimized molecule pairs. In this study, we present a novel molecular optimization paradigm, Graph Polish. In this paradigm, with the guidance of the source and target molecule pairs of the desired properties, a heuristic optimization solution can be derived: given an input molecule, we first predict which atom can be viewed as the optimization center, and then the nearby regions are optimized around this center. We then propose an effective and efficient learning framework, Teacher and Student polish, to capture the dependencies in the optimization steps. A teacher component automatically identifies and annotates the optimization centers and the preservation, removal, and addition of some parts of the molecules; a student component learns this knowledge and applies it to a new molecule. The proposed paradigm can offer an intuitive interpretation of the molecular optimization result. Experiments with multiple optimization tasks are conducted on several benchmark datasets. The proposed approach achieves a significant advantage over the six state-of-the-art baseline methods. Also, extensive studies are conducted to validate the effectiveness, explainability, and time savings of the novel optimization paradigm.
|
16
|
Koutroumpa NM, Papavasileiou KD, Papadiamantis AG, Melagraki G, Afantitis A. A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation. Int J Mol Sci 2023; 24:6573. [PMID: 37047543 PMCID: PMC10095548 DOI: 10.3390/ijms24076573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 03/24/2023] [Accepted: 03/28/2023] [Indexed: 04/05/2023] Open
Abstract
The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has made a positive impact on the drug development pipeline. Numerous challenges have been addressed with the growing exploitation of drug-related data and the advancement of deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development since computational results may not be confirmed experimentally. This systematic review aims to summarize the different deep learning architectures that are used in the drug discovery process and validated with further in vivo experiments. For each presented study, the proposed molecule or peptide that has been generated or identified by the deep learning model has been biologically evaluated in animal models. These state-of-the-art studies highlight that even if artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce the required costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Out of all the reviewed scientific articles, seven algorithms were identified: recurrent neural networks, specifically long short-term memory networks (LSTM-RNNs); Autoencoders (AEs) and their Wasserstein Autoencoder (WAE) and Variational Autoencoder (VAE) variants; Convolutional Neural Networks (CNNs); Direct Message Passing Neural Networks (D-MPNNs); and Multitask Deep Neural Networks (MTDNNs). LSTM-RNNs were the most used architectures, with molecules or peptide sequences as inputs.
Affiliation(s)
- Nikoletta-Maria Koutroumpa
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
| | - Konstantinos D. Papavasileiou
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
| | - Anastasios G. Papadiamantis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
| | - Georgia Melagraki
- Division of Physical Sciences & Applications, Hellenic Military Academy, 166 73 Vari, Greece
| | - Antreas Afantitis
- Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
- Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
| |
|
17
|
Yu Y, Huang J, He H, Han J, Ye G, Xu T, Sun X, Chen X, Ren X, Li C, Li H, Huang W, Liu Y, Wang X, Gao Y, Cheng N, Guo N, Chen X, Feng J, Hua Y, Liu C, Zhu G, Xie Z, Yao L, Zhong W, Chen X, Liu W, Li H. Accelerated Discovery of Macrocyclic CDK2 Inhibitor QR-6401 by Generative Models and Structure-Based Drug Design. ACS Med Chem Lett 2023; 14:297-304. [PMID: 36923916 PMCID: PMC10009793 DOI: 10.1021/acsmedchemlett.2c00515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
Selective CDK2 inhibitors have the potential to provide effective therapeutics for CDK2-dependent cancers and for combating drug resistance due to high cyclin E1 (CCNE1) expression intrinsically or CCNE1 amplification induced by treatment of CDK4/6 inhibitors. Generative models that take advantage of deep learning are being increasingly integrated into early drug discovery for hit identification and lead optimization. Here we report the discovery of a highly potent and selective macrocyclic CDK2 inhibitor QR-6401 (23) accelerated by the application of generative models and structure-based drug design (SBDD). QR-6401 (23) demonstrated robust antitumor efficacy in an OVCAR3 ovarian cancer xenograft model via oral administration.
Affiliation(s)
- Yang Yu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Hu He
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Jing Han
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Geyan Ye
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Tingyang Xu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | | | - Xiumei Chen
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xiaoming Ren
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Chunlai Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Huijuan Li
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wei Huang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yangyang Liu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinjuan Wang
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Yongzhi Gao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Nianhe Cheng
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Na Guo
- BioDuro-Sundia, Shanghai, 200131, China
| | - Xibo Chen
- BioDuro-Sundia, Shanghai, 200131, China
| | | | - Yuxia Hua
- BioDuro-Sundia, Beijing, 102200, China
| | - Chong Liu
- BioDuro-Sundia, Beijing, 102200, China
| | - Guoyun Zhu
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Zhi Xie
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Lili Yao
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Wenge Zhong
- Regor Therapeutics Group, Shanghai, 201210, China
| | - Xinde Chen
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Wei Liu
- Tencent AI Lab, Tencent, Shenzhen 518057, China
| | - Hailong Li
- Regor Therapeutics Group, Shanghai, 201210, China
| |
|
18
|
Seo S, Lim J, Kim WY. Molecular Generative Model via Retrosynthetically Prepared Chemical Building Block Assembly. Adv Sci (Weinh) 2023; 10:e2206674. [PMID: 36596675 PMCID: PMC10015872 DOI: 10.1002/advs.202206674] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/14/2022] [Indexed: 06/17/2023]
Abstract
Deep generative models are attracting attention as a smart molecular design strategy. However, previous models often render molecules with low synthesizability, hindering their real-world applications. Here, a novel graph-based conditional generative model is proposed that builds molecules in an auto-regressive fashion by assembling retrosynthetically prepared chemical building blocks until the target properties are achieved. This strategy improves the synthesizability and property control of the resulting molecules and also helps learn how to select appropriate building blocks and bind them together to achieve target properties. By applying a negative sampling method to the selection process of building blocks, this model overcame a critical limitation of previous fragment-based models, which can only use molecules from the training set during generation. As a result, the model works equally well with unseen building blocks without sacrificing computational efficiency. It is demonstrated that the model can generate potential inhibitors with high docking scores against the 3CL protease of SARS-CoV-2.
Affiliation(s)
- Seonghwan Seo
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Jaechang Lim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
| | - Woo Youn Kim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| |
|
19
|
Liu X, Ye K, van Vlijmen HWT, IJzerman AP, van Westen GJP. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J Cheminform 2023; 15:24. [PMID: 36803659 PMCID: PMC9940339 DOI: 10.1186/s13321-023-00694-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/09/2022] [Accepted: 02/06/2023] [Indexed: 02/22/2023] Open
Abstract
Rational drug design often starts from specific scaffolds to which side chains/substituents are added or modified due to the large drug-like chemical space available to search for novel drug-like molecules. With the rapid growth of deep learning in drug discovery, a variety of effective approaches have been developed for de novo drug design. In previous work we proposed a method named DrugEx, which can be applied in polypharmacology based on multi-objective deep reinforcement learning. However, the previous version is trained under fixed objectives and does not allow users to input any prior information (e.g. a desired scaffold). In order to improve the general applicability, we updated DrugEx to design drug molecules based on scaffolds which consist of multiple fragments provided by users. Here, a Transformer model was employed to generate molecular structures. The Transformer is a multi-head self-attention deep learning model containing an encoder to receive scaffolds as input and a decoder to generate molecules as output. In order to deal with the graph representation of molecules, a novel positional encoding for each atom and bond based on an adjacency matrix was proposed, extending the architecture of the Transformer. The graph Transformer model contains growing and connecting procedures for molecule generation starting from a given scaffold based on fragments. Moreover, the generator was trained under a reinforcement learning framework to increase the number of desired ligands. As a proof of concept, the method was applied to design ligands for the adenosine A2A receptor (A2AAR) and compared with SMILES-based methods. The results show that 100% of the generated molecules are valid and most of them have a high predicted affinity value towards A2AAR with given scaffolds.
Affiliation(s)
- Xuhan Liu
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - Kai Ye
- School of Electrics and Information Engineering, Xi’an Jiaotong University, 28 XianningW Rd, Xi’an, China
| | - Herman W. T. van Vlijmen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
- Janssen Pharmaceutica NV, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | - Adriaan P. IJzerman
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| | - Gerard J. P. van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
| |
|
20
|
Song S, Tang H, Ran T, Fang F, Tong L, Chen H, Xie H, Lu X. Application of deep generative model for design of Pyrrolo[2,3-d] pyrimidine derivatives as new selective TANK binding kinase 1 (TBK1) inhibitors. Eur J Med Chem 2023; 247:115034. [PMID: 36603506 DOI: 10.1016/j.ejmech.2022.115034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 12/08/2022] [Accepted: 12/17/2022] [Indexed: 12/24/2022]
Abstract
The deep conditional transformer neural network SyntaLinker was applied to identify compounds with a pyrrolo[2,3-d]pyrimidine scaffold as potent, selective TBK1 inhibitors. A further medicinal chemistry optimization campaign led to the discovery of the most potent compound, 7l, which exhibited strong enzymatic inhibitory activity against TBK1 with an IC50 value of 22.4 nM. Compound 7l showed superior inhibitory activity to MRT67307 in a reporter gene assay in human monocytic THP1-Blue cells. Furthermore, 7l significantly inhibited expression of the TBK1 downstream target genes cxcl10 and ifnβ in THP1 and RAW264.7 cells induced by poly (I:C) and lipopolysaccharide, respectively. This study suggested that the combination of the deep conditional transformer neural network SyntaLinker and transfer learning could be a powerful tool for scaffold hopping in drug discovery.
Affiliation(s)
- Shukai Song
- School of Pharmacy, Jinan University, #855 Xingye Avenue, Guangzhou, 510632, China
| | - Haotian Tang
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, #555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
| | - Ting Ran
- Division of Drug and Vaccine Research, Guangzhou Laboratory, Guangzhou, 510530, China
| | - Feng Fang
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, #555 Zuchongzhi Road, Shanghai, 201203, China
| | - Linjiang Tong
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, #555 Zuchongzhi Road, Shanghai, 201203, China
| | - Hongming Chen
- Division of Drug and Vaccine Research, Guangzhou Laboratory, Guangzhou, 510530, China.
| | - Hua Xie
- Division of Antitumor Pharmacology, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, #555 Zuchongzhi Road, Shanghai, 201203, China; Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Cuiheng New District, Zhongshan City, China.
| | - Xiaoyun Lu
- School of Pharmacy, Jinan University, #855 Xingye Avenue, Guangzhou, 510632, China.
| |
|
21
|
McNair D. Artificial Intelligence and Machine Learning for Lead-to-Candidate Decision-Making and Beyond. Annu Rev Pharmacol Toxicol 2023; 63:77-97. [PMID: 35679624 DOI: 10.1146/annurev-pharmtox-051921-023255] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
The use of artificial intelligence (AI) and machine learning (ML) in pharmaceutical research and development has to date focused on research: target identification; docking-, fragment-, and motif-based generation of compound libraries; modeling of synthesis feasibility; rank-ordering likely hits according to structural and chemometric similarity to compounds having known activity and affinity to the target(s); optimizing a smaller library for synthesis and high-throughput screening; and combining evidence from screening to support hit-to-lead decisions. Applying AI/ML methods to lead optimization and lead-to-candidate (L2C) decision-making has shown slower progress, especially regarding predicting absorption, distribution, metabolism, excretion, and toxicology properties. The present review surveys reasons why this is so, reports progress that has occurred in recent years, and summarizes some of the issues that remain. Effective AI/ML tools to derisk L2C and later phases of development are important to accelerate the pharmaceutical development process, ameliorate escalating development costs, and achieve greater success rates.
Affiliation(s)
- Douglas McNair
- Global Health, Integrated Development, Bill & Melinda Gates Foundation, Seattle, Washington, USA
| |
|
22
|
Liao Z, Xie L, Mamitsuka H, Zhu S. Sc2Mol: a scaffold-based two-step molecule generator with variational autoencoder and transformer. Bioinformatics 2023; 39:btac814. [PMID: 36576008 PMCID: PMC9835482 DOI: 10.1093/bioinformatics/btac814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION Finding molecules with desired pharmaceutical properties is crucial in drug discovery. Generative models can be an efficient tool to find desired molecules through the distribution learned by the model to approximate given training data. Existing generative models (i) do not consider backbone structures (scaffolds), resulting in inefficiency or (ii) need prior patterns for scaffolds, causing bias. Scaffolds are reasonable to use, and it is imperative to design a generative model without any prior scaffold patterns. RESULTS We propose a generative model-based molecule generator, Sc2Mol, without any prior scaffold patterns. Sc2Mol uses SMILES strings for molecules. It consists of two steps: scaffold generation and scaffold decoration, which are carried out by a variational autoencoder and a transformer, respectively. The two steps are powerful for implementing random molecule generation and scaffold optimization. Our empirical evaluation using drug-like molecule datasets confirmed the success of our model in distribution learning and molecule optimization. Also, our model could automatically learn the rules to transform coarse scaffolds into sophisticated drug candidates. These rules were consistent with those for current lead optimization. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/zhiruiliao/Sc2Mol. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhirui Liao
- School of Computer Science, Fudan University, Shanghai 200433, China
| | - Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, USA
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan
- Department of Computer Science, Aalto University, Espoo 00076, Finland
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Shanghai Qi Zhi Institute, Shanghai 200030, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Ministry of Education, Shanghai 200433, China
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai 200433, China
- Zhangjiang Fudan International Innovation Center, Shanghai 200433, China
- Institute of Artificial Intelligence Biomedicine, Nanjing University, Nanjing, Jiangsu 210031, China
| |
|
23
|
Umedera K, Yoshimori A, Chen H, Kouji H, Nakamura H, Bajorath J. DeepCubist: Molecular Generator for Designing Peptidomimetics based on Complex three-dimensional scaffolds. J Comput Aided Mol Des 2023; 37:107-115. [PMID: 36462089 PMCID: PMC9876871 DOI: 10.1007/s10822-022-00493-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/07/2022] [Accepted: 11/23/2022] [Indexed: 12/04/2022]
Abstract
Mimicking bioactive conformations of peptide segments involved in the formation of protein-protein interfaces with small molecules is thought to represent a promising strategy for the design of protein-protein interaction (PPI) inhibitors. For compound design, the use of three-dimensional (3D) scaffolds rich in sp3-centers makes it possible to precisely mimic bioactive peptide conformations. Herein, we introduce DeepCubist, a molecular generator for designing peptidomimetics based on 3D scaffolds. Firstly, enumerated 3D scaffolds are superposed on a target peptide conformation to identify a preferred template structure for designing peptidomimetics. Secondly, heteroatoms and unsaturated bonds are introduced into the template via a deep generative model to produce candidate compounds. DeepCubist was applied to design peptidomimetics of exemplary peptide turn, helix, and loop structures in pharmaceutical targets engaging in PPIs.
Affiliation(s)
- Kohei Umedera
- School of Life Science and Technology, Tokyo Institute of Technology, 4259, Nagatsuta-cho, Midori-ku, 226-8503 Yokohama, Japan; Department of Life Science Informatics, LIMES Program Unit Chemical Biology and Medicinal Chemistry, B-IT, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany
| | - Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc, 26-1, Muraoka-Higashi 2-chome, 251-8555 Fujisawa, Kanagawa, Japan
| | - Hengwei Chen
- Department of Life Science Informatics, LIMES Program Unit Chemical Biology and Medicinal Chemistry, B-IT, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany
| | - Hiroyuki Kouji
- Oita University Institute of Advanced Medicine, Inc, 17-20, Higashi Kasuga-machi, 870-0037 Oita City, Oita, Japan
| | - Hiroyuki Nakamura
- School of Life Science and Technology, Tokyo Institute of Technology, 4259, Nagatsuta-cho, Midori-ku, 226-8503 Yokohama, Japan; Laboratory for Chemistry and Life Science, Institute of Innovative Research, Tokyo Institute of Technology, 4259, Nagatsuta-cho, Midori-ku, 226-8503 Yokohama, Japan
| | - Jürgen Bajorath
- Department of Life Science Informatics, LIMES Program Unit Chemical Biology and Medicinal Chemistry, B-IT, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, D-53115 Bonn, Germany
|
24
|
Dai X, Xu Y, Qiu H, Qian X, Lin M, Luo L, Zhao Y, Huang D, Zhang Y, Chen Y, Liu H, Jiang Y. KID: A Kinase-Focused Interaction Database and Its Application in the Construction of Kinase-Focused Molecule Databases. J Chem Inf Model 2022; 62:6022-6034. [PMID: 36447388 DOI: 10.1021/acs.jcim.2c00908] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
Protein kinases are important drug targets for the treatment of several diseases. The interaction between kinases and ligands is vital in the process of small-molecule kinase inhibitor (SMKI) design. In this study, we propose a method to extract fragments and amino acid residues from crystal structures for kinase-ligand interactions. In addition, core fragments that interact with the important hinge region of kinases were extracted along with their decorations. Based on the superimposed structural data of kinases from the kinase-ligand interaction fingerprint and structure database, we obtained two libraries, namely, a hinge-unfocused fragment-amino acid pair library (FAP Lib) that contains 6672 pairs of fragments and corresponding amino acids, and a hinge-focused hinge binder library (HB Lib) of 3560 pairs of hinge-binding scaffolds with their corresponding decorations. These two libraries constitute a kinase-focused interaction database (KID). In-depth analysis was conducted on KID to explore important characteristics of fragments in the design of SMKIs. With KID, we built two kinase-focused molecule databases, one called Recomb_DB, which contains 172,346 molecules generated through fragment recombination based on the FAP Lib, and another called RsdHB_DB, which contains 93,030 molecules generated based on our HB Lib using molecular generation methods. Compared with five commercial and non-commercial databases, these two databases both ranked in the top 3 for scaffold diversity and the top 4 for molecular fingerprint diversity, and are more focused on the chemical space of kinase inhibitors. Hence, KID presents a useful addition to existing databases for the exploration of novel SMKIs.
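The abstract ranks databases by fingerprint diversity without stating the metric. One common choice, assumed here purely for illustration and not confirmed by the source, is the mean pairwise Tanimoto distance between molecular fingerprints. The sketch below represents each fingerprint as a set of on-bit indices; the toy fingerprints are hypothetical and are not taken from KID.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def mean_pairwise_distance(fingerprints) -> float:
    """Average (1 - Tanimoto) over all unordered pairs; higher means more diverse."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # a library with fewer than two molecules has no pairwise diversity
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy library: three fingerprints given as on-bit index sets.
library = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
diversity = mean_pairwise_distance(library)
```

In practice the on-bit sets would come from a cheminformatics toolkit; the pairwise loop is quadratic, so libraries of the size reported above would usually be subsampled before computing it.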
Affiliation(s)
- Xiaowen Dai
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yuan Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Haodi Qiu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Xu Qian
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Mingde Lin
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Lin Luo
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yang Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Dingfang Huang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
| | - Yulei Jiang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
|
25
|
Xiong F, Xu H, Yu M, Chen X, Zhong Z, Guo Y, Chen M, Ou H, Wu J, Xie A, Xiong J, Xu L, Zhang L, Zhong Q, Huang L, Li Z, Zhang T, Jin F, He X. 3CLpro inhibitors: DEL-based molecular generation. Front Pharmacol 2022; 13:1085665. [PMID: 36569316 PMCID: PMC9768338 DOI: 10.3389/fphar.2022.1085665] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/31/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open
Abstract
Molecular generation (MG) via machine learning (ML) has accelerated drug structural optimization, especially for targets with a large amount of reported bioactivity data. However, molecular generation for structural optimization is often ineffective for new targets. A DNA-encoded library (DEL) can generate systematic, target-specific activity data, including for novel targets with little or no known activity data. Therefore, this study aims to overcome the limitation of molecular generation in structural optimization for a new target. Firstly, we generated molecules using structure-affinity data (2.96 million samples) for 3C-like protease (3CLpro) from our in-house DEL platform, avoiding reliance on public databases (e.g., ChEMBL and ZINC). Subsequently, to analyze the effect of transfer learning on the positive rate of the molecule generation model, molecular docking and an affinity model based on DEL data were applied. In addition, the generated molecules were filtered in multiple stages, including physicochemical properties, drug-like properties, pharmacophore evaluation, and molecular docking, to select molecules for further study, and the selected molecules were verified by molecular dynamics simulation.
Affiliation(s)
- Feng Xiong
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China (corresponding author)
| | - Honggui Xu
- Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
| | - Mingao Yu
- Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
| | - Xingyu Chen
- Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
| | - Zhenmin Zhong
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Yuhan Guo
- Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China
| | - Meihong Chen
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Huanfang Ou
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Jiaqi Wu
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Anhua Xie
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Jiaqi Xiong
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Linlin Xu
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Lanmei Zhang
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Qijian Zhong
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Liye Huang
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | - Zhenwei Li
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China
| | | | - Feng Jin
- Shenzhen NewDEL Biotech Co., Ltd., Shenzhen, China (corresponding author)
| | - Xun He
- Shenzhen Innovation Center for Small Molecule Drug Discovery Co., Ltd., Shenzhen, China (corresponding author)
|
26
|
Xu T, Wang M, Liu X, Feng D, Zhu Y, Fan Z, Rao S, Lu J. A Scaffold-based Deep Generative Model Considering Molecular Stereochemical Information. Mol Inform 2022; 41:e2200088. [PMID: 36031563 DOI: 10.1002/minf.202200088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
Designing molecules with specific scaffolds can facilitate the discovery and optimization of lead compounds. Some scaffold-based molecular generation models have been developed using deep-learning methods based on specific scaffolds, although incorporating scaffold generalization is expected to achieve scaffold hopping. Moreover, most of the existing models focus on the 2D shape of the scaffold and overlook the stereochemical properties of the compound, especially for natural products. In this study, we optimized the scaffold-based molecular generation model designed by Lim et al. (Chemical Science 2020, 11, 1153-1164). Real-time ultrafast shape recognition with pharmacophore constraints (USRCAT) was introduced into the model to search for molecules similar to the 3D conformation and pharmacophore of the input scaffold sourced from the training set; the searched molecules were then used as new scaffolds to execute scaffold hopping. The optimized model could generate new molecules with the same chirality as the input scaffold. Furthermore, the probability distribution of the molecular structure and various physicochemical properties were analyzed to evaluate the model's generation capability. We thus believe that the optimized model can provide a basis for medicinal chemists to explore a wider chemical space toward optimization of the lead compounds and to screen the virtual compound library.
Affiliation(s)
- Tianxu Xu
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Minjun Wang
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Xiaoqian Liu
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Dawei Feng
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Yanjuan Zhu
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Zhe Fan
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Shurong Rao
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
| | - Jing Lu
- Key Laboratory of Molecular Pharmacology and Drug Evaluation, Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, School of Pharmacy, Yantai University, No. 30, Qingquan Road, Laishan District, Yantai, 264005, China
|
27
|
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
To formalize it better, the notorious inverse-QSAR problem (finding structures with given QSAR-predicted properties) is considered in this paper as a two-step process comprising (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display the desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, showing that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Igor Baskin
- Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
|
28
|
Zhang Y, Jiang Q, Li L, Li Z, Xu Z, Chen Y, Sun Y, Liu C, Mao Z, Chen F, Li H, Cao Y, Pian C. Predicting the structure of unexplored novel fentanyl analogues by deep learning model. Brief Bioinform 2022; 23:6741166. [PMID: 36184256 DOI: 10.1093/bib/bbac418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 08/21/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Fentanyl and its analogues are psychoactive substances, and concern about fentanyl abuse has existed for decades. Because the structure of fentanyl is easy to modify, criminals may synthesize new fentanyl analogues to evade supervision. Drug supervision is based on matching structures against a database, and too few kinds of fentanyl analogues are included in the database, so it is necessary to find more potential fentanyl analogues and expand the sample space of fentanyl analogues. In this study, we introduced two deep generative models (SeqGAN and MolGPT) to generate potential fentanyl analogues, and a total of 11,041 valid molecules were obtained. The results showed that not only can we generate molecules with a property distribution similar to that of the original data, but the generated molecules also contain potential fentanyl analogues that are not closely similar to any of the original data. Ten molecules based on the rules of fentanyl analogues were selected for NMR, MS and IR validation. The results indicated that these molecules are all unreported fentanyl analogues. Furthermore, this study is the first to apply deep learning to the generation of fentanyl analogues; it greatly expands the explorable space of fentanyl analogues and supports the supervision of fentanyl.
Affiliation(s)
| | | | | | - Zutan Li
- Bioinformatics Doctoral Student at Nanjing Agricultural University, China
| | - Zhihui Xu
- Researcher in Simcere Diagnostics Co., Ltd, China
| | - Yuanyuan Chen
- College of Sciences at Nanjing Agricultural University, China
| | - Yang Sun
- Nanjing Medical University, China
| | - Cheng Liu
- Department of Forensic Medicine, College of Basic Medical Science at Nanjing Medical University, China
| | - Zhengsheng Mao
- Forensic Science Department at Nanjing Medical University, China
| | | | - Hualan Li
- Bioinformatics Master Student at Nanjing Agricultural University, China
| | - Yue Cao
- Department of Forensic Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Cong Pian
- College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
|
29
|
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
| | - Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
| | | | - Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
| | - Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
|
30
|
Zhang Y, Luo M, Wu P, Wu S, Lee TY, Bai C. Application of Computational Biology and Artificial Intelligence in Drug Design. Int J Mol Sci 2022; 23:13568. [PMID: 36362355 PMCID: PMC9658956 DOI: 10.3390/ijms232113568] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/08/2022] [Revised: 10/29/2022] [Accepted: 11/03/2022] [Indexed: 08/24/2023] Open
Abstract
Traditional drug design requires a great amount of research time and developmental expense. Booming computational approaches, including computational biology, computer-aided drug design, and artificial intelligence, have the potential to improve the efficiency of drug discovery by minimizing the time and financial cost. In recent years, computational approaches have been widely used to improve the efficacy and effectiveness of the drug discovery pipeline, leading to the approval of many new drugs for marketing. The present review emphasizes the applications of these indispensable computational approaches in aiding target identification, lead discovery, and lead optimization. Some challenges of using these approaches for drug design are also discussed. Moreover, we propose a methodology for integrating various computational techniques into new drug discovery and design.
Affiliation(s)
- Yue Zhang
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
| | - Mengqi Luo
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
| | - Peng Wu
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518055, China
| | - Song Wu
- South China Hospital, Health Science Center, Shenzhen University, Shenzhen 518116, China
| | - Tzong-Yi Lee
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
| | - Chen Bai
- School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, Shenzhen 518172, China
|
31
|
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating experimentally. OBJECTIVES The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce cost and time but also decrease the attrition rate of drug candidates that fail to reach the desired endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION The consistently improving AI algorithms and the abundance of curated chemical training data indicate that AI-based de novo drug design should perform better than the current models do. Improvements in performance are warranted to obtain better outcomes in the form of potential drug candidates that perform well in in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
| | - Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
| | - Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia; AFNP Med Austria, 1010 Wien, Austria
| | - Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
|
32
|
Atance SR, Diez JV, Engkvist O, Olsson S, Mercado R. De Novo Drug Design Using Reinforcement Learning with Graph-Based Deep Generative Models. J Chem Inf Model 2022; 62:4863-4872. [PMID: 36219571 DOI: 10.1021/acs.jcim.2c00838] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 11/28/2022]
Abstract
Machine learning provides effective computational tools for exploring the chemical space via deep generative models. Here, we propose a new reinforcement learning scheme to fine-tune graph-based deep generative models for de novo molecular design tasks. We show how our computational framework can successfully guide a pretrained generative model toward the generation of molecules with a specific property profile, even when such molecules are not present in the training set and unlikely to be generated by the pretrained model. We explored the following tasks: generating molecules of decreasing/increasing size, increasing drug-likeness, and increasing bioactivity. Using the proposed approach, we achieve a model which generates diverse compounds with predicted DRD2 activity for 95% of sampled molecules, outperforming previously reported methods on this metric.
Affiliation(s)
- Sara Romeo Atance
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Juan Viguera Diez
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden; Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Simon Olsson
- Department of Computer Science and Engineering, Chalmers University of Technology, Rännvägen 6, 412 58 Göteborg, Sweden
| | - Rocío Mercado
- Molecular AI, Discovery Sciences, R&D, AstraZeneca Gothenburg, Pepparedsleden 1, 431 50 Mölndal, Sweden
|
33
|
García-Ortegón M, Simm GNC, Tripp AJ, Hernández-Lobato JM, Bender A, Bacallado S. DOCKSTRING: Easy Molecular Docking Yields Better Benchmarks for Ligand Design. J Chem Inf Model 2022; 62:3486-3502. [PMID: 35849793 PMCID: PMC9364321 DOI: 10.1021/acs.jcim.1c01334] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/03/2021] [Indexed: 01/05/2023]
Abstract
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or de novo design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery.
Affiliation(s)
- Miguel García-Ortegón
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
| | - Gregor N. C. Simm
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | - Austin J. Tripp
- Department of Engineering, University of Cambridge, Trumpington St., Cambridge CB2 1PZ, United Kingdom
| | | | - Andreas Bender
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd., Cambridge CB2 1EW, United Kingdom
| | - Sergio Bacallado
- Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd., Cambridge CB3 0WB, United Kingdom
|
34
|
Hadfield TE, Imrie F, Merritt A, Birchall K, Deane CM. Incorporating Target-Specific Pharmacophoric Information into Deep Generative Models for Fragment Elaboration. J Chem Inf Model 2022; 62:2280-2292. [PMID: 35499971 PMCID: PMC9131447 DOI: 10.1021/acs.jcim.1c01311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Despite recent interest in deep generative models for scaffold elaboration, their applicability to fragment-to-lead campaigns has so far been limited. This is primarily due to their inability to account for local protein structure or a user's design hypothesis. We propose a novel method for fragment elaboration, STRIFE, that overcomes these issues. STRIFE takes as input fragment hotspot maps (FHMs) extracted from a protein target and processes them to provide meaningful and interpretable structural information to its generative model, which in turn is able to rapidly generate elaborations with complementary pharmacophores to the protein. In a large-scale evaluation, STRIFE outperforms existing, structure-unaware, fragment elaboration methods in proposing highly ligand-efficient elaborations. In addition to automatically extracting pharmacophoric information from a protein target's FHM, STRIFE optionally allows the user to specify their own design hypotheses.
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Andy Merritt
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Kristian Birchall
- LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
|
35
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
| | - Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
|
36
|
Bilodeau C, Jin W, Jaakkola T, Barzilay R, Jensen KF. Generative models for molecular discovery: Recent advances and challenges. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Affiliation(s)
- Camille Bilodeau
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Wengong Jin
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
37
|
Unsupervised Learning in Drug Design from Self-Organization to Deep Chemistry. Int J Mol Sci 2022; 23:ijms23052797. [PMID: 35269939 PMCID: PMC8910896 DOI: 10.3390/ijms23052797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 02/27/2022] [Accepted: 02/27/2022] [Indexed: 12/10/2022] Open
Abstract
The availability of computers has brought novel prospects in drug design. Neural networks (NN) were an early tool that cheminformatics tested for converting data into drugs. However, the initial interest faded for almost two decades. The recent success of Deep Learning (DL) has inspired a renaissance of neural networks for their potential application in deep chemistry. DL targets direct data analysis without any human intervention. Although back-propagation NN is the main algorithm in the DL that is currently being used, unsupervised learning can be even more efficient. We review self-organizing maps (SOM) in mapping molecular representations from the 1990s to the current deep chemistry. We discovered the enormous efficiency of SOM not only for features that could be expected by humans, but also for those that are not trivial to human chemists. We reviewed the DL projects in the current literature, especially unsupervised architectures. DL appears to be efficient in pattern recognition (Deep Face) or chess (Deep Blue). However, an efficient deep chemistry is still a matter for the future. This is because the availability of measured property data in chemistry is still limited.
|
38
|
Kaitoh K, Yamanishi Y. Scaffold-Retained Structure Generator to Exhaustively Create Molecules in an Arbitrary Chemical Space. J Chem Inf Model 2022; 62:2212-2225. [DOI: 10.1021/acs.jcim.1c01130] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
Affiliation(s)
- Kazuma Kaitoh
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
|
39
|
Applications of machine learning in computer-aided drug discovery. QRB DISCOVERY 2022. [PMID: 37529294 PMCID: PMC10392679 DOI: 10.1017/qrd.2022.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
Machine learning (ML) has revolutionised the field of structure-based drug design (SBDD) in recent years. During the training stage, ML techniques typically analyse large amounts of experimentally determined data to create predictive models in order to inform the drug discovery process. Deep learning (DL) is a subfield of ML, that relies on multiple layers of a neural network to extract significantly more complex patterns from experimental data, and has recently become a popular choice in SBDD. This review provides a thorough summary of the recent DL trends in SBDD with a particular focus on de novo drug design, binding site prediction, and binding affinity prediction of small molecules.
|
40
|
Bilsland AE, Pugliese A, Bower J. Implementation of an AI-assisted fragment-generator in an open-source platform. RSC Med Chem 2022; 13:1205-1211. [PMID: 36320432 PMCID: PMC9579942 DOI: 10.1039/d2md00152g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/27/2022] [Indexed: 11/21/2022] Open
Abstract
We recently reported a deep learning model to facilitate fragment library design, which is critical for efficient hit identification. However, our model was implemented in Python. We have now created an implementation in the KNIME graphical pipelining environment which we hope will allow experimentation by users with limited programming knowledge.
Affiliation(s)
- Alan E. Bilsland
- Cancer Research Horizons – Therapeutic Innovation, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, UK
| | - Angelo Pugliese
- BioAscent Discovery, Bo'Ness Road, Newhouse, Lanarkshire ML1 5UH, UK
| | - Justin Bower
- Cancer Research Horizons – Therapeutic Innovation, Cancer Research UK Beatson Institute, Garscube Estate, Switchback Road, Glasgow G61 1BD, UK
|
41
|
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is the application in molecular generation with the aid of deep neural networks (DNN). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some of the recent applications of generative models in drug design, with a focus on research work from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software developed within our group at AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in drug discovery and different approaches to respond to these challenges, which define areas for current and future work.
|
42
|
Abstract
AI application examples with full DMTA (Design, Make, Test, Analysis) outcomes are still rarely reported. A recent study highlights that a generative model could be applied in the drug discovery process, through an example in which ideas generated by an AI generative model were confirmed by subsequent wet-lab work.
Affiliation(s)
- Hongming Chen
- Bioland Laboratory (Guangzhou Regenerative Medicine and Health - Guangdong Laboratory), Guangzhou 510530, China
|
43
|
Miljković F, Rodríguez-Pérez R, Bajorath J. Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis. ACS OMEGA 2021; 6:33293-33299. [PMID: 34926881 PMCID: PMC8674916 DOI: 10.1021/acsomega.1c05512] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/04/2021] [Accepted: 11/18/2021] [Indexed: 05/17/2023]
Abstract
As in other areas, artificial intelligence (AI) is heavily promoted in different scientific fields, including chemistry. Although chemistry traditionally tends to be a conservative field and slower than others to adapt new concepts, AI is increasingly being investigated across chemical disciplines. In medicinal chemistry, supported by computer-aided drug design and cheminformatics, computational methods have long been employed to aid in the search for and optimization of active compounds. We are currently witnessing a multitude of AI-related publications in the medicinal-chemistry-relevant literature and anticipate that the numbers will further increase. Often, advances through AI promoted in such reports are difficult to reconcile or remain questionable, which hampers the acceptance of computational work in interdisciplinary environments. Herein we attempt to highlight selected investigations in which AI has shown promise to impact medicinal chemistry in areas such as compound design and synthesis.
Affiliation(s)
- Filip Miljković
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, SE-431 83 Gothenburg, Sweden
| | - Raquel Rodríguez-Pérez
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
| | - Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Phone: 49-228-7369-100.
|
44
|
Grebner C, Matter H, Hessler G. Artificial Intelligence in Compound Design. Methods Mol Biol 2021; 2390:349-382. [PMID: 34731477 DOI: 10.1007/978-1-0716-1787-8_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Artificial intelligence has seen an incredibly fast development in recent years. Many novel technologies for property prediction of drug molecules as well as for the design of novel molecules were introduced by different research groups. These artificial intelligence-based design methods can be applied for suggesting novel chemical motifs in lead generation or scaffold hopping as well as for optimization of desired property profiles during lead optimization. In lead generation, broad sampling of the chemical space for identification of novel motifs is required, while in the lead optimization phase, a detailed exploration of the chemical neighborhood of a current lead series is advantageous. These different requirements for successful design outcomes render different combinations of artificial intelligence technologies useful. Overall, we observe that a combination of different approaches with tailored scoring and evaluation schemes appears beneficial for efficient artificial intelligence-based compound design.
Affiliation(s)
- Christoph Grebner
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
| | - Hans Matter
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
| | - Gerhard Hessler
- Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany.
|
45
|
Ye Z, Chen F, Zeng J, Gao J, Zhang MQ. ScaffComb: A Phenotype-Based Framework for Drug Combination Virtual Screening in Large-Scale Chemical Datasets. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2021; 8:e2102092. [PMID: 34723439 PMCID: PMC8693048 DOI: 10.1002/advs.202102092] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/21/2021] [Revised: 07/29/2021] [Indexed: 06/13/2023]
Abstract
Combination therapy has long been used in cancer treatment to overcome drug resistance related to monotherapy. Increased pharmacological data and the rapid development of deep learning methods have enabled the construction of models to predict and screen drug pairs. However, the size of drug libraries is restricted to hundreds to thousands of compounds. The ScaffComb framework, which aims to bridge the gaps in the virtual screening of drug combinations in large-scale databases, is proposed here. Inspired by phenotype-based drug design, ScaffComb integrates phenotypic information into molecular scaffolds, which can be used to screen the drug library and identify potent drug combinations. First, ScaffComb is validated using the US Food and Drug Administration dataset, and known drug combinations are successfully reidentified. Then, ScaffComb is applied to screen the ZINC and ChEMBL databases, which yields novel drug combinations and reveals an ability to discover new synergistic mechanisms. To our knowledge, ScaffComb is the first method to use phenotype-based virtual screening of drug combinations in large-scale chemical datasets.
Affiliation(s)
- Zhaofeng Ye
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- School of Medicine, Tsinghua University, Beijing 100084, China
| | - Fengling Chen
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Tsinghua‐Peking Center for Life Sciences, Beijing 100084, China
| | - Jiangyang Zeng
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
| | - Juntao Gao
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Michael Q. Zhang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing 100084, China
- School of Medicine, Tsinghua University, Beijing 100084, China
- Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080‐3021, USA
|
46
|
Tan X, Li C, Yang R, Zhao S, Li F, Li X, Chen L, Wan X, Liu X, Yang T, Tong X, Xu T, Cui R, Jiang H, Zhang S, Liu H, Zheng M. Discovery of Pyrazolo[3,4-d]pyridazinone Derivatives as Selective DDR1 Inhibitors via Deep Learning Based Design, Synthesis, and Biological Evaluation. J Med Chem 2021; 65:103-119. [PMID: 34821145 DOI: 10.1021/acs.jmedchem.1c01205] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
Alterations of discoidin domain receptor 1 (DDR1) may lead to increased production of inflammatory cytokines, making DDR1 an attractive target for inflammatory bowel disease (IBD) therapy. A scaffold-based molecular design workflow was established and performed by integrating a deep generative model, kinase selectivity screening, and molecular docking, leading to a novel DDR1 inhibitor, compound 2, which showed a potent DDR1 inhibition profile (IC50 = 10.6 ± 1.9 nM) and excellent selectivity against a panel of 430 kinases (S(10) = 0.002 at 0.1 μM). Compound 2 potently inhibited the expression of pro-inflammatory cytokines and DDR1 autophosphorylation in cells, and it also demonstrated a promising oral therapeutic effect in a dextran sulfate sodium (DSS)-induced mouse colitis model.
Affiliation(s)
- Xiaoqin Tan
- ByteDance AI Lab, 1999 Yishan Road, Shanghai 201103, China
| | - Chunpu Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
| | - Ruirui Yang
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China.,Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
| | | | - Fei Li
- Fudan University, 2005 Songhu Road, Shanghai 200433, China
| | - Xutong Li
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Lifan Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Xiaozhe Wan
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | - Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
| | - Tianbiao Yang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
| | - Xiaochu Tong
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
| | | | - Rongrong Cui
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
| | - Hualiang Jiang
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China.,University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China.,Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Huaxiazhong Road, Shanghai 200031, China
| | | | - Hong Liu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China.,University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China
|
47
|
Molecular generation by Fast Assembly of (Deep)SMILES fragments. J Cheminform 2021; 13:88. [PMID: 34775976 PMCID: PMC8591910 DOI: 10.1186/s13321-021-00566-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/25/2021] [Accepted: 11/02/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In recent years, in silico molecular design is regaining interest. To generate on a computer molecules with optimized properties, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. RESULTS: In this article, a simple method is described to generate only valid molecules at high frequency ([Formula: see text] molecule/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity at training set distribution matching. When working with DeepSMILES, the method reaches peak performance ([Formula: see text] molecule/s) because it relies almost exclusively on string operations. The "Fast Assembly of SMILES Fragments" software is released as open-source at https://github.com/UnixJunkie/FASMIFRA. Experiments regarding speed, training set distribution matching, molecular diversity and benchmark against several other methods are also shown.
|
48
|
Zheng S, Lei Z, Ai H, Chen H, Deng D, Yang Y. Deep scaffold hopping with multimodal transformer neural networks. J Cheminform 2021; 13:87. [PMID: 34774103 PMCID: PMC8590293 DOI: 10.1186/s13321-021-00565-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/07/2021] [Accepted: 10/31/2021] [Indexed: 11/10/2022] Open
Abstract
Scaffold hopping is a central task of modern medicinal chemistry for rational drug design, which aims to design molecules of novel scaffolds sharing similar target biological activities toward known hit molecules. Traditionally, scaffold hopping depends on searching databases of available compounds, which cannot exploit the vast chemical space. In this study, we have re-formulated this task as a supervised molecule-to-molecule translation to generate hopped molecules novel in 2D structure but similar in 3D structure, as inspired by the fact that candidate compounds bind with their targets through 3D conformations. To efficiently train the model, we curated over 50 thousand pairs of molecules with increased bioactivity, similar 3D structure, but different 2D structure from a public bioactivity database, which spanned 40 kinases commonly investigated by medicinal chemists. Moreover, we have designed a multimodal molecular transformer architecture by integrating molecular 3D conformer information through a spatial graph neural network and protein sequence information through a Transformer. The trained DeepHop model was shown to generate around 70% of molecules with improved bioactivity together with high 3D similarity but low 2D scaffold similarity to the template molecules. This ratio was 1.9 times higher than other state-of-the-art deep learning methods and rule- and virtual screening-based methods. Furthermore, we demonstrated that the model could generalize to new target proteins through fine-tuning with a small set of active compounds. Case studies have also shown the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.
Affiliation(s)
- Shuangjia Zheng
- School of Data and Computer Science, Sun Yat-Sen University, China, 132 East Circle at University City, Guangzhou, 510006, China
| | - Zengrong Lei
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China
| | - Haitao Ai
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China
| | - Hongming Chen
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, 510530, China
| | - Daiguo Deng
- Fermion Technology Co., Ltd, 1088 Newport East Road, Guangzhou, 510335, China.
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, China, 132 East Circle at University City, Guangzhou, 510006, China.
|
49
|
Joshi RP, Gebauer NWA, Bontha M, Khazaieli M, James RM, Brown JB, Kumar N. 3D-Scaffold: A Deep Learning Framework to Generate 3D Coordinates of Drug-like Molecules with Desired Scaffolds. J Phys Chem B 2021; 125:12166-12176. [PMID: 34662142 DOI: 10.1021/acs.jpcb.1c06437] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/20/2023]
Abstract
The prerequisite of therapeutic drug design and discovery is to identify novel molecules and develop lead candidates with desired biophysical and biochemical properties. Deep generative models have demonstrated their ability to find such molecules by exploring a huge chemical space efficiently. An effective way to generate new molecules with desired target properties is by constraining the critical functional groups or the core scaffolds in the generation process. To this end, we developed a domain-aware generative framework called 3D-Scaffold that takes 3D coordinates of the desired scaffold as an input and generates 3D coordinates of novel therapeutic candidates as an output while always preserving the desired scaffolds in generated structures. We demonstrated that our framework generates predominantly valid, unique, novel, and experimentally synthesizable molecules that have drug-like properties similar to the molecules in the training set. Using domain-specific data sets, we generate covalent and noncovalent antiviral inhibitors targeting viral proteins. To measure the success of our framework in generating therapeutic candidates, generated structures were subjected to high-throughput virtual screening via docking simulations, which showed favorable interactions against SARS-CoV-2 main protease (Mpro) and nonstructural protein endoribonuclease (NSP15) targets. Most importantly, our deep learning model performs well with relatively small 3D structural training data and quickly learns to generalize to new scaffolds, highlighting its potential application to other domains for generating target-specific candidates.
Affiliation(s)
- Rajendra P Joshi
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Niklas W A Gebauer
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany.,BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany.,Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
| | - Mridula Bontha
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Mercedeh Khazaieli
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Rhema M James
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - James B Brown
- Environmental Genomics & Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California 94710, United States
| | - Neeraj Kumar
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
50
|
Imrie F, Hadfield TE, Bradley AR, Deane CM. Deep generative design with 3D pharmacophoric constraints. Chem Sci 2021; 12:14577-14589. [PMID: 34881010 PMCID: PMC8580048 DOI: 10.1039/d1sc02436a] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/03/2021] [Accepted: 10/18/2021] [Indexed: 12/30/2022] Open
Abstract
Generative models have increasingly been proposed as a solution to the molecular design problem. However, it has proved challenging to control the design process or incorporate prior knowledge, limiting their practical use in drug discovery. In particular, generative methods have made limited use of three-dimensional (3D) structural information even though this is critical to binding. This work describes a method to incorporate such information and demonstrates the benefit of doing so. We combine an existing graph-based deep generative model, DeLinker, with a convolutional neural network to utilise physically-meaningful 3D representations of molecules and target pharmacophores. We apply our model, DEVELOP, to both linker and R-group design, demonstrating its suitability for both hit-to-lead and lead optimisation. The 3D pharmacophoric information results in improved generation and allows greater control of the design process. In multiple large-scale evaluations, we show that including 3D pharmacophoric constraints results in substantial improvements in the quality of generated molecules. On a challenging test set derived from PDBbind, our model improves the proportion of generated molecules with high 3D similarity to the original molecule by over 300%. In addition, DEVELOP recovers 10× more of the original molecules compared to the baseline DeLinker method. Our approach is general-purpose, readily modifiable to alternate 3D representations, and can be incorporated into other generative frameworks. Code is available at https://github.com/oxpig/DEVELOP.
Affiliation(s)
- Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| | - Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| | - Anthony R Bradley
- Exscientia Ltd, The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, UK
| | - Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
| |
|