201
Thiede LA, Krenn M, Nigam A, Aspuru-Guzik A. Curiosity in exploring chemical spaces: Intrinsic rewards for molecular reinforcement learning. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac7ddc] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022] Open
Abstract
Computer-aided design of molecules has the potential to disrupt the field of drug and material discovery. Machine learning, and deep learning in particular, has made big strides in recent years and promises to greatly benefit computer-aided methods. Reinforcement learning is a particularly promising approach since it enables de novo molecule design, that is, molecular design without providing any prior knowledge. However, the search space is vast, and therefore any reinforcement learning agent needs to perform efficient exploration. In this study, we examine three versions of intrinsic motivation to aid efficient exploration. The algorithms are adapted from intrinsic motivation methods in the literature that were developed in other settings, predominantly video games. We show that the curious agents find better-performing molecules on two of three benchmarks. This indicates an exciting new research direction for reinforcement learning agents that can explore the chemical space out of their own motivation. This has the potential to eventually lead to unexpected new molecular designs no human has thought about so far.
202
Zhang Y, Li Z, Duan B, Qin L, Peng J. MKGE: Knowledge Graph Embedding with Molecular Structure Information. Comput Biol Chem 2022; 100:107730. [DOI: 10.1016/j.compbiolchem.2022.107730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
203
Eckmann P, Sun K, Zhao B, Feng M, Gilson MK, Yu R. LIMO: Latent Inceptionism for Targeted Molecule Generation. Proceedings of Machine Learning Research 2022; 162:5777-5792. [PMID: 36193121 PMCID: PMC9527083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Generation of drug-like molecules with high binding affinity to target proteins remains a difficult and resource-intensive task in drug discovery. Existing approaches primarily employ reinforcement learning, Markov sampling, or deep generative models guided by Gaussian processes, which can be prohibitively slow when generating molecules with high binding affinity calculated by computationally expensive physics-based methods. We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique. LIMO employs a variational autoencoder-generated latent space and property prediction by two neural networks in sequence to enable faster gradient-based reverse-optimization of molecular properties. Comprehensive experiments show that LIMO performs competitively on benchmark tasks and markedly outperforms state-of-the-art techniques on the novel task of generating drug-like compounds with high binding affinity, reaching nanomolar range against two protein targets. We corroborate these docking-based results with more accurate molecular dynamics-based calculations of absolute binding free energy and show that one of our generated drug-like compounds has a predicted K_D (a measure of binding affinity) of 6 · 10⁻¹⁴ M against the human estrogen receptor, well beyond the affinities of typical early-stage drug candidates and most FDA-approved drugs to their respective targets. Code is available at https://github.com/Rose-STL-Lab/LIMO.
Affiliation(s)
- Peter Eckmann
  - Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States
- Kunyang Sun
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, California, United States
- Bo Zhao
  - Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States
- Mudong Feng
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, California, United States
- Michael K. Gilson
  - Department of Chemistry and Biochemistry, UC San Diego, La Jolla, California, United States
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, United States
- Rose Yu
  - Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States
204
Hatakeyama-Sato K, Adachi H, Umeki M, Kashikawa T, Kimura K, Oyaizu K. Automated Design of Li⁺-Conducting Polymer by Quantum-Inspired Annealing. Macromol Rapid Commun 2022; 43:e2200385. [PMID: 35759445 DOI: 10.1002/marc.202200385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 06/05/2022] [Indexed: 11/07/2022]
Abstract
Automated molecule design by computers has been an essential topic in materials informatics. Still, generating practical structures is not easy because of the difficulty in treating material stability, synthetic difficulty, mechanical properties, and other miscellaneous parameters, often leading to the generation of junk molecules. We tackle the problem by introducing supervised/unsupervised machine learning and quantum-inspired annealing. Our autonomous molecular design system can help experimental researchers discover practical materials more efficiently. Like the human design process, new molecules are explored based on knowledge of existing compounds. A new solid-state polymer electrolyte for lithium-ion batteries is designed and synthesized, giving a promising room temperature conductivity of 10⁻⁵ S/cm with reasonable thermal, chemical, and mechanical properties.
Affiliation(s)
- Hiroki Adachi
  - Department of Applied Chemistry, Waseda University, Tokyo, 169-8555, Japan
- Momoka Umeki
  - Department of Applied Chemistry, Waseda University, Tokyo, 169-8555, Japan
- Kenichi Oyaizu
  - Department of Applied Chemistry, Waseda University, Tokyo, 169-8555, Japan
205
Abbasi M, Santos BP, Pereira TC, Sofia R, Monteiro NRC, Simões CJV, Brito R, Ribeiro B, Oliveira JL, Arrais JP. Designing optimized drug candidates with Generative Adversarial Network. J Cheminform 2022; 14:40. [PMID: 35754029 PMCID: PMC9233801 DOI: 10.1186/s13321-022-00623-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/01/2022] [Accepted: 06/13/2022] [Indexed: 12/03/2022] Open
Abstract
Drug design is an important area of study for pharmaceutical businesses. However, low efficacy, off-target delivery, time consumption, and high cost are challenges and can create barriers that impact this process. Deep Learning models are emerging as a promising solution to perform de novo drug design, i.e., to generate drug-like molecules tailored to specific needs. However, stereochemistry was not explicitly considered in the generated molecules, which is inevitable in targeted-oriented molecules. This paper proposes a framework based on Feedback Generative Adversarial Network (GAN) that includes an optimization strategy by incorporating Encoder-Decoder, GAN, and Predictor deep models interconnected with a feedback loop. The Encoder-Decoder converts the string notations of molecules into latent space vectors, effectively creating a new type of molecular representation. At the same time, the GAN can learn and replicate the training data distribution and, therefore, generate new compounds. The feedback loop is designed to incorporate and evaluate the generated molecules according to the multiobjective desired property at every epoch of training to ensure a steady shift of the generated distribution towards the space of the targeted properties. Moreover, to develop a more precise set of molecules, we also incorporate a multiobjective optimization selection technique based on a non-dominated sorting genetic algorithm. The results demonstrate that the proposed framework can generate realistic, novel molecules that span the chemical space. The proposed Encoder-Decoder model correctly reconstructs 99% of the datasets, including stereochemical information. The model's ability to find uncharted regions of the chemical space was successfully shown by optimizing the unbiased GAN to generate molecules with a high binding affinity to the Kappa Opioid and Adenosine [Formula: see text] receptor. Furthermore, the generated compounds exhibit high internal and external diversity levels (0.88 and 0.94, respectively) and uniqueness.
Affiliation(s)
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Beatriz P. Santos
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Tiago C. Pereira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Raul Sofia
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Nelson R. C. Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Rui Brito
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
- Bernardete Ribeiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L. Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P. Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
206
|
|
207
Yang Y, Wu Z, Yao X, Kang Y, Hou T, Hsieh CY, Liu H. Exploring Low-Toxicity Chemical Space with Deep Learning for Molecular Generation. J Chem Inf Model 2022; 62:3191-3199. [PMID: 35713712 DOI: 10.1021/acs.jcim.2c00671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Creating a wide range of new compounds that not only have ideal pharmacological properties but also easily pass long-term toxicity evaluation is still a challenging task in current drug discovery. In this study, we developed a conditional generative model by combining a semisupervised variational autoencoder (SSVAE) with an MGA toxicity predictor. Our aim is to generate molecules with low toxicity, good drug-like properties, and structural diversity. For multiobjective optimization, we have developed a method with hierarchical constraints on the toxicity space of small molecules to generate drug-like small molecules, which can also minimize the effect on the diversity of generated results. The evaluation results of the metrics indicate that the developed model has good effectiveness, novelty, and diversity. The generated molecules by this model are mainly distributed in low-toxicity regions, which suggests that our model can efficiently constrain the generation of toxic structures. In contrast to simply filtering toxic ones after generation, the low-toxicity molecular generative model can generate molecules with structural diversity. Our strategy can be used in target-based drug discovery to improve the quality of generated molecules with low-toxicity, drug-like, and highly active properties.
Affiliation(s)
- Yuwei Yang
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, China
- Zhenxing Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Xiaojun Yao
  - College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou 730000, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518000, China
- Huanxiang Liu
  - School of Pharmacy, Lanzhou University, Lanzhou 730000, China
  - Faculty of Applied Science, Macao Polytechnic University, Macao, SAR 999078, China
208
Jackson IM, Webb EW, Scott PJ, James ML. In Silico Approaches for Addressing Challenges in CNS Radiopharmaceutical Design. ACS Chem Neurosci 2022; 13:1675-1683. [PMID: 35606334 PMCID: PMC9945852 DOI: 10.1021/acschemneuro.2c00269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023] Open
Abstract
Positron emission tomography (PET) is a highly sensitive and versatile molecular imaging modality that leverages radiolabeled molecules, known as radiotracers, to interrogate biochemical processes such as metabolism, enzymatic activity, and receptor expression. The ability to probe specific molecular and cellular events longitudinally in a noninvasive manner makes PET imaging a particularly powerful technique for studying the central nervous system (CNS) in both health and disease. Unfortunately, developing and translating a single CNS PET tracer for clinical use is typically an extremely resource-intensive endeavor, often requiring synthesis and evaluation of numerous candidate molecules. While existing in vitro methods are beginning to address the challenge of derisking molecules prior to costly in vivo PET studies, most require a significant investment of resources and possess substantial limitations. In the context of CNS drug development, significant time and resources have been invested into the development and optimization of computational methods, particularly involving machine learning, to streamline the design of better CNS therapeutics. However, analogous efforts developed and validated for CNS radiotracer design are conspicuously limited. In this Perspective, we overview the requirements and challenges of CNS PET tracer design, survey the most promising computational methods for in silico CNS drug design, and bridge these two areas by discussing the potential applications and impact of computational design tools in CNS radiotracer design.
Affiliation(s)
- Isaac M. Jackson
  - Department of Radiology, Stanford University, Stanford, CA 94305
- E. William Webb
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109
- Peter J.H. Scott
  - Department of Radiology, University of Michigan, Ann Arbor, MI 48109
- Michelle L. James
  - Department of Radiology, Stanford University, Stanford, CA 94305
  - Department of Neurology & Neurological Sciences, Stanford University, Stanford, CA 94304
Corresponding Authors: Peter J. H. Scott, Department of Radiology, University of Michigan, Ann Arbor, MI 48109, United States; Michelle L. James, Departments of Radiology, and Neurology & Neurological Sciences, 1201 Welch Rd., P-206, Stanford, CA 94305-5484, United States
209
Urbina F, Lowden CT, Culberson JC, Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022; 7:18699-18713. [PMID: 35694522 PMCID: PMC9178760 DOI: 10.1021/acsomega.2c01404] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 05/04/2023]
Abstract
Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of the physicochemical properties to steer generative design, yet often, they are not capable of addressing a wide variety of potential problems, as well as converge into similar molecular space when combined with a scoring function for the desired properties. In addition, these generated compounds may not be synthetically feasible, reducing their capabilities and limiting their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn representing three components: a new hill-climb algorithm, which makes use of SMILES-based recurrent neural network (RNN) generative models, analog generation software, and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We now describe the development, benchmarking, and testing of this suite of tools and propose how they might be used to optimize molecules or prioritize promising lead compounds using these RNN examples provided by multiple test case examples.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Christopher T. Lowden
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- J. Christopher Culberson
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
210
Yi H. Efficient machine learning algorithm for electroencephalogram modeling in brain–computer interfaces. Neural Comput Appl 2022. [DOI: 10.1007/s00521-020-04861-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
211
Singh S, Sunoj RB. A Transfer Learning Approach for Reaction Discovery in Small Data Situations Using Generative Model. iScience 2022; 25:104661. [PMID: 35832891 PMCID: PMC9272387 DOI: 10.1016/j.isci.2022.104661] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/03/2022] [Revised: 05/20/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022] Open
Abstract
Sustainable practices in chemical sciences can be better realized by adopting interdisciplinary approaches that combine the advantages of machine learning (ML) on the initially acquired small data in reaction discovery. Developing new reactions generally remains heuristic and even time and resource intensive. For instance, synthesis of fluorine-containing compounds, which constitute ∼20% of the marketed drugs, relies on deoxyfluorination of abundantly available alcohols. Herein, we demonstrate the use of a recurrent neural network-based deep generative model built on a library of just 37 alcohols for effective learning and exploration of the chemical space. The proof-of-concept ML model is able to generate good quality, synthetically accessible, higher-yielding novel alcohol molecules. This protocol would have superior utility for deployment into a practical reaction discovery pipeline.
- Dual pronged transfer learning, both to generate and predict yields of new molecules
- Demonstrated the utility for an important family of deoxyfluorination of alcohols
- Applicable for practically more likely situations with relatively smaller data
- Extendable to other reaction manifolds to facilitate expedited reaction discovery
212
MSNovelist: de novo structure generation from mass spectra. Nat Methods 2022; 19:865-870. [PMID: 35637304 PMCID: PMC9262714 DOI: 10.1038/s41592-022-01486-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 07/06/2021] [Accepted: 04/07/2022] [Indexed: 12/29/2022]
Abstract
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds. MSNovelist combines fingerprint prediction with an encoder–decoder neural network for de novo structure generation of small molecules from mass spectra.
213
Tang B, He F, Liu D, He F, Wu T, Fang M, Niu Z, Wu Z, Xu D. AI-Aided Design of Novel Targeted Covalent Inhibitors against SARS-CoV-2. Biomolecules 2022. [PMID: 35740872 DOI: 10.1101/2020.03.03.972133v1.full] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/26/2023] Open
Abstract
The drug repurposing of known approved drugs (e.g., lopinavir/ritonavir) has failed to treat SARS-CoV-2-infected patients. Therefore, it is important to generate new chemical entities against this virus. As a critical enzyme in the lifecycle of the coronavirus, the 3C-like main protease (3CLpro or Mpro) is the most attractive target for antiviral drug design. Based on a recently solved structure (PDB ID: 6LU7), we developed a novel advanced deep Q-learning network with a fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro. We obtained a series of derivatives from the lead compounds based on our structure-based optimization policy (SBOP). All of the 47 lead compounds obtained directly with our AI model and related derivatives based on the SBOP are accessible in our molecular library. These compounds can be used as potential candidates by researchers to develop drugs against SARS-CoV-2.
Affiliation(s)
- Bowen Tang
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
  - MindRank AI Ltd., Hangzhou 310000, China
- Fengming He
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dongpeng Liu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Fei He
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Tong Wu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100006, China
- Meijuan Fang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Zhen Wu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
214
Tang B, He F, Liu D, He F, Wu T, Fang M, Niu Z, Wu Z, Xu D. AI-Aided Design of Novel Targeted Covalent Inhibitors against SARS-CoV-2. Biomolecules 2022; 12:746. [PMID: 35740872 PMCID: PMC9220321 DOI: 10.3390/biom12060746] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/14/2022] [Revised: 05/17/2022] [Accepted: 05/20/2022] [Indexed: 02/04/2023] Open
Abstract
The drug repurposing of known approved drugs (e.g., lopinavir/ritonavir) has failed to treat SARS-CoV-2-infected patients. Therefore, it is important to generate new chemical entities against this virus. As a critical enzyme in the lifecycle of the coronavirus, the 3C-like main protease (3CLpro or Mpro) is the most attractive target for antiviral drug design. Based on a recently solved structure (PDB ID: 6LU7), we developed a novel advanced deep Q-learning network with a fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro. We obtained a series of derivatives from the lead compounds based on our structure-based optimization policy (SBOP). All of the 47 lead compounds obtained directly with our AI model and related derivatives based on the SBOP are accessible in our molecular library. These compounds can be used as potential candidates by researchers to develop drugs against SARS-CoV-2.
Affiliation(s)
- Bowen Tang
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
  - MindRank AI Ltd., Hangzhou 310000, China
- Fengming He
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dongpeng Liu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Fei He
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Tong Wu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100006, China
- Meijuan Fang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Zhen Wu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
215
Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
216
Bung N, Krishnan SR, Roy A. An In Silico Explainable Multiparameter Optimization Approach for De Novo Drug Design against Proteins from the Central Nervous System. J Chem Inf Model 2022; 62:2685-2695. [PMID: 35581002 DOI: 10.1021/acs.jcim.2c00462] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 02/07/2023]
Abstract
The aim of drug design and development is to produce a drug that can inhibit the target protein and possess a balanced physicochemical and toxicity profile. Traditionally, this is a multistep process where different parameters such as activity and physicochemical and pharmacokinetic properties are optimized sequentially, which often leads to high attrition rate during later stages of drug design and development. We have developed a deep learning-based de novo drug design method that can design novel small molecules by optimizing target specificity as well as multiple parameters (including late-stage parameters) in a single step. All possible combinations of parameters were optimized to understand the effect of each parameter over the other parameters. An explainable predictive model was used to identify the molecular fragments responsible for the property being optimized. The proposed method was applied against the human 5-hydroxy tryptamine receptor 1B (5-HT1B), a protein from the central nervous system (CNS). Various physicochemical properties specific to CNS drugs were considered along with the target specificity and blood-brain barrier permeability (BBBP), which act as an additional challenge for CNS drug delivery. The contribution of each parameter toward molecule design was identified by analyzing the properties of generated small molecules from optimization of all possible parameter combinations. The final optimized generative model was able to design similar inhibitors compared to known inhibitors of 5-HT1B. In addition, the functional groups of the generated small molecules that guide the BBBP predictive model were identified through feature attribution techniques.
Affiliation(s)
- Navneet Bung
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
- Arijit Roy
  - TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
|
217
|
Ding W, Nakai K, Gong H. Protein design via deep learning. Brief Bioinform 2022; 23:bbac102. [PMID: 35348602 PMCID: PMC9116377 DOI: 10.1093/bib/bbac102] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/16/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 12/11/2022] Open
Abstract
Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed to be key to addressing real societal challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we survey the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. Future perspectives on design goals, challenges and opportunities are also comprehensively discussed.
Affiliation(s)
- Wenze Ding
  - School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
  - School of Future Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Kenta Nakai
  - Institute of Medical Science, the University of Tokyo, Tokyo 1088639, Japan
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
|
218
|
Xie W, Wang F, Li Y, Lai L, Pei J. Advances and Challenges in De Novo Drug Design Using Three-Dimensional Deep Generative Models. J Chem Inf Model 2022; 62:2269-2279. [PMID: 35544331 DOI: 10.1021/acs.jcim.2c00042] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/30/2022]
Abstract
A persistent goal for de novo drug design is to generate novel chemical compounds with desirable properties in a labor-, time-, and cost-efficient manner. Deep generative models provide alternative routes to this goal. Numerous model architectures and optimization strategies have been explored in recent years, most of which have been developed to generate two-dimensional molecular structures. Some generative models aiming at three-dimensional (3D) molecule generation have also been proposed, gaining attention for their unique advantages and potential to directly design drug-like molecules in a target-conditioning manner. This review highlights current developments in 3D molecular generative models combined with deep learning and discusses future directions for de novo drug design.
Affiliation(s)
- Weixin Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Fanhao Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Yibo Li
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Science at BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
|
219
|
Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS OMEGA 2022; 7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system strings, molecular graphs, and three-dimensional atomic coordinates using four different neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about the molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.
Affiliation(s)
- Gihan Panapitiya
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Michael Girard
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Aaron Hollas
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jonathan Sepulveda
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Wei Wang
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Emily Saldanha
  - Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
220
|
Hadfield TE, Imrie F, Merritt A, Birchall K, Deane CM. Incorporating Target-Specific Pharmacophoric Information into Deep Generative Models for Fragment Elaboration. J Chem Inf Model 2022; 62:2280-2292. [PMID: 35499971 PMCID: PMC9131447 DOI: 10.1021/acs.jcim.1c01311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
Despite recent interest in deep generative models for scaffold elaboration, their applicability to fragment-to-lead campaigns has so far been limited. This is primarily due to their inability to account for local protein structure or a user's design hypothesis. We propose a novel method for fragment elaboration, STRIFE, that overcomes these issues. STRIFE takes as input fragment hotspot maps (FHMs) extracted from a protein target and processes them to provide meaningful and interpretable structural information to its generative model, which in turn is able to rapidly generate elaborations with complementary pharmacophores to the protein. In a large-scale evaluation, STRIFE outperforms existing, structure-unaware, fragment elaboration methods in proposing highly ligand-efficient elaborations. In addition to automatically extracting pharmacophoric information from a protein target's FHM, STRIFE optionally allows the user to specify their own design hypotheses.
Affiliation(s)
- Thomas E Hadfield
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Fergus Imrie
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
- Andy Merritt
  - LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
- Kristian Birchall
  - LifeArc, SBC Open Innovation Campus, Stevenage SG1 2FX, United Kingdom
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
|
221
|
Moshawih S, Goh HP, Kifli N, Idris AC, Yassin H, Kotra V, Goh KW, Liew KB, Ming LC. Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives. Chem Biol Drug Des 2022; 100:185-217. [PMID: 35490393 DOI: 10.1111/cbdd.14062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/15/2022] [Accepted: 04/23/2022] [Indexed: 11/28/2022]
Abstract
Cheminformatics utilizing machine learning (ML) techniques have opened up a new horizon in drug discovery. This is owing to vast chemical space expansion with rocketing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Due to the natural products' (NP) structural complexity, uniqueness, and diversity, they could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanic and dynamic simulations, induced docking, and free energy perturbations are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. Those applications have leveraged from artificial intelligence (AI), which decreases the computational costs required for such costly simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening utilized for fractionating NPs and high-throughput virtual screening and their strategies, and significance, are reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds and approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses such as cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations, and molecular modeling of protein-ligand interaction were also presented. 
Apart from being a concise reference for cheminformatics, this review is a useful text to understand the application of ML-based algorithms to molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity prediction.
Affiliation(s)
- Said Moshawih
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hui Poh Goh
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Nurolaini Kifli
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Azam Che Idris
  - Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hayati Yassin
  - Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Vijay Kotra
  - Faculty of Pharmacy, Quest International University, Perak, Malaysia
- Khang Wen Goh
  - Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Kai Bin Liew
  - Faculty of Pharmacy, University of Cyberjaya, Cyberjaya, Malaysia
- Long Chiau Ming
  - PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
|
222
|
Nag S, Baidya ATK, Mandal A, Mathew AT, Das B, Devi B, Kumar R. Deep learning tools for advancing drug discovery and development. 3 Biotech 2022; 12:110. [PMID: 35433167 PMCID: PMC8994527 DOI: 10.1007/s13205-022-03165-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/06/2021] [Accepted: 03/18/2022] [Indexed: 12/26/2022] Open
Abstract
A few decades ago, drug discovery and development were limited to a bunch of medicinal chemists working in a lab with enormous amounts of testing, validations, and synthetic procedures, all contributing to considerable investments in time and wealth to get one drug out into the clinics. The advancements in computational techniques combined with a boom in multi-omics data led to the development of various bioinformatics/pharmacoinformatics/cheminformatics tools that have helped speed up the drug development process. But with the advent of artificial intelligence (AI), machine learning (ML) and deep learning (DL), the conventional drug discovery process has been further rationalized. Extensive biological data in the form of big data present in various databases across the globe acts as the raw materials for the ML/DL-based approaches and helps in accurate identifications of patterns and models which can be used to identify therapeutically active molecules with much fewer investments on time, workforce and wealth. In this review, we have begun by introducing the general concepts in the drug discovery pipeline, followed by an outline of the fields in the drug discovery process where ML/DL can be utilized. We have also introduced ML and DL along with their applications, various learning methods, and training models used to develop the ML/DL-based algorithms. Furthermore, we have summarized various DL-based tools existing in the public domain with their application in the drug discovery paradigm which includes DL tools for identification of drug targets and drug-target interaction such as DeepCPI, DeepDTA, WideDTA, PADME, DeepAffinity, and DeepPocket.
Additionally, we have discussed various DL-based models used in protein structure prediction, de novo design of new chemical scaffolds, virtual screening of chemical libraries for hit identification, absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction, metabolite prediction, clinical trial design, and oral bioavailability prediction. In the end, we have tried to shed light on some of the successful ML/DL-based models used in the drug discovery and development pipeline while also discussing the current challenges and prospects of the application of DL tools in drug discovery and development. We believe that this review will be useful for medicinal and computational chemists searching for DL tools for use in their drug discovery projects.
Affiliation(s)
- Sagorika Nag
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Anurag T. K. Baidya
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Abhimanyu Mandal
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Alen T. Mathew
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Bhanuranjan Das
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Bharti Devi
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
- Rajnish Kumar
  - Department of Pharmaceutical Engineering and Technology, Indian Institute of Technology (B.H.U.), Varanasi, UP 221005 India
|
223
|
GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci Data 2022; 9:185. [PMID: 35449137 PMCID: PMC9023519 DOI: 10.1038/s41597-022-01288-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 02/12/2021] [Accepted: 03/04/2022] [Indexed: 12/23/2022] Open
Abstract
Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
Measurement(s): Conformer geometries and properties
Technology Type(s): Computational Chemistry
|
224
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
|
225
|
Ma R, Zhang H, Luo T. Exploring High Thermal Conductivity Amorphous Polymers Using Reinforcement Learning. ACS APPLIED MATERIALS & INTERFACES 2022; 14:15587-15598. [PMID: 35344333 DOI: 10.1021/acsami.1c23610] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/14/2023]
Abstract
Developing amorphous polymers with desirable thermal conductivity has significant implications, as they are ubiquitous in applications where thermal transport is critical. Conventional Edisonian approaches are slow and without guarantee of success in material development. In this work, using a reinforcement learning scheme, we design polymers with thermal conductivity above 0.400 W/m·K. We leverage a machine learning model trained against 469 thermal conductivity data calculated from high-throughput molecular dynamics (MD) simulations as the surrogate for thermal conductivity prediction, and we use a recurrent neural network trained with around one million virtual polymer structures as a polymer generator. For all generated polymers with thermal conductivity ≥0.400 W/m·K, we have evaluated their synthesizability by calculating the synthetic accessibility score and validated the thermal conductivity of selected polymers using MD simulations. The best thermally conductive polymer designed has an MD-calculated thermal conductivity of 0.693 W/m·K, which is also estimated to be easily synthesizable. Our demonstrated inverse design scheme based on reinforcement learning may advance polymer development with target properties, and the scheme can also be generalized to other material development tasks for different applications.
Affiliation(s)
- Ruimin Ma
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Hanfeng Zhang
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Tengfei Luo
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
  - Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
|
226
|
Langevin M, Vuilleumier R, Bianciotto M. Explaining and avoiding failure modes in goal-directed generation of small molecules. J Cheminform 2022; 14:20. [PMID: 35365218 PMCID: PMC8973583 DOI: 10.1186/s13321-022-00601-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/03/2021] [Accepted: 03/20/2022] [Indexed: 11/16/2022] Open
Abstract
Despite growing interest and success in automated in-silico molecular design, questions remain regarding the ability of goal-directed generation algorithms to perform unbiased exploration of novel chemical spaces. A specific phenomenon has recently been highlighted: goal-directed generation guided by machine learning models produces molecules with high scores according to the optimization model, but low scores according to control models, even when trained on the same data distribution and the same target. In this work, we show that this worrisome behavior is actually due to issues with the predictive models and not the goal-directed generation algorithms. We show that with appropriate predictive models, this issue can be resolved, and molecules generated have high scores according to both the optimization and the control models.
Affiliation(s)
- Maxime Langevin
  - Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400, Vitry-sur-Seine, France
  - PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005, Paris, France
- Rodolphe Vuilleumier
  - PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005, Paris, France
- Marc Bianciotto
  - Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400, Vitry-sur-Seine, France
|
227
|
Hua Y, Fang X, Xing G, Xu Y, Liang L, Deng C, Dai X, Liu H, Lu T, Zhang Y, Chen Y. Effective Reaction-Based De Novo Strategy for Kinase Targets: A Case Study on MERTK Inhibitors. J Chem Inf Model 2022; 62:1654-1668. [PMID: 35353505 DOI: 10.1021/acs.jcim.2c00068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
Reaction-based de novo design is the computational generation of novel molecular structures by linking building blocks using reaction vectors derived from chemistry knowledge. In this work, we first adopted a recurrent neural network (RNN) model to generate three groups of building blocks with different functional groups and then constructed an in silico target-focused combinatorial library based on chemical reaction rules. Mer tyrosine kinase (MERTK) was used as a study case. Combined with a scaffold enrichment analysis, 15 novel MERTK inhibitors covering four scaffolds were achieved. Among them, compound 5a obtained an IC50 value of 53.4 nM against MERTK without any further optimization. The efficiency of hit identification could be significantly improved by shrinking the compound library with the fragment iterative optimization strategy and enriching the dominant scaffold in the hinge region. We hope that this strategy can provide new insights for accelerating the drug discovery process.
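For readers who compare potencies on a logarithmic scale, the reported IC50 of 53.4 nM can be converted to a pIC50 value with the standard definition pIC50 = -log10(IC50 in mol/L). The helper below is a minimal sketch; the function name `pic50_from_nanomolar` is an illustrative assumption and does not come from the paper.

```python
import math


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 (hypothetical helper).

    pIC50 is the negative base-10 logarithm of the IC50 in mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm * 1e-9  # nM -> mol/L
    return -math.log10(ic50_molar)


# Compound 5a from the abstract: IC50 = 53.4 nM against MERTK.
print(round(pic50_from_nanomolar(53.4), 2))
```

On this scale a higher value means a more potent compound, and each unit corresponds to a tenfold change in IC50.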
Affiliation(s)
- Yi Hua
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaobao Fang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Guomeng Xing
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuan Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Li Liang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Chenglong Deng
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaowen Dai
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
|
228
|
He J, Nittinger E, Tyrchan C, Czechtizky W, Patronov A, Bjerrum EJ, Engkvist O. Transformer-based molecular optimization beyond matched molecular pairs. J Cheminform 2022; 14:18. [PMID: 35346368 PMCID: PMC8962145 DOI: 10.1186/s13321-022-00599-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/24/2021] [Accepted: 03/11/2022] [Indexed: 11/11/2022] Open
Abstract
Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist’s intuition in terms of matched molecular pairs (MMPs). Although MMPs is a widely used strategy by medicinal chemists, it offers limited capability in terms of exploring the space of structural modifications, therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keep the scaffold constant) respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that it reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.
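The Tanimoto-based pairing mentioned in the abstract can be illustrated with a minimal sketch. It assumes each molecule is already represented as a set of "on" fingerprint bit indices; the fingerprint type and the similarity threshold used by the authors are not stated in the abstract, so the toy fingerprints, the 0.5 cutoff, and the names `tanimoto` and `select_pairs` are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_pairs(
    fingerprints: dict[str, set[int]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return all molecule pairs whose similarity meets the threshold."""
    pairs = []
    for x, y in combinations(sorted(fingerprints), 2):
        score = tanimoto(fingerprints[x], fingerprints[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Toy fingerprints (hypothetical bit indices, not real molecules).
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {9}}
print(select_pairs(toy, threshold=0.5))
```

In practice the fingerprints would come from a cheminformatics toolkit, and a stricter or looser threshold changes how many structural modifications a training pair may contain.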
|
229
|
|
230
|
Creanza TM, Lamanna G, Delre P, Contino M, Corriero N, Saviano M, Mangiatordi GF, Ancona N. DeLA-Drug: A Deep Learning Algorithm for Automated Design of Druglike Analogues. J Chem Inf Model 2022; 62:1411-1424. [PMID: 35294184 DOI: 10.1021/acs.jcim.2c00205] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/12/2022]
Abstract
In this paper, we present a deep learning algorithm for automated design of druglike analogues (DeLA-Drug), a recurrent neural network (RNN) model composed of two long short-term memory (LSTM) layers and conceived for data-driven generation of similar-to-bioactive compounds. DeLA-Drug captures the syntax of SMILES strings of more than 1 million compounds belonging to the ChEMBL28 database and, by employing a new strategy called sampling with substitutions (SWS), generates molecules starting from a single user-defined query compound. Remarkably, the algorithm preserves druglikeness and synthetic accessibility of the known bioactive compounds present in the ChEMBL28 repository. The absence of any time-demanding fine-tuning procedure enables DeLA-Drug to perform a fast generation of focused libraries for further high-throughput screening and makes it a suitable tool for performing de novo design even in low-data regimes. To provide a concrete idea of its applicability, DeLA-Drug was applied to the cannabinoid receptor subtype 2 (CB2R), a known target involved in different pathological conditions such as cancer and neurodegeneration. DeLA-Drug, available as a free web platform (http://www.ba.ic.cnr.it/softwareic/deladrugportal/), can help medicinal chemists interested in generating analogues of compounds already available in their laboratories and, for this reason, good candidates for an easy and low-cost synthesis.
Affiliation(s)
- Teresa Maria Creanza
- CNR-Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
| | - Giuseppe Lamanna
- Chemistry Department, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy
- CNR-Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Pietro Delre
- Chemistry Department, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy
- CNR-Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Marialessandra Contino
- Department of Pharmacy-Pharmaceutical Sciences, University of Bari "Aldo Moro", via E. Orabona, 4, I-70125 Bari, Italy
| | - Nicola Corriero
- CNR-Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | - Michele Saviano
- CNR-Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
| | | | - Nicola Ancona
- CNR-Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
| |
|
231
|
Moret M, Grisoni F, Katzberger P, Schneider G. Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models. J Chem Inf Model 2022; 62:1199-1206. [PMID: 35191696 PMCID: PMC8924923 DOI: 10.1021/acs.jcim.2c00079] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Indexed: 02/07/2023]
Abstract
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare "greedy" (beam search) with "explorative" (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
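To make the perplexity score in the abstract above concrete, here is a minimal sketch that ranks candidate SMILES strings by the perplexity of their per-token probabilities. It uses the common definition exp(-(1/N) Σ ln p_i), which may differ in detail (for example, in the logarithm base) from the formulation in the paper. The token probabilities are invented for illustration and do not come from a trained model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence from the model probability of each generated token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical per-token probabilities for two generated SMILES strings.
candidates = {
    "CCO": [0.9, 0.8, 0.9],
    "C1CC1N": [0.5, 0.4, 0.6, 0.5, 0.3, 0.4],
}

# Lower perplexity means the model was, on average, more confident in each token.
ranked = sorted(candidates, key=lambda smiles: perplexity(candidates[smiles]))
```

Because the score is normalised by sequence length, strings of different lengths can be compared on the same scale.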
Affiliation(s)
- Michael Moret
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Groene Loper 7, Eindhoven 5612AZ, Netherlands
- Center for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, Utrecht 3584 CB, The Netherlands
| | - Paul Katzberger
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, RETHINK, Vladimir-Prelog-Weg 4, Zurich 8093, Switzerland
- ETH Singapore SEC Ltd., 1 CREATE Way, #06-01 CREATE Tower, Singapore 138602, Singapore
| |
|
232
|
Martinelli DD. Generative machine learning for de novo drug discovery: A systematic review. Comput Biol Med 2022; 145:105403. [PMID: 35339849 DOI: 10.1016/j.compbiomed.2022.105403] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 02/08/2023]
Abstract
Recent research on artificial intelligence indicates that machine learning algorithms can auto-generate novel drug-like molecules. Generative models have revolutionized de novo drug discovery, rendering the explorative process more efficient. Several model frameworks and input formats have been proposed to enhance the performance of intelligent algorithms in generative molecular design. In this systematic literature review of experimental articles and reviews over the last five years, machine learning models, challenges associated with computational molecule design along with proposed solutions, and molecular encoding methods are discussed. A query-based search of the PubMed, ScienceDirect, Springer, Wiley Online Library, arXiv, MDPI, bioRxiv, and IEEE Xplore databases yielded 87 studies. Twelve additional studies were identified via citation searching. Of the articles in which machine learning was implemented, six prominent algorithms were identified: long short-term memory recurrent neural networks (LSTM-RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), adversarial autoencoders (AAEs), evolutionary algorithms, and gated recurrent unit recurrent neural networks (GRU-RNNs). Furthermore, eight central challenges were designated: homogeneity of generated molecular libraries, deficient synthesizability, limited assay data, model interpretability, incapacity for multi-property optimization, incomparability, restricted molecule size, and uncertainty in model evaluation. Molecules were encoded either as strings, which were occasionally augmented using randomization, as 2D graphs, or as 3D graphs. Statistical analysis and visualization are used to illustrate how approaches to machine learning in de novo drug design have evolved over the past five years. Finally, future opportunities and reservations are discussed.
|
233
|
Bilodeau C, Jin W, Jaakkola T, Barzilay R, Jensen KF. Generative models for molecular discovery: Recent advances and challenges. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Affiliation(s)
- Camille Bilodeau
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge Massachusetts USA
| | - Wengong Jin
- Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge Massachusetts USA
| | - Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge Massachusetts USA
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge Massachusetts USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge Massachusetts USA
| |
|
234
|
Ragoza M, Masuda T, Koes DR. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem Sci 2022; 13:2701-2713. [PMID: 35356675 PMCID: PMC8890264 DOI: 10.1039/d1sc05976a] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Accepted: 02/06/2022] [Indexed: 11/22/2022] Open
Abstract
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein-ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein-ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.
Affiliation(s)
- Matthew Ragoza
- Intelligent Systems Program, University of Pittsburgh Pittsburgh PA 15213 USA
| | - Tomohide Masuda
- Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
| | - David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh Pittsburgh PA 15213 USA
| |
|
235
|
Ager Meldgaard S, Köhler J, Lund Mortensen H, Christiansen MPV, Noé F, Hammer B. Generating stable molecules using imitation and reinforcement learning. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac3eb4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
Abstract
Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning (RL) approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of the stability. To improve sample efficiency, we learn basic chemical rules from imitation learning (IL) on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in an RL setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how RL further refines the IL model in domains far from the training data.
|
236
|
Shin B, Park S, Bak J, Ho JC. Controlled Molecule Generator for Optimizing Multiple Chemical Properties. ACM CHIL 2021 : PROCEEDINGS OF THE 2021 ACM CONFERENCE ON HEALTH, INFERENCE, AND LEARNING : APRIL 8-9, 2021, VIRTUAL EVENT. ACM CONFERENCE ON HEALTH, INFERENCE, AND LEARNING (2021 : ONLINE) 2022; 2021:146-153. [PMID: 35194593 DOI: 10.1145/3450439.3451879] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Generating a novel and optimized molecule with desired chemical properties is an essential part of the drug discovery process. Failure to meet one of the required properties can frequently lead to failure in a clinical test which is costly. In addition, optimizing these multiple properties is a challenging task because the optimization of one property is prone to changing other properties. In this paper, we pose this multi-property optimization problem as a sequence translation process and propose a new optimized molecule generator model based on the Transformer with two constraint networks: property prediction and similarity prediction. We further improve the model by incorporating score predictions from these constraint networks in a modified beam search algorithm. The experiments demonstrate that our proposed model, Controlled Molecule Generator (CMG), outperforms state-of-the-art models by a significant margin for optimizing multiple properties simultaneously.
|
237
|
Kaitoh K, Yamanishi Y. Scaffold-Retained Structure Generator to Exhaustively Create Molecules in an Arbitrary Chemical Space. J Chem Inf Model 2022; 62:2212-2225. [DOI: 10.1021/acs.jcim.1c01130] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Affiliation(s)
- Kazuma Kaitoh
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| |
|
238
|
Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022; 13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022] Open
Abstract
Natural products (NPs) are primarily recognized as privileged structures to interact with protein drug targets. Their unique characteristics and structural diversity continue to marvel scientists for developing NP-inspired medicines, even though the pharmaceutical industry has largely given up. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.
Affiliation(s)
- F I Saldívar-González
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
| | - V D Aldas-Bulos
- Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico
| | - J L Medina-Franco
- DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
| | - F Plisson
- CONACYT - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico
| |
|
239
|
Lin E, Lin CH, Lane HY. De Novo Peptide and Protein Design Using Generative Adversarial Networks: An Update. J Chem Inf Model 2022; 62:761-774. [DOI: 10.1021/acs.jcim.1c01361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
- Eugene Lin
- Department of Biostatistics, University of Washington, Seattle, Washington 98195, United States
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, United States
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
| | - Chieh-Hsin Lin
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung 83301, Taiwan
- School of Medicine, Chang Gung University, Taoyuan 33302, Taiwan
| | - Hsien-Yuan Lane
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, China Medical University Hospital, Taichung 40447, Taiwan
- Brain Disease Research Center, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung 41354, Taiwan
| |
|
240
|
Fan Y, Xia Y, Zhu J, Wu L, Xie S, Qin T. Back translation for molecule generation. Bioinformatics 2022; 38:1244-1251. [PMID: 34875015 DOI: 10.1093/bioinformatics/btab817] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Revised: 11/11/2021] [Accepted: 12/01/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION: Molecule generation, which is to generate new molecules, is an important problem in bioinformatics. Typical tasks include generating molecules with given properties, molecular property improvement (i.e. improving specific properties of an input molecule), retrosynthesis (i.e. predicting the molecules that can be used to synthesize a target molecule), etc. Recently, deep-learning-based methods have received more attention for molecule generation. Labeled data in bioinformatics are usually costly to obtain, but there are millions of unlabeled molecules. Inspired by the success of sequence generation in natural language processing with unlabeled data, we would like to explore an effective way of using unlabeled molecules for molecule generation. RESULTS: We propose a new method, back translation for molecule generation, which is a simple yet effective semisupervised method. Let X be the source domain, which is the collection of properties, the molecules to be optimized, etc. Let Y be the target domain, which is the collection of molecules. In particular, given a main task whose goal is to learn a mapping from the source domain X to the target domain Y, we first train a reversed model g for the Y to X mapping. After that, we use g to back translate the unlabeled data in Y to X and obtain more synthetic data. Finally, we combine the synthetic data with the labeled data and train a model for the main task. We conduct experiments on molecular property improvement and retrosynthesis, and we achieve state-of-the-art results on four molecule generation tasks and one retrosynthesis benchmark, USPTO-50k. AVAILABILITY AND IMPLEMENTATION: Our code and data are available at https://github.com/fyabc/BT4MolGen. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
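The three steps described in the abstract above (train a reversed model g, back translate unlabeled targets, retrain on the combined data) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: `train_nearest` is a hypothetical stand-in for real model training that simply memorises pairs and answers by string similarity, and the property labels and SMILES strings are toy values.

```python
from difflib import SequenceMatcher


def train_nearest(pairs):
    """Toy stand-in for model training: memorise (source, target) pairs and
    answer a query with the target of the most similar memorised source."""
    memory = list(pairs)

    def model(query):
        best = max(memory, key=lambda p: SequenceMatcher(None, query, p[0]).ratio())
        return best[1]

    return model


def back_translation(labeled, unlabeled_targets, train=train_nearest):
    """Semisupervised training loop following the three steps in the abstract."""
    # Step 1: train the reversed model g on (y, x) pairs for the Y -> X mapping.
    g = train([(y, x) for x, y in labeled])
    # Step 2: back translate unlabeled targets to obtain synthetic (x, y) pairs.
    synthetic = [(g(y), y) for y in unlabeled_targets]
    # Step 3: train the main X -> Y model on labeled plus synthetic data.
    main_model = train(list(labeled) + synthetic)
    return main_model, synthetic


# Toy data: property descriptions (X) paired with SMILES strings (Y).
labeled = [("low_logp", "CCO"), ("high_logp", "CCCCCCCC")]
unlabeled = ["CCCO", "CCCCCCC"]
main_model, synthetic = back_translation(labeled, unlabeled)
```

In this toy run the reversed model assigns each unlabeled molecule the label of its most similar labeled neighbour, which doubles the training set for the main task.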
Affiliation(s)
- Yang Fan
- University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Yingce Xia
- Microsoft Research, Beijing 100080, China
| | - Jinhua Zhu
- University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Lijun Wu
- Microsoft Research, Beijing 100080, China
| | | | - Tao Qin
- Microsoft Research, Beijing 100080, China
| |
|
241
|
Xu Z, Su C, Xiao Y, Wang F. Artificial intelligence for COVID-19: battling the pandemic with computational intelligence. INTELLIGENT MEDICINE 2022; 2:13-29. [PMID: 34697578 PMCID: PMC8529224 DOI: 10.1016/j.imed.2021.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/20/2021] [Revised: 09/15/2021] [Accepted: 09/29/2021] [Indexed: 12/15/2022]
Abstract
The new coronavirus disease 2019 (COVID-19) has become a global pandemic, leading to over 180 million confirmed cases and nearly 4 million deaths as of June 2021, according to the World Health Organization. Since the initial report in December 2019, COVID-19 has demonstrated a high transmission rate (with an R0 > 2), a diverse set of clinical characteristics (e.g., high rates of hospital and intensive care unit admission, multi-organ dysfunction for critically ill patients due to hyperinflammation, thrombosis, etc.), and a tremendous burden on health care systems around the world. To understand this serious and complex disease and develop effective control, treatment, and prevention strategies, researchers from different disciplines have been making significant efforts from different aspects, including epidemiology and public health, biology and genomic medicine, as well as clinical care and patient management. In recent years, artificial intelligence (AI) has been introduced into the healthcare field to aid clinical decision-making for disease diagnosis and treatment, such as detecting cancer based on medical images, and has achieved superior performance in multiple data-rich application scenarios. In the COVID-19 pandemic, AI techniques have also been used as a powerful tool to tackle this complex disease. In this context, the goal of this study is to review existing studies on applications of AI techniques in combating the COVID-19 pandemic. Specifically, these efforts are grouped into the fields of epidemiology, therapeutics, clinical research, and social and behavioral studies, and are summarized accordingly. Potential challenges, directions, and open questions are discussed, which may provide new insights into addressing the COVID-19 pandemic and would be helpful for researchers to explore more related topics in the post-pandemic era.
Affiliation(s)
- Zhenxing Xu
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York 10065, United States
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia 19122, United States
| | - Yunyu Xiao
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York 10065, United States
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York 10065, United States
| |
|
242
|
Oguike OE, Ugwuishiwu CH, Asogwa CN, Nnadi CO, Obonga WO, Attama AA. Systematic review on the application of machine learning to quantitative structure-activity relationship modeling against Plasmodium falciparum. Mol Divers 2022; 26:3447-3462. [PMID: 35064444 PMCID: PMC8782692 DOI: 10.1007/s11030-022-10380-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Accepted: 01/07/2022] [Indexed: 11/29/2022]
Abstract
Malaria accounts for over two million deaths globally. To flatten this curve, there is a need to develop new and highly potent drugs against Plasmodium falciparum. Some major challenges include the dearth of suitable animal models for anti-P. falciparum assays, resistance to first-line drugs, lack of vaccines and the complex life cycle of Plasmodium. Fortunately, newer approaches to antimalarial drug discovery have emerged due to the release of large datasets by pharmaceutical companies. This review provides insights into these new approaches to drug discovery, covering different machine learning tools that enhance the development of new compounds. It provides a systematic review of the use and prospects of machine learning in predicting, classifying and clustering IC50 values of bioactive compounds against P. falciparum. The authors identified many machine learning tools yet to be applied for this purpose. However, Random Forest and Support Vector Machines have been applied extensively, though only to a limited dataset of compounds.
Affiliation(s)
- Osondu Everestus Oguike
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| | - Chikodili Helen Ugwuishiwu
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| | - Caroline Ngozi Asogwa
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| | - Charles Okeke Nnadi
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| | - Wilfred Ofem Obonga
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| | - Anthony Amaechi Attama
- Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Department of Pharmaceutics, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
| |
|
243
|
Kwak HS, An Y, Giesen DJ, Hughes TF, Brown CT, Leswing K, Abroshan H, Halls MD. Design of Organic Electronic Materials With a Goal-Directed Generative Model Powered by Deep Neural Networks and High-Throughput Molecular Simulations. Front Chem 2022; 9:800370. [PMID: 35111730 PMCID: PMC8802168 DOI: 10.3389/fchem.2021.800370] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2021] [Accepted: 12/15/2021] [Indexed: 11/16/2022] Open
Abstract
In recent years, generative machine learning approaches have attracted significant attention as an enabling approach for designing novel molecular materials with minimal design bias and thereby realizing more directed design for a specific materials property space. Further, data-driven approaches have emerged as a new tool to accelerate the development of novel organic electronic materials for organic light-emitting diode (OLED) applications. We demonstrate and validate a goal-directed generative machine learning framework based on a recurrent neural network (RNN) deep reinforcement learning approach for the design of hole transporting OLED materials. These large-scale molecular simulations also demonstrate a rapid, cost-effective method to identify new materials in OLEDs while also enabling expansion into many other verticals such as catalyst design, aerospace, life science, and petrochemicals.
Affiliation(s)
- H. Shaun Kwak
- Schrödinger, Inc., Portland, OR, United States
- *Correspondence: H. Shaun Kwak; Yuling An
| | - Yuling An
- Schrödinger, Inc., New York, NY, United States
- *Correspondence: H. Shaun Kwak; Yuling An
| |
|
244
|
Tavakoli M, Mood A, Van Vranken D, Baldi P. Quantum Mechanics and Machine Learning Synergies: Graph Attention Neural Networks to Predict Chemical Reactivity. J Chem Inf Model 2022; 62:2121-2132. [PMID: 35020394 DOI: 10.1021/acs.jcim.1c01400] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
There is a lack of scalable quantitative measures of reactivity that cover the full range of functional groups in organic chemistry, ranging from highly unreactive C-C bonds to highly reactive naked ions. Measuring reactivity experimentally is costly and time-consuming, and no single method has sufficient dynamic range to cover the astronomical size of chemical reactivity space. In previous quantum chemistry studies, we have introduced Methyl Cation Affinities (MCA*) and Methyl Anion Affinities (MAA*), using a solvation model, as quantitative measures of reactivity for organic functional groups over the broadest range. Although MCA* and MAA* offer good estimates of reactivity parameters, their calculation through Density Functional Theory (DFT) simulations is time-consuming. To circumvent this problem, we first use DFT to calculate MCA* and MAA* for more than 2,400 organic molecules, thereby establishing a large data set of chemical reactivity scores. We then design deep learning methods to predict the reactivity of molecular structures and train them using this curated data set in combination with different representations of molecular structures. Using 10-fold cross-validation, we show that graph attention neural networks applied to a relational model of molecular structures produce the most accurate estimates of reactivity, achieving over 91% test accuracy for predicting the MCA* ± 3.0 or MAA* ± 3.0, over 50 orders of magnitude. Finally, we demonstrate the application of these reactivity scores to two tasks: (1) chemical reaction prediction and (2) combinatorial generation of reaction mechanisms. The curated data sets of MCA* and MAA* scores are available through the ChemDB chemoinformatics web portal at cdb.ics.uci.edu under Chemical Reactivities data sets.
Affiliation(s)
- Mohammadamin Tavakoli
- Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
| | - Aaron Mood
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
| | - David Van Vranken
- Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States
| | - Pierre Baldi
- Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
| |
|
245
|
Abstract
Machine learning (ML) has revolutionised the field of structure-based drug design (SBDD) in recent years. During the training stage, ML techniques typically analyse large amounts of experimentally determined data to create predictive models that inform the drug discovery process. Deep learning (DL) is a subfield of ML that relies on multiple layers of a neural network to extract significantly more complex patterns from experimental data, and it has recently become a popular choice in SBDD. This review provides a thorough summary of the recent DL trends in SBDD with a particular focus on de novo drug design, binding site prediction, and binding affinity prediction of small molecules.
|
246
|
Żurański AM, Wang JY, Shields BJ, Doyle AG. Auto-QChem: an automated workflow for the generation and storage of DFT calculations for organic molecules. REACT CHEM ENG 2022. [DOI: 10.1039/d2re00030j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
This perspective describes Auto-QChem, an automatic, high-throughput and end-to-end DFT calculation workflow that computes chemical descriptors for organic molecules.
Affiliation(s)
- Jason Y. Wang
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Benjamin J. Shields
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Bristol Myers Squibb, Cambridge, MA 02142, USA
- Abigail G. Doyle
- Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
247
|
Santana MV, Silva-Jr FP. Artificial intelligence methods to repurpose and discover new drugs to fight the Coronavirus disease-2019 pandemic. COMPUTATIONAL APPROACHES FOR NOVEL THERAPEUTIC AND DIAGNOSTIC DESIGNING TO MITIGATE SARS-COV-2 INFECTION 2022. [PMCID: PMC9300478 DOI: 10.1016/b978-0-323-91172-6.00016-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
The Coronavirus disease 2019 pandemic struck the world at the end of 2019 and, as of 2021, there are no specific drugs available against the causative agent, the severe acute respiratory syndrome-Coronavirus-2 (SARS-CoV-2). From the onset of the pandemic, researchers have been trying to find drugs among the current therapeutic arsenal that could target crucial viral functions, and many of these efforts resulted in clinical trials to repurpose a drug for this new indication. In this scenario, artificial intelligence (AI) is of fundamental importance, allowing academia and pharmaceutical companies to accelerate the discovery of biochemical insights from the chemical and biological information available in literature databases. This chapter covers some AI methods that are being explored to repurpose drugs against SARS-CoV-2. It outlines how these methods work, followed by a discussion of selected examples that apply them to identify promising drugs.
|
248
|
Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
249
|
Bilodeau C, Jin W, Xu H, Emerson JA, Mukhopadhyay S, Kalantar TH, Jaakkola T, Barzilay R, Jensen KF. Generating molecules with optimized aqueous solubility using iterative graph translation. REACT CHEM ENG 2022. [DOI: 10.1039/d1re00315a] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
We present a generative modeling framework that can be used to discover new, optimal molecules. Our method involves iteratively 1) training a translation model, and 2) translating all molecules in the training dataset.
Affiliation(s)
- Camille Bilodeau
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Wengong Jin
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Hongyun Xu
- Dow Chemical Company, Midland, MI 48674, USA
- Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
|
250
|
Abstract
Artificial intelligence (AI) tools find increasing application in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is their application in molecular generation with the aid of deep neural networks (DNNs). We present a historical overview of the main advances in the field. We analyze the concepts of distribution and goal-directed learning and then highlight some recent applications of generative models in drug design, with a focus on research work from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software package developed within our group at AstraZeneca and the main platform for AI molecular design support for a number of medicinal chemistry projects in the company, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in the application of AI in drug discovery and different approaches to respond to these challenges, which define areas for current and future work.
|