1
Talevi A. Computer-Aided Drug Discovery and Design: Recent Advances and Future Prospects. Methods Mol Biol 2024; 2714:1-20. [PMID: 37676590 DOI: 10.1007/978-1-0716-3441-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Computer-aided drug discovery and design involve the use of information technologies to identify and develop, on a rational ground, chemical compounds that align with a set of desired physicochemical and biological properties. In its most common form, it involves the identification and/or modification of an active scaffold (or the combination of known active scaffolds), although de novo drug design from scratch is also possible. Traditionally, the drug discovery and design processes have focused on the molecular determinants of the interactions between drug candidates and their known or intended pharmacological target(s). Nevertheless, in modern times, drug discovery and design are conceived as a particularly complex multiparameter optimization task, due to the complicated, often conflicting, property requirements. This chapter provides an updated overview of in silico approaches for identifying active scaffolds and guiding the subsequent optimization process. Recent groundbreaking advances in the field are also analyzed, including the integration of state-of-the-art machine learning approaches in every step of the drug discovery process (from prediction of target structure to customized molecular docking scoring functions), integration of multilevel omics data, and the use of a diversity of computational approaches to assist target validation and assess plausible binding pockets.
Affiliation(s)
- Alan Talevi
- Laboratory of Bioactive Compound Research and Development (LIDeB), Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata, Argentina.
- Argentinean National Council of Scientific and Technical Research (CONICET), La Plata, Argentina.

2
Zou J, Yu J, Hu P, Zhao L, Shi S. STAGAN: An approach for improve the stability of molecular graph generation based on generative adversarial networks. Comput Biol Med 2023; 167:107691. [PMID: 37976819 DOI: 10.1016/j.compbiomed.2023.107691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 09/18/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
With the wide application of deep learning in drug discovery, deep generative models have shown their advantages in drug molecule generation. Generative adversarial networks can be used to learn the internal structure of molecules, but the training process may be unstable, suffering from problems such as vanishing gradients and mode collapse, which may lead to the generation of molecules that do not conform to chemical rules or that all share a single style. In this paper, a novel method called STAGAN is proposed to ease model training by adding a new gradient penalty term to the discriminator and designing a parallel batch normalization layer for the generator. As an illustration of the method, STAGAN generated a higher proportion of valid and unique molecules than previous models on training datasets from QM9 and ZINC-250K. This indicates that the proposed method can effectively address the instability problem in the model training process and can provide more instructive guidance for the further study of molecular graph generation.
Affiliation(s)
- Jinping Zou
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Jialin Yu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Pengwei Hu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China
- Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.

3
Cremer J, Medrano Sandonas L, Tkatchenko A, Clevert DA, De Fabritiis G. Equivariant Graph Neural Networks for Toxicity Prediction. Chem Res Toxicol 2023; 36. [PMID: 37690056 PMCID: PMC10583285 DOI: 10.1021/acs.chemrestox.3c00032] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Indexed: 09/12/2023]
Abstract
Predictive modeling of toxicity is a crucial step in the drug discovery pipeline. It can help filter out molecules with a high probability of failing in the early stages of de novo drug design. Thus, several machine learning (ML) models have been developed to predict the toxicity of molecules by combining classical ML techniques or deep neural networks with well-known molecular representations such as fingerprints or 2D graphs. But the more natural, accurate representation of molecules is expected to be defined in physical 3D space, as in ab initio methods. Recent studies successfully used equivariant graph neural networks (EGNNs) for representation learning based on 3D structures to predict quantum-mechanical properties of molecules. Inspired by this, we investigated the performance of EGNNs to construct reliable ML models for toxicity prediction. We used the equivariant transformer (ET) model in TorchMD-NET for this. Eleven toxicity data sets taken from MoleculeNet, TDCommons, and ToxBenchmark have been considered to evaluate the capability of ET for toxicity prediction. Our results show that ET adequately learns 3D representations of molecules that can successfully correlate with toxicity activity, achieving good accuracies on most data sets, comparable to state-of-the-art models. We also test a physicochemical property, namely, the total energy of a molecule, to inform the toxicity prediction with a physical prior. However, our work suggests that these two properties cannot be related. We also provide an attention weight analysis to help understand the toxicity prediction in 3D space and thus increase the explainability of the ML model. In summary, our findings offer promising insights considering 3D geometry information via EGNNs and provide a straightforward way to integrate molecular conformers into ML-based pipelines for predicting and investigating toxicity in physical space. We expect that in the future, especially for larger, more diverse data sets, EGNNs will be an essential tool in this domain.
Affiliation(s)
- Julian Cremer
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, 10785 Berlin, Germany
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Djork-Arné Clevert
- Machine Learning Research, Pfizer Worldwide Research Development and Medical, Linkstr. 10, 10785 Berlin, Germany
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- ICREA, Passeig Lluis Companys 23, 08010 Barcelona, Spain

4
Yamada M, Sugiyama M. Molecular Graph Generation by Decomposition and Reassembling. ACS Omega 2023; 8:19575-19586. [PMID: 37305268 PMCID: PMC10249382 DOI: 10.1021/acsomega.3c01078] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/17/2023] [Accepted: 04/10/2023] [Indexed: 06/13/2023]
Abstract
Designing molecular structures with desired chemical properties is an essential task in drug discovery and materials design. However, finding molecules with optimized properties is still a challenging task due to the combinatorial explosion of the candidate space of molecules. Here we propose a novel decomposition-and-reassembling-based approach, which does not include any optimization in hidden space and whose generation process is highly interpretable. Our method is a two-step procedure: In the first decomposition step, we apply frequent subgraph mining to a molecular database to collect a smaller set of subgraphs as building blocks of molecules. In the second reassembling step, we search for desirable building blocks, guided by reinforcement learning, and combine them to generate new molecules. Our experiments show that our method not only can find better molecules in terms of two standard criteria, the penalized log P and druglikeness, but also can generate drug molecules while showing valid intermediate molecules.
Affiliation(s)
- Masatsugu Yamada
- School of Multidisciplinary Sciences, Department of Informatics, The Graduate University for Advanced Studies, SOKENDAI, Kanagawa 240-0115, Japan
- National Institute of Informatics, Chiyoda-ku, Tokyo 101-8430, Japan
- Innovative Technology Laboratories, AGC Inc., 230-0045 Kanagawa, Japan
- Mahito Sugiyama
- School of Multidisciplinary Sciences, Department of Informatics, The Graduate University for Advanced Studies, SOKENDAI, Kanagawa 240-0115, Japan
- National Institute of Informatics, Chiyoda-ku, Tokyo 101-8430, Japan

5
Zhu J, Azam NA, Zhang F, Shurbevski A, Haraguchi K, Zhao L, Nagamochi H, Akutsu T. A Novel Method for Inferring Chemical Compounds With Prescribed Topological Substructures Based on Integer Programming. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3233-3245. [PMID: 34520360 DOI: 10.1109/tcbb.2021.3112598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Drug discovery is one of the major goals of computational biology and bioinformatics. A novel framework has recently been proposed for the design of chemical graphs using both artificial neural networks (ANNs) and mixed integer linear programming (MILP). This method consists of a prediction phase and an inverse prediction phase. In the first phase, an ANN is trained using data on existing chemical compounds. In the second phase, given a target chemical property, a feature vector is inferred by solving an MILP formulated from the trained ANN and then a set of chemical structures is enumerated by a graph enumeration algorithm. Although exact solutions are guaranteed by this framework, the types of chemical graphs have been restricted to such classes as trees, monocyclic graphs, and graphs with a specified polymer topology with cycle index up to 2. To overcome the limitation on the topological structure, we propose a new flexible modeling method for the framework so that we can specify a topological substructure of graphs and a partial assignment of chemical elements and bond-multiplicity to a target graph. The results of computational experiments suggest that the proposed system can infer chemical graphs with up to around 50 non-hydrogen atoms.

6
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets at low computational cost. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
Affiliation(s)
- Jingbo Yang
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
- Xiujie Chen
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
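
As a rough illustration of the "dictionary based" fingerprint type named in the abstract of this entry, the following minimal Python sketch turns a SMILES string into a bit string, one bit per predefined key. The key list `KEYS` and the function `dictionary_fingerprint` are hypothetical names chosen for this example, and keys are matched by plain substring search, which is only a stand-in for real substructure matching. It is not the method of the reviewed paper.

```python
# Toy dictionary-based fingerprint: one bit per predefined key.
# ASSUMPTION: keys are matched by naive substring search on the SMILES
# string; real fingerprints rely on proper substructure matching.
KEYS = ["C(=O)O", "N", "c1ccccc1", "Cl", "S"]  # hypothetical key dictionary


def dictionary_fingerprint(smiles, keys=KEYS):
    """Return a bit string whose i-th bit is 1 when keys[i] occurs in smiles."""
    return "".join("1" if key in smiles else "0" for key in keys)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
aspirin_bits = dictionary_fingerprint(aspirin)
print(aspirin_bits)
```

The fixed-length bit string is the kind of consistent input format the abstract refers to: every molecule maps to a vector of the same length, whatever its size.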

7
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, as have related areas of forward prediction that aim to discover chemical matter worth synthesizing and investigating further experimentally. OBJECTIVES The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce the cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION The consistently improved AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in the performance are warranted to obtain better outcomes in the form of potential drug candidates, which can perform well in in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
- Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia; AFNP Med Austria, 1010 Wien, Austria
- Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia

8
D'Souza S, Kv P, Balaji S. Training recurrent neural networks as generative neural networks for molecular structures: how does it impact drug discovery? Expert Opin Drug Discov 2022; 17:1071-1079. [PMID: 36216812 DOI: 10.1080/17460441.2022.2134340] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Abstract
INTRODUCTION Deep learning approaches have become popular in recent years in de novo drug design. Generative models for molecule generation and optimization have shown promising results. Models trained on different chemical data could regenerate molecules that were similar to the query molecule, thus supporting lead optimization. Recurrent neural network-based generative models have demonstrated application in low-data drug discovery, fragment-based drug design and lead optimization. AREAS COVERED In this review, we provide an overview of recurrent neural network models and their variants for molecule generation with recent examples. The input representations of molecules as SMILES and molecular graphs are discussed. The evaluation benchmarks and metrics used in generative neural network models are also highlighted. For this, ScienceDirect, Web of Science, and Google Scholar databases were searched with the article's keywords and their combinations to retrieve the most relevant and up-to-date information. EXPERT OPINION The simplicity of SMILES notation makes it suitable for training a sequence-based model such as a recurrent neural network. However, models that could be trained on molecular graphs to generate synthesizable molecular structures could open new possibilities for valid molecule generation and synthetic feasibility.
Affiliation(s)
- Sofia D'Souza
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
- Prema Kv
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
- Seetharaman Balaji
- Department of Computer Science and Engineering, Manipal Institute of Technology, MAHE, Manipal, India
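
The abstract of this entry notes that SMILES strings suit sequence models such as recurrent neural networks. A minimal Python sketch of the preprocessing step this implies, splitting a SMILES string into tokens and mapping them to integer indices, is given below. The regular expression is a simplified assumption made for illustration (bracket atoms, the two-letter halogens Cl and Br, two-digit ring closures, then single characters) and does not cover every SMILES feature; the names `tokenize`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption, not a full
# SMILES grammar): bracket atoms, Br/Cl, %NN ring closures, then any single
# remaining character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(corpus):
    """Map every token seen in the corpus to an integer index."""
    tokens = sorted({tok for smiles in corpus for tok in tokenize(smiles)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Turn a SMILES string into the index sequence a sequence model consumes."""
    return [vocab[tok] for tok in tokenize(smiles)]


corpus = ["CCO", "CCl"]
vocab = build_vocab(corpus)
print(tokenize("CC(=O)Cl"), encode("CCl", vocab))
```

Treating `Cl` and `Br` as single tokens keeps the model from having to learn that a lone `l` or `r` is never valid on its own.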

9
Wang Y, Michael S, Yang SM, Huang R, Cruz-Gutierrez K, Zhang Y, Zhao J, Xia M, Shinn P, Sun H. Retro Drug Design: From Target Properties to Molecular Structures. J Chem Inf Model 2022; 62:2659-2669. [PMID: 35653613 PMCID: PMC9198977 DOI: 10.1021/acs.jcim.2c00123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
Abstract
To deliver more therapeutics to more patients more quickly and economically is the ultimate goal of pharmaceutical researchers. The advent and rapid development of artificial intelligence (AI), in combination with other powerful computational methods in drug discovery, makes this goal more practical than ever before. Here, we describe a new strategy, retro drug design, or RDD, to create novel small-molecule drugs from scratch to meet multiple predefined requirements, including biological activity against a drug target and optimal range of physicochemical and ADMET properties. The molecular structure was represented by an atom typing based molecular descriptor system, optATP, which was further transformed to the space of loading vectors from principal component analysis. Traditional predictive models were trained over experimental data for the target properties using optATP and shallow machine learning methods. The Monte Carlo sampling algorithm was then utilized to find the solutions in the space of loading vectors that have the target properties. Finally, a deep learning model was employed to decode molecular structures from the solutions. To test the feasibility of the algorithm, we challenged RDD to generate novel kinase inhibitors from random numbers with five different ADMET properties optimized at the same time. The best Tanimoto similarity score between the generated valid structures and the available 4,314 kinase inhibitors was < 0.50, indicating a high extent of novelty of the generated compounds. From the 3,040 structures that met all six target properties, 20 were selected for synthesis and experimental measurement of inhibition activity over 97 representative kinases and the ADMET properties. Fifteen and eight compounds were determined to be hits or strong hits, respectively. Five of the six strong kinase inhibitors have excellent experimental ADMET properties. The results presented in this paper illustrate that RDD has the potential to significantly improve the current drug discovery process.
Affiliation(s)
- Yuhong Wang
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Sam Michael
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Shyh-Ming Yang
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ruili Huang
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Kennie Cruz-Gutierrez
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Yaqing Zhang
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Jinghua Zhao
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Menghang Xia
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Paul Shinn
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Hongmao Sun
- National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
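
The novelty check reported in the abstract of this entry (best Tanimoto similarity to known kinase inhibitors below 0.50) can be sketched in a few lines of Python. The sketch assumes each molecule is already reduced to a set of "on" fingerprint bit indices; the example sets, the names `tanimoto` and `max_similarity`, and the use of a plain set representation are illustrative assumptions, and the paper's own descriptor system (optATP) is not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def max_similarity(candidate, references):
    """Highest Tanimoto similarity between a candidate and any reference."""
    return max(tanimoto(candidate, ref) for ref in references)


NOVELTY_CUTOFF = 0.50  # threshold quoted in the abstract

# Made-up fingerprints standing in for known inhibitors and a generated molecule.
references = [{1, 2, 3, 4}, {2, 5, 8}]
candidate = {1, 2, 9, 10}

best = max_similarity(candidate, references)
is_novel = best < NOVELTY_CUTOFF
print(round(best, 3), is_novel)
```

Comparing against the maximum over all references, instead of the mean, matches the abstract's wording ("the best Tanimoto similarity score"): a candidate counts as novel only if it is dissimilar to every known inhibitor.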

10
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany

11
A new approach to the design of acyclic chemical compounds using skeleton trees and integer linear programming. Appl Intell 2022. [DOI: 10.1007/s10489-021-03088-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Intelligent systems are applied in a wide range of areas, and computer-aided drug design is a highly important one. One major approach to drug design is the inverse QSAR/QSPR (quantitative structure-activity and structure-property relationship), for which a method that uses both artificial neural networks (ANN) and mixed integer linear programming (MILP) has been proposed recently. This method consists of two phases: a forward prediction phase, and an inverse, inference phase. In the prediction phase, a feature function f over chemical compounds is defined, whereby a chemical compound G is represented as a vector f(G) of descriptors. Next, for a given chemical property $$\pi$$, using a dataset of chemical compounds with known values for property $$\pi$$, a regressive prediction function $$\psi$$ is computed by an ANN. It is desired that $$\psi (f(G))$$ takes a value that is close to the true value of property $$\pi$$ for the compound G for many of the compounds in the dataset. In the inference phase, one starts with a target value $$y^*$$ of the chemical property $$\pi$$, and then a chemical structure $$G^*$$ such that $$\psi (f(G^*))$$ is within a certain tolerance level of $$y^*$$ is constructed from the solution to a specially formulated MILP. This method has been used for the case of inferring acyclic chemical compounds. With this paper, we propose a new concept on acyclic chemical graphs, called a skeleton tree, and based on it develop a new MILP formulation for inferring acyclic chemical compounds. Our computational experiments indicate that our newly proposed method significantly outperforms the existing method when the diameter of graphs is up to 8. In a particular example where we inferred acyclic chemical compounds with 38 non-hydrogen atoms from the set {C, O, S} times faster.
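
The two phases described in this abstract can be summarized compactly using its own symbols. The tolerance $$\delta > 0$$, the dataset notation $$\{(G_i, y_i)\}$$, and the reading of "within a certain tolerance level" as a symmetric absolute bound are assumptions introduced here for illustration; the abstract does not define them.

```latex
% Prediction phase: the ANN-computed function \psi should approximate the
% property \pi on the known compounds G_i with observed values y_i.
\psi\bigl(f(G_i)\bigr) \approx y_i \quad \text{for many } i

% Inference phase: given a target value y^*, construct a structure G^*
% (from the solution of the MILP) whose predicted value lies within \delta.
\text{find } G^* \quad \text{such that} \quad
\bigl|\, \psi\bigl(f(G^*)\bigr) - y^* \,\bigr| \le \delta
```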

12
Designing a multilayer film via machine learning of scientific literature. Sci Rep 2022; 12:930. [PMID: 35042971 PMCID: PMC8766440 DOI: 10.1038/s41598-022-05010-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2021] [Accepted: 01/04/2022] [Indexed: 12/23/2022] Open
Abstract
Scientists who design chemical substances often use materials informatics (MI), a data-driven approach with either computer simulation or artificial intelligence (AI). MI is a valuable technique, but applying it to layered structures is difficult. Most of the proposed computer-aided material search techniques use atomic or molecular simulations, which are limited to small areas. Some AI approaches have planned layered structures, but they require a physical theory or abundant experimental results. There is no universal design tool for multilayer films in MI. Here, we show that a multilayer film can be designed through machine learning (ML) of experimental procedures extracted from chemical-coating articles. We converted material names according to International Union of Pure and Applied Chemistry rules and stored them in databases for each fabrication step without any physicochemical theory. Compared with experimental results, which depend on the authors, experimental protocols have the advantage of being almost uniform and losing less data. Connecting scientific knowledge through ML enables us to predict untrained film structures. This suggests that AI imitates research activity, which is normally inspired by other scientific achievements, and can thus be used as a general design technique.

13
Wang M, Wang Z, Sun H, Wang J, Shen C, Weng G, Chai X, Li H, Cao D, Hou T. Deep learning approaches for de novo drug design: An overview. Curr Opin Struct Biol 2021; 72:135-144. [PMID: 34823138 DOI: 10.1016/j.sbi.2021.10.001] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 06/05/2021] [Revised: 08/28/2021] [Accepted: 10/10/2021] [Indexed: 01/01/2023]
Abstract
De novo drug design is the process of generating novel lead compounds with desirable pharmacological and physicochemical properties. The application of deep learning (DL) in de novo drug design has become a hot topic, and many DL-based approaches have been developed for molecular generation tasks. Generally, these approaches were developed as per four frameworks: recurrent neural networks; encoder-decoder; reinforcement learning; and generative adversarial networks. In this review, we first introduce the molecular representations and assessment metrics used in DL-based de novo drug design. Then, we summarize the features of each architecture. Finally, the potential challenges and future directions of DL-based molecular generation are discussed.
Affiliation(s)
- Mingyang Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, PR China
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
| | - Gaoqi Weng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
| | - Xin Chai
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China
| | - Honglin Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, PR China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, PR China.
| |
Collapse
|
14
|
Lv Q, Chen G, Zhao L, Zhong W, Yu-Chian Chen C. Mol2Context-vec: learning molecular representation from context awareness for drug discovery. Brief Bioinform 2021; 22:6357185. [PMID: 34428290 DOI: 10.1093/bib/bbab317] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/24/2021] [Revised: 07/15/2021] [Accepted: 07/21/2021] [Indexed: 11/14/2022] Open
Abstract
With the rapid development of proteomics and the rapid increase of target molecules for drug action, computer-aided drug design (CADD) has become a basic task in drug discovery. One of the key challenges in CADD is molecular representation. High-quality molecular expression with chemical intuition helps to promote many boundary problems of drug discovery. At present, molecular representation still faces several urgent problems, such as the polysemy of substructures and unsmooth information flow between atomic groups. In this research, we propose a deep contextualized Bi-LSTM architecture, Mol2Context-vec, which can integrate different levels of internal states to bring dynamic representations of molecular substructures. And the obtained molecular context representation can capture the interactions between any atomic groups, especially a pair of atomic groups that are topologically distant. Experiments show that Mol2Context-vec achieves state-of-the-art performance on multiple benchmark datasets. In addition, the visual interpretation of Mol2Context-vec is very close to the structural properties of chemical molecules as understood by humans. These advantages indicate that Mol2Context-vec can be used as a reliable and effective tool for molecular expression. Availability: The source code is available for download in https://github.com/lol88/Mol2Context-vec.
Affiliation(s)
- Qiujie Lv
- School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Guanxing Chen
- School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Lu Zhao
- The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China
- Weihe Zhong
- School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Calvin Yu-Chian Chen
- School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China; Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
15
Kaitoh K, Yamanishi Y. TRIOMPHE: Transcriptome-Based Inference and Generation of Molecules with Desired Phenotypes by Machine Learning. J Chem Inf Model 2021; 61:4303-4320. [PMID: 34528432 DOI: 10.1021/acs.jcim.1c00967] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
One of the most challenging tasks in the drug-discovery process is the efficient identification of small molecules with desired phenotypes. In this study, we propose a novel computational method for omics-based de novo drug design, which we call TRIOMPHE (transcriptome-based inference and generation of molecules with desired phenotypes). We investigated the correlation between chemically induced transcriptome profiles (reflecting cellular responses to compound treatment) and genetically perturbed transcriptome profiles (reflecting cellular responses to gene knock-down or gene overexpression of target proteins) in terms of ligand-target interactions. Subsequently, we developed novel machine learning methods to generate the chemical structures of new molecules with desired transcriptome profiles in the framework of a variational autoencoder. The use of desired transcriptome profiles enables the automatic design of molecules that are likely to have bioactivities for target proteins of interest. We showed that our methods can generate chemically valid molecules that are likely to have biological activities on 10 target proteins; moreover, they can outperform previous methods that had the same objective. Our omics-based structure generator is expected to be useful for the de novo design of drugs for a variety of target proteins.
Affiliation(s)
- Kazuma Kaitoh
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
16
Azam NA, Zhu J, Sun Y, Shi Y, Shurbevski A, Zhao L, Nagamochi H, Akutsu T. A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming. Algorithms Mol Biol 2021; 16:18. [PMID: 34391471 PMCID: PMC8364129 DOI: 10.1186/s13015-021-00197-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2020] [Accepted: 04/27/2021] [Indexed: 11/10/2022] Open
Abstract
Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR, where in the first phase an artificial neural network (ANN) is used to construct a prediction function. In the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. The framework has been applied to the case of chemical compounds with cycle index up to 2 so far. The computational results conducted on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to [Formula: see text], whereas graphs can be enumerated for up to [Formula: see text]. When applied to the case of chemical acyclic graphs, the maximum computable diameter of a chemical structure was up to 8. In this paper, we introduce a new characterization of graph structure, called "branch-height" based on which a new MILP formulation and a new graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using such chemical properties as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around [Formula: see text] and diameter 30.
Affiliation(s)
- Naveed Ahmed Azam
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan
- Jianshen Zhu
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan
- Yanming Sun
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan
- Yu Shi
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan
- Aleksandar Shurbevski
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan
- Liang Zhao
- Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Yoshida Nakaadachi-cho, Sakyo, Kyoto, 606-8306, Japan
- Hiroshi Nagamochi
- Department of Applied Mathematics and Physics, Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, 606-8501, Japan.
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, 611-0011, Japan.
17
Jiménez-Luna J, Grisoni F, Weskamp N, Schneider G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin Drug Discov 2021; 16:949-959. [PMID: 33779453 DOI: 10.1080/17460441.2021.1909567] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Indexed: 02/06/2023]
Abstract
Introduction: Artificial intelligence (AI) has inspired computer-aided drug discovery. The widespread adoption of machine learning, in particular deep learning, in multiple scientific disciplines, and the advances in computing hardware and software, among other factors, continue to fuel this development. Much of the initial skepticism regarding applications of AI in pharmaceutical discovery has started to vanish, consequently benefitting medicinal chemistry. Areas covered: The current status of AI in chemoinformatics is reviewed. The topics discussed herein include quantitative structure-activity/property relationship and structure-based modeling, de novo molecular design, and chemical synthesis prediction. Advantages and limitations of current deep learning applications are highlighted, together with a perspective on next-generation AI for drug discovery. Expert opinion: Deep learning-based approaches have only begun to address some fundamental problems in drug discovery. Certain methodological advances, such as message-passing models, spatial-symmetry-preserving networks, hybrid de novo design, and other innovative machine learning paradigms, will likely become commonplace and help address some of the most challenging questions. Open data sharing and model development will play a central role in the advancement of drug discovery with AI.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Nils Weskamp
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an Der Riss, Germany
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
18
Shi Y, Zhu J, Azam NA, Haraguchi K, Zhao L, Nagamochi H, Akutsu T. An Inverse QSAR Method Based on a Two-Layered Model and Integer Programming. Int J Mol Sci 2021; 22:2847. [PMID: 33799613 PMCID: PMC8002091 DOI: 10.3390/ijms22062847] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/19/2021] [Revised: 03/05/2021] [Accepted: 03/07/2021] [Indexed: 11/16/2022] Open
Abstract
A novel framework for inverse quantitative structure-activity relationships (inverse QSAR) has recently been proposed and developed using both artificial neural networks and mixed integer linear programming. However, classes of chemical graphs treated by the framework are limited. In order to deal with an arbitrary graph in the framework, we introduce a new model, called a two-layered model, and develop a corresponding method. In this model, each chemical graph is regarded as two parts: the exterior and the interior. The exterior consists of maximal acyclic induced subgraphs with bounded height, the interior is the connected subgraph obtained by ignoring the exterior, and the feature vector consists of the frequency of adjacent atom pairs in the interior and the frequency of chemical acyclic graphs in the exterior. Our method is more flexible than the existing method in the sense that any type of graphs can be inferred. We compared the proposed method with an existing method using several data sets obtained from PubChem database. The new method could infer more general chemical graphs with up to 50 non-hydrogen atoms. The proposed inverse QSAR method can be applied to the inference of more general chemical graphs than before.
Affiliation(s)
- Yu Shi
- Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
- Jianshen Zhu
- Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
- Naveed Ahmed Azam
- Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
- Kazuya Haraguchi
- Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
- Liang Zhao
- Graduate School of Advanced Integrated Studies in Human Survivability (Shishu-Kan), Kyoto University, Kyoto 606-8306, Japan
- Hiroshi Nagamochi
- Department of Applied Mathematics and Physics, Kyoto University, Kyoto 606-8501, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
19
Schultz KJ, Colby SM, Yesiltepe Y, Nuñez JR, McGrady MY, Renslow RS. Application and assessment of deep learning for the generation of potential NMDA receptor antagonists. Phys Chem Chem Phys 2021; 23:1197-1214. [PMID: 33355332 DOI: 10.1039/d0cp03620j] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Uncompetitive antagonists of the N-methyl d-aspartate receptor (NMDAR) have demonstrated therapeutic benefit in the treatment of neurological diseases such as Parkinson's and Alzheimer's, but some also cause dissociative effects that have led to the synthesis of illicit drugs. The ability to generate NMDAR antagonists in silico is therefore desirable for both new medication development and preempting and identifying new designer drugs. Recently, generative deep learning models have been applied to de novo drug design as a means to expand the amount of chemical space that can be explored for potential drug-like compounds. In this study, we assess the application of a generative model to the NMDAR to achieve two primary objectives: (i) the creation and release of a comprehensive library of experimentally validated NMDAR phencyclidine (PCP) site antagonists to assist the drug discovery community and (ii) an analysis of both the advantages conferred by applying such generative artificial intelligence models to drug design and the current limitations of the approach. We apply, and provide source code for, a variety of ligand- and structure-based assessment techniques used in standard drug discovery analyses to the deep learning-generated compounds. We present twelve candidate antagonists that are not available in existing chemical databases to provide an example of what this type of workflow can achieve, though synthesis and experimental validation of these compounds are still required.
Affiliation(s)
- Sean M Colby
- Pacific Northwest National Laboratory, Richland, WA, USA.
- Jamie R Nuñez
- Pacific Northwest National Laboratory, Richland, WA, USA.
- Ryan S Renslow
- Pacific Northwest National Laboratory, Richland, WA, USA.
20
Signal Deconvolution and Generative Topographic Mapping Regression for Solid-State NMR of Multi-Component Materials. Int J Mol Sci 2021; 22:1086. [PMID: 33499371 PMCID: PMC7865946 DOI: 10.3390/ijms22031086] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/25/2020] [Revised: 01/15/2021] [Accepted: 01/17/2021] [Indexed: 01/19/2023] Open
Abstract
Solid-state nuclear magnetic resonance (ssNMR) spectroscopy provides information on native structures and the dynamics for predicting and designing the physical properties of multi-component solid materials. However, such an analysis is difficult because of the broad and overlapping spectra of these materials. Therefore, signal deconvolution and prediction are great challenges for their ssNMR analysis. We examined signal deconvolution methods using a short-time Fourier transform (STFT) and a non-negative tensor/matrix factorization (NTF, NMF), and methods for predicting NMR signals and physical properties using generative topographic mapping regression (GTMR). We demonstrated the applications for macromolecular samples involved in cellulose degradation, plastics, and microalgae such as Euglena gracilis. During cellulose degradation, 13C cross-polarization (CP)-magic angle spinning spectra were separated into signals of cellulose, proteins, and lipids by STFT and NTF. GTMR accurately predicted cellulose degradation for catabolic products such as acetate and CO2. Using these methods, the 1H anisotropic spectrum of poly-ε-caprolactone was separated into the signals of crystalline and amorphous solids. Forward prediction and inverse prediction of GTMR were used to compute STFT-processed NMR signals from the physical properties of polylactic acid. These signal deconvolution and prediction methods for ssNMR spectra of macromolecules can resolve the problem of overlapping spectra and support macromolecular characterization and material design.
21
Kell DB, Samanta S, Swainston N. Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently. Biochem J 2020; 477:4559-4580. [PMID: 33290527 PMCID: PMC7733676 DOI: 10.1042/bcj20200781] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 12/15/2022]
Abstract
The number of 'small' molecules that may be of interest to chemical biologists - chemical space - is enormous, but the fraction that have ever been made is tiny. Most strategies are discriminative, i.e. have involved 'forward' problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). 'Deep' (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning admit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping from the discrete nature of individual molecules to the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
Affiliation(s)
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
- Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K
22
Abstract
The identification of synthetic routes that end with the desired product is considered an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited proportion of the entire reaction space. At present, emerging machine learning technologies are reformulating the process of retrosynthetic planning. This study aimed to discover synthetic routes backwardly from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of the training of a deep neural network, which is used to forwardly predict a product of the given reactants with a high level of accuracy, followed by inversion of the forward model into the backward one via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. With a forward model prediction accuracy of approximately 87%, the Bayesian retrosynthesis algorithm successfully rediscovered 81.8 and 33.3% of known synthetic routes of one-step and two-step reactions, respectively, with top-10 accuracy. Remarkably, the Monte Carlo algorithm, which was specifically designed for the presence of multiple diverse routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We also investigated the potential applicability of such diverse candidates based on expert knowledge of synthetic organic chemistry.
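The inversion step described in this abstract can be written compactly. This is only a sketch of Bayes' rule as it would apply here, not the authors' exact formulation; the symbols are assumptions introduced for illustration: R is a set of purchasable reactants, P the target product, p(P | R) the forward model given by the trained network, and p(R) a prior over reactant sets.

```latex
p(R \mid P) \;=\; \frac{p(P \mid R)\, p(R)}{\sum_{R'} p(P \mid R')\, p(R')}
\;\propto\; p(P \mid R)\, p(R)
```

Because the normalizing sum runs over all candidate reactant sets, it is generally intractable to evaluate exactly, which is consistent with the abstract's use of a Monte Carlo search to explore highly probable reactant combinations.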
Affiliation(s)
- Zhongliang Guo
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
- Stephen Wu
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan; The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
- Mitsuru Ohno
- Daicel Corporation, Kita-ku, Osaka 530-0011, Japan
- Ryo Yoshida
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan; The Graduate University for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan; National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan
23
Joo S, Kim MS, Yang J, Park J. Generative Model for Proposing Drug Candidates Satisfying Anticancer Properties Using a Conditional Variational Autoencoder. ACS OMEGA 2020; 5:18642-18650. [PMID: 32775866 PMCID: PMC7407547 DOI: 10.1021/acsomega.0c01149] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/16/2020] [Accepted: 07/07/2020] [Indexed: 05/08/2023]
Abstract
Deep learning-based molecular generative models have successfully identified drug candidates with desired properties against biological targets of interest. However, syntactically invalid molecules generated from a deep learning-generated model hinder the model from being applied to drug discovery. Herein, we propose a conditional variational autoencoder (CVAE) as a generative model to propose drug candidates with the desired property outside a data set range. We train the CVAE using molecular fingerprints and corresponding GI50 (inhibition of growth by 50%) results for breast cancer cell lines instead of training with various physical properties for each molecule together. We confirm that the generated fingerprints, not included in the training data set, represent the desired property using the CVAE model. In addition, our method can be used as a query expansion method for searching databases because fingerprints generated using our method can be regarded as expanded queries.
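The query-expansion idea in this abstract can be illustrated with a small similarity search. The sketch below is not taken from the paper: it assumes fingerprints are stored as sets of on-bit indices and that Tanimoto similarity is the ranking metric, which the abstract does not specify. The names `tanimoto`, `search`, the compound labels, and the toy bit patterns are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def search(query: Set[int],
           database: Dict[str, Set[int]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank database entries by similarity to the query fingerprint."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Highest similarity first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy database of known compounds (hypothetical bit patterns).
database = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3, 5},
}

# A fingerprint produced by a generative model would play the role of the
# expanded query; here it is simply hard-coded.
generated_fp = {1, 4, 7}
hits = search(generated_fp, database, top_k=2)
print(hits)
```

In this reading, a generated fingerprint does not need to decode to a valid structure: it is used only to retrieve existing compounds whose fingerprints lie close to it.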
24
Chuang KV, Gunsalus LM, Keiser MJ. Learning Molecular Representations for Medicinal Chemistry. J Med Chem 2020; 63:8705-8722. [PMID: 32366098 DOI: 10.1021/acs.jmedchem.0c00385] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Indexed: 12/21/2022]
Abstract
The accurate modeling and prediction of small molecule properties and bioactivities depend on the critical choice of molecular representation. Decades of informatics-driven research have relied on expert-designed molecular descriptors to establish quantitative structure-activity and structure-property relationships for drug discovery. Now, advances in deep learning make it possible to efficiently and compactly learn molecular representations directly from data. In this review, we discuss how active research in molecular deep learning can address limitations of current descriptors and fingerprints while creating new opportunities in cheminformatics and virtual screening. We provide a concise overview of the role of representations in cheminformatics, key concepts in deep learning, and argue that learning representations provides a way forward to improve the predictive modeling of small molecule bioactivities and properties.
Affiliation(s)
- Kangway V Chuang
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Laura M Gunsalus
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Michael J Keiser
- Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
25
Polishchuk P. CReM: chemically reasonable mutations framework for structure generation. J Cheminform 2020; 12:28. [PMID: 33430959 PMCID: PMC7178718 DOI: 10.1186/s13321-020-00431-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/11/2020] [Accepted: 04/15/2020] [Indexed: 12/12/2022] Open
Abstract
Structure generators are widely used in de novo design studies and their performance substantially influences an outcome. Approaches based on the deep learning models and conventional atom-based approaches may result in invalid structures and fail to address their synthetic feasibility issues. On the other hand, conventional reaction-based approaches result in synthetically feasible compounds but novelty and diversity of generated compounds may be limited. Fragment-based approaches can provide both better novelty and diversity of generated compounds but the issue of synthetic complexity of generated structure was not explicitly addressed before. Here we developed a new framework of fragment-based structure generation that, by design, results in the chemically valid structures and provides flexible control over diversity, novelty, synthetic complexity and chemotypes of generated compounds. The framework was implemented as an open-source Python module and can be used to create custom workflows for the exploration of chemical space.
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic.
26
Grisoni F, Moret M, Lingwood R, Schneider G. Bidirectional Molecule Generation with Recurrent Neural Networks. J Chem Inf Model 2020; 60:1175-1183. [PMID: 31904964 DOI: 10.1021/acs.jcim.9b00943] [Citation(s) in RCA: 85] [Impact Index Per Article: 21.3] [Indexed: 02/08/2023]
Abstract
Recurrent neural networks (RNNs) are able to generate de novo molecular designs using simplified molecular input line entry systems (SMILES) string representations of the chemical structure. RNN-based structure generation is usually performed unidirectionally, by growing SMILES strings from left to right. However, there is no natural start or end of a small molecule, and SMILES strings are intrinsically nonunivocal representations of molecular graphs. These properties motivate bidirectional structure generation. Here, bidirectional generative RNNs for SMILES-based molecule design are introduced. To this end, two established bidirectional methods were implemented, and a new method for SMILES string generation and data augmentation is introduced: the bidirectional molecule design by alternate learning (BIMODAL). These three bidirectional strategies were compared to the unidirectional forward RNN approach for SMILES string generation, in terms of the (i) novelty, (ii) scaffold diversity, and (iii) chemical-biological relevance of the computer-generated molecules. The results positively advocate bidirectional strategies for SMILES-based molecular de novo design, with BIMODAL showing superior results to the unidirectional forward RNN for most of the criteria in the tested conditions. The code of the methods and the pretrained models can be found at URL https://github.com/ETHmodlab/BIMODAL.
Affiliation(s)
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Michael Moret
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Robin Lingwood
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
| |
|
27
|
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022]
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to foster breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure property/activity relation for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated much faster than ever, through ML-enabled molecular generation and inverse molecular design. In this perspective, we review the recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply it to various applications. Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future, to meet the tremendous demand for new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
| | - Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
| |
|
28
|
Wu S, Lambard G, Liu C, Yamada H, Yoshida R. iQSPR in XenonPy: A Bayesian Molecular Design Algorithm. Mol Inform 2020; 39:e1900107. [PMID: 31841276 PMCID: PMC7050509 DOI: 10.1002/minf.201900107] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/16/2019] [Accepted: 10/14/2019] [Indexed: 01/10/2023]
Abstract
iQSPR is an inverse molecular design algorithm based on Bayesian inference that was developed in our previous study. Here, the algorithm is integrated in Python as a new module called iQSPR-X in the all-in-one materials informatics platform XenonPy. Our new software provides a flexible, easy-to-use, and extensible platform for users to build customized molecular design algorithms using pre-set modules and a pre-trained model library in XenonPy. In this paper, we describe key features of iQSPR-X and provide guidance on its use, illustrated by an application to a polymer design that targets a specific range of bandgap and dielectric constant.
Affiliation(s)
- Stephen Wu
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
- The Graduate University for Advanced Studies, SOKENDAI, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
| | - Guillaume Lambard
- Center for Materials Research by Information Integration (CMI), Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Chang Liu
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
| | - Hironao Yamada
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
- School of Pharmacy, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan
| | - Ryo Yoshida
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
- The Graduate University for Advanced Studies, SOKENDAI, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
- Center for Materials Research by Information Integration (CMI), Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| |
|
29
|
Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA, Fisher J, Jansen JM, Duca JS, Rush TS, Zentgraf M, Hill JE, Krutoholow E, Kohler M, Blaney J, Funatsu K, Luebkemann C, Schneider G. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2022]
|
30
|
Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019; 19:353-364. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Accepted: 10/28/2019] [Indexed: 12/17/2022]
|
31
|
Horvath D, Marcou G, Varnek A. Generative topographic mapping in drug design. Drug Discov Today Technol 2019; 32-33:99-107. [PMID: 33386101 DOI: 10.1016/j.ddtec.2020.06.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/07/2020] [Revised: 06/10/2020] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
This is a review article on Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and its specific applications in Drug Design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space & docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on the strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well-established library analysis tools. GTM is basically a fuzzy-logic-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited.
Affiliation(s)
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France.
| |
|
32
|
Gantzer P, Creton B, Nieto-Draghi C. Inverse-QSPR for de novo Design: A Review. Mol Inform 2019; 39:e1900087. [PMID: 31682079 DOI: 10.1002/minf.201900087] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/17/2019] [Accepted: 11/04/2019] [Indexed: 11/09/2022]
Abstract
The use of computer tools to solve chemistry-related problems has given rise to a large and increasing number of publications in recent decades. This new field of science is now well recognized and labelled chemoinformatics. Among all chemoinformatics techniques, statistical approaches for property prediction have been the subject of numerous studies reflecting both new developments and many applications. The resulting predictive models, which relate a property to molecular features (descriptors), are gathered under the acronym QSPR, for Quantitative Structure Property Relationships. Apart from the obvious use of such models to predict property values for new compounds, their use to virtually synthesize new molecules (de novo design) is currently a subject of high interest. Inverse-QSPR (i-QSPR) methods have hence been developed to accelerate the discovery of new materials that meet a set of specifications. In this manuscript, we review existing i-QSPR methodologies published in the open literature so as to highlight the developments, applications, improvements and limitations of each.
Affiliation(s)
- Philippe Gantzer
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
| | - Benoit Creton
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
| | - Carlos Nieto-Draghi
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
| |
|
33
|
Xiong Z, Wang D, Liu X, Zhong F, Wan X, Li X, Li Z, Luo X, Chen K, Jiang H, Zheng M. Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism. J Med Chem 2019; 63:8749-8760. [DOI: 10.1021/acs.jmedchem.9b00959] [Citation(s) in RCA: 151] [Impact Index Per Article: 30.2] [Indexed: 12/12/2022]
Affiliation(s)
- Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaohong Liu
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Feisheng Zhong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xiaozhe Wan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
| | - Zhaojun Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Kaixian Chen
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Hualiang Jiang
- Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| |
|
34
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 346] [Impact Index Per Article: 69.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Ryan Byrne
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| |
|
35
|
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 19991, Russia
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| | - Esben Jannik Bjerrum
- Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
| |
|
36
|
Hatakeyama-Sato K, Tezuka T, Nishikitani Y, Nishide H, Oyaizu K. Synthesis of Lithium-ion Conducting Polymers Designed by Machine Learning-based Prediction and Screening. Chem Lett 2019. [DOI: 10.1246/cl.180847] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022]
Affiliation(s)
- Kan Hatakeyama-Sato
- Department of Applied Chemistry and Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Toshiki Tezuka
- Department of Applied Chemistry and Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Yoshinori Nishikitani
- Department of Applied Chemistry and Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Hiroyuki Nishide
- Department of Applied Chemistry and Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Kenichi Oyaizu
- Department of Applied Chemistry and Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| |
|
37
|
Xue D, Gong Y, Yang Z, Chuai G, Qu S, Shen A, Yu J, Liu Q. Advances and challenges in deep generative models for de novo molecule generation. Wiley Interdiscip Rev Comput Mol Sci 2018. [DOI: 10.1002/wcms.1395] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 01/04/2023]
Affiliation(s)
- Dongyu Xue
- Department of Endocrinology & Metabolism, Shanghai Tenth People's Hospital, Shanghai, China
- Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Yukang Gong
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Tongji University, Shanghai, China
- Department of Ophthalmology, Ninghai First Hospital, Zhejiang, China
| | - Zhaoyi Yang
- Department of Pharmacy, The First Affiliated Hospital of University of Science and Technology of China, Hefei, China
| | - Guohui Chuai
- Department of Endocrinology & Metabolism, Shanghai Tenth People's Hospital, Shanghai, China
- Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Sheng Qu
- Department of Endocrinology & Metabolism, Shanghai Tenth People's Hospital, Shanghai, China
- Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Aizong Shen
- Department of Pharmacy, The First Affiliated Hospital of University of Science and Technology of China, Hefei, China
| | - Jing Yu
- Department of Ophthalmology, Shanghai Tenth People's Hospital, Tongji University, Shanghai, China
- Department of Ophthalmology, Ninghai First Hospital, Zhejiang, China
| | - Qi Liu
- Department of Endocrinology & Metabolism, Shanghai Tenth People's Hospital, Shanghai, China
- Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Department of Ophthalmology, Ninghai First Hospital, Zhejiang, China
| |
|
38
|
Kaneko H. Data Visualization, Regression, Applicability Domains and Inverse Analysis Based on Generative Topographic Mapping. Mol Inform 2018; 38:e1800088. [PMID: 30259699 DOI: 10.1002/minf.201800088] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/27/2018] [Accepted: 08/30/2018] [Indexed: 01/11/2023]
Abstract
This paper introduces two generative topographic mapping (GTM) methods that can be used for data visualization, regression analysis, inverse analysis, and the determination of applicability domains (ADs). In GTM-multiple linear regression (GTM-MLR), the prior probability distribution of the descriptors or explanatory variables (X) is calculated with GTM, and the posterior probability distribution of the property/activity or objective variable (y) given X is calculated with MLR; inverse analysis is then performed using the product rule and Bayes' theorem. In GTM-regression (GTMR), X and y are combined and GTM is performed to obtain the joint probability distribution of X and y; this leads to the posterior probability distributions of y given X and of X given y, which are used for regression and inverse analysis, respectively. Simulations using linear and nonlinear datasets and quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) datasets confirm that GTM-MLR and GTMR enable data visualization, regression analysis, and inverse analysis considering appropriate ADs. Python and MATLAB codes for the proposed algorithms are available at https://github.com/hkaneko1985/gtm-generativetopographicmapping.
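The inverse-analysis step of GTM-MLR described in this abstract rests on the product rule and Bayes' theorem. A minimal sketch in the abstract's notation (X for the descriptors, y for the property) is given here. The Gaussian-mixture form of the prior is the standard GTM density, and the symbols K, m_k, and β are assumptions introduced for illustration rather than notation taken from the paper.

```latex
% Prior over descriptors from GTM (standard GTM density; K, \mathbf{m}_k, \beta are assumed symbols)
p(\mathbf{X}) = \frac{1}{K} \sum_{k=1}^{K} \mathcal{N}\!\left(\mathbf{X} \mid \mathbf{m}_k,\ \beta^{-1}\mathbf{I}\right)

% Product rule: joint distribution from the MLR likelihood and the GTM prior
p(\mathbf{X}, y) = p(y \mid \mathbf{X})\, p(\mathbf{X})

% Bayes' theorem: inverse analysis, i.e. descriptors given a target property value
p(\mathbf{X} \mid y) = \frac{p(y \mid \mathbf{X})\, p(\mathbf{X})}{\int p(y \mid \mathbf{X}')\, p(\mathbf{X}')\, \mathrm{d}\mathbf{X}'}
```

In GTMR, by contrast, the abstract states that the joint distribution of X and y is modeled directly by GTM, so both conditionals follow from that single joint density.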
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, Meiji University 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa, 214-8571, Japan
| |
|
39
|
Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Sci Adv 2018; 4:eaap7885. [PMID: 30050984 PMCID: PMC6059760 DOI: 10.1126/sciadv.aap7885] [Citation(s) in RCA: 499] [Impact Index Per Article: 83.2] [Received: 08/26/2017] [Accepted: 06/13/2018] [Indexed: 05/20/2023]
Abstract
We have devised and implemented a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). On the basis of deep and reinforcement learning (RL) approaches, ReLeaSE integrates two deep neural networks (generative and predictive) that are trained separately but are used jointly to generate novel targeted chemical libraries. ReLeaSE uses simple representation of molecules by their simplified molecular-input line-entry system (SMILES) strings only. Generative models are trained with a stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo-generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the RL approach to bias the generation of new chemical structures toward those with the desired physical and/or biological properties. In the proof-of-concept study, we have used the ReLeaSE method to design chemical libraries with a bias toward structural complexity or toward compounds with maximal, minimal, or specific range of physical properties, such as melting point or hydrophobicity, or toward compounds with inhibitory activity against Janus protein kinase 2. The approach proposed herein can find a general use for generating targeted chemical libraries of novel compounds optimized for either a single desired property or multiple properties.
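The second phase described in this abstract, in which a reward from a predictive model biases a generator, can be illustrated with a toy policy-gradient loop. This is a sketch under strong assumptions and not the ReLeaSE code: the REINFORCE update, the three-token alphabet, the position-independent softmax "generator", and the `toy_reward` function (fraction of "O" tokens) are all hypothetical stand-ins for the deep networks the paper uses.

```python
import math
import random

TOKENS = ["C", "N", "O"]  # toy alphabet, not a real SMILES grammar


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_string(logits, length, rng):
    """Toy generator: draw token indices independently from one categorical policy."""
    return rng.choices(range(len(TOKENS)), weights=softmax(logits), k=length)


def toy_reward(indices):
    """Stand-in for the predictive model: fraction of 'O' tokens (assumption)."""
    return sum(1 for i in indices if TOKENS[i] == "O") / len(indices)


def reinforce(steps=300, length=8, lr=0.5, seed=0):
    """Bias the toy generator toward high-reward strings with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * len(TOKENS)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        idx = sample_string(logits, length, rng)
        reward = toy_reward(idx)
        probs = softmax(logits)
        advantage = reward - baseline
        # d/d logit_j of sum_t log pi(a_t) equals count_j - length * p_j.
        for j in range(len(TOKENS)):
            count_j = sum(1 for i in idx if i == j)
            logits[j] += lr * advantage * (count_j - length * probs[j])
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


print(dict(zip(TOKENS, reinforce())))
```

After training, the policy should place more probability on the rewarded token than it did at initialization, which is the toy analogue of shifting a chemical library toward a desired property range.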
Affiliation(s)
- Mariya Popova
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow 141700, Russia
- Skolkovo Institute of Science and Technology, Moscow 143026, Russia
| | - Olexandr Isayev
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Corresponding author. (A.T.); (O.I.)
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA
- Corresponding author. (A.T.); (O.I.)
| |
|
40
|
Panteleev J, Gao H, Jia L. Recent applications of machine learning in medicinal chemistry. Bioorg Med Chem Lett 2018; 28:2807-2815. [PMID: 30122222 DOI: 10.1016/j.bmcl.2018.06.046] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 04/10/2018] [Revised: 06/24/2018] [Accepted: 06/26/2018] [Indexed: 12/20/2022]
Abstract
In recent decades, artificial intelligence and machine learning have played a significant role in increasing the efficiency of processes across a wide spectrum of industries. When it comes to the pharmaceutical and biotechnology sectors, numerous tools enabled by advancement of computer science have been developed and are now routinely utilized. However, there are many aspects of the drug discovery process, which can further benefit from refinement of computational methods and tools, as well as improvement of accessibility of these new technologies. In this review, examples of recent developments in machine learning application are described, which have the potential to impact different parts of the drug discovery and development flow scheme. Notably, new deep learning-based approaches across compound design and synthesis, prediction of binding, activity and ADMET properties, as well as applications of genetic algorithms are highlighted.
Affiliation(s)
- Jane Panteleev
- Amgen Discovery Research, 360 Binney St., Cambridge, MA 02141, USA
| | - Hua Gao
- Amgen Discovery Research, 360 Binney St., Cambridge, MA 02141, USA
| | - Lei Jia
- Amgen Discovery Research, One Amgen Center Dr., Thousand Oaks, CA 91320, USA.
| |
|
41
|
Segler MHS, Kogej T, Tyrchan C, Waller MP. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent Sci 2018; 4:120-131. [PMID: 29392184 PMCID: PMC5785775 DOI: 10.1021/acscentsci.7b00512] [Citation(s) in RCA: 653] [Impact Index Per Article: 108.8] [Received: 10/24/2017] [Indexed: 05/20/2023]
Abstract
In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
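The analogy this abstract draws between molecule generators and statistical language models, and the idea of fine-tuning on a small focused set, can be sketched with a character-level bigram model. This is a deliberately simplified stand-in and not the authors' recurrent network: the count-based model, the `weight` parameter used to mimic fine-tuning, and the tiny example corpora are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical start/end markers


def train(counts, smiles_list, weight=1):
    """Accumulate character-bigram counts; a larger weight mimics fine-tuning."""
    for s in smiles_list:
        seq = START + s + END
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += weight
    return counts


def sample(counts, rng, max_len=40):
    """Sample one string character by character from the bigram counts."""
    out, cur = [], START
    while len(out) < max_len:
        successors = list(counts[cur].keys())
        weights = list(counts[cur].values())
        cur = rng.choices(successors, weights=weights, k=1)[0]
        if cur == END:
            break
        out.append(cur)
    return "".join(out)


counts = defaultdict(Counter)
# Toy "general" corpus. Character-level splitting ignores two-letter atoms such as Cl.
train(counts, ["CCO", "CCN", "CCC", "c1ccccc1"])
# Toy "focused" set, up-weighted to shift the model toward ether-like strings.
train(counts, ["CCOC", "COC"], weight=5)

print(sample(counts, random.Random(0)))
```

Sampled strings are not checked for chemical validity here; in practice a cheminformatics toolkit would be used to filter invalid SMILES before scoring.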
Affiliation(s)
- Marwin H. S. Segler
- Institute of Organic Chemistry & Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
| | - Thierry Kogej
- Hit Discovery, Discovery Sciences, AstraZeneca R&D, Gothenburg, Sweden
| | - Christian Tyrchan
- Department of Medicinal Chemistry, IMED RIA, AstraZeneca R&D, Gothenburg, Sweden
| | - Mark P. Waller
- Department of Physics & International Centre for Quantum and Molecular Structures, Shanghai University, Shanghai, China
| |
|
42
|
Abstract
The term drug design describes the systematic search for novel compounds with biological activity. In its most common form, it involves modification of a known active scaffold or linking known active scaffolds, although de novo drug design (i.e., from scratch) is also possible. Though highly interrelated, identification of active scaffolds should be conceptually separated from drug design. Traditionally, the drug design process has focused on the molecular determinants of the interactions between the drug and its known or intended molecular target. Nevertheless, current drug design also takes into consideration other relevant processes that influence drug efficacy and safety (e.g., bioavailability, metabolic stability, interaction with antitargets). This chapter provides an overview of possible approaches to identify active scaffolds (including in silico approximations to that task) and computational methods to guide the subsequent optimization process. It also discusses the situations in which each of the overviewed techniques is most appropriate.
Affiliation(s)
- Alan Talevi
- Laboratorio de Investigación y Desarrollo de Bioactivos (LIDeB), Faculty of Exact Sciences, National University of La Plata (UNLP), Buenos Aires, Argentina.
- Argentinean National Council of Scientific and Technical Research (CONICET), Buenos Aires, Argentina.
| |
|
43
|
Blaschke T, Olivecrona M, Engkvist O, Bajorath J, Chen H. Application of Generative Autoencoder in De Novo Molecular Design. Mol Inform 2018; 37:1700123. [PMID: 29235269 PMCID: PMC5836887 DOI: 10.1002/minf.201700123] [Citation(s) in RCA: 201] [Impact Index Per Article: 33.5] [Received: 10/20/2017] [Accepted: 11/19/2017] [Indexed: 11/16/2022]
Abstract
A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.
Affiliation(s)
- Thomas Blaschke
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83 Mölndal, Sweden
- University of Bonn, Bonn Aachen International Center for Information Technology BIT, Life Science Informatics, Dahlmannstrasse 2, 53113 Bonn, Germany
| | - Marcus Olivecrona
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83 Mölndal, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83 Mölndal, Sweden
| | - Jürgen Bajorath
- University of Bonn, Bonn Aachen International Center for Information Technology BIT, Life Science Informatics, Dahlmannstrasse 2, 53113 Bonn, Germany
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, Innovative Medicines and Early Development Biotech Unit, AstraZeneca R&D Gothenburg, 431 83 Mölndal, Sweden
| |
|
44
|
Abstract
Small-molecule drug discovery can be viewed as a challenging multidimensional problem in which various characteristics of compounds - including efficacy, pharmacokinetics and safety - need to be optimized in parallel to provide drug candidates. Recent advances in areas such as microfluidics-assisted chemical synthesis and biological testing, as well as artificial intelligence systems that improve a design hypothesis through feedback analysis, are now providing a basis for the introduction of greater automation into aspects of this process. This could potentially accelerate time frames for compound discovery and optimization and enable more effective searches of chemical space. However, such approaches also raise considerable conceptual, technical and organizational challenges, as well as scepticism about the current hype around them. This article aims to identify the approaches and technologies that could be implemented robustly by medicinal chemists in the near future and to critically analyse the opportunities and challenges for their more widespread application.
|
45
|
Bellera CL, Di Ianni ME, Talevi A. The application of molecular topology for ulcerative colitis drug discovery. Expert Opin Drug Discov 2017; 13:89-101. [PMID: 29088918 DOI: 10.1080/17460441.2018.1396314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/13/2022]
Abstract
Introduction: Although the therapeutic arsenal against ulcerative colitis has greatly expanded (including the revolutionary advent of biologics), there remain patients who are refractory to current medications, while the safety of the available therapeutics could also be improved. Molecular topology provides a theoretic framework for the discovery of new therapeutic agents in a very efficient manner, and its applications in the field of ulcerative colitis have slowly begun to flourish. Areas covered: After discussing the basics of molecular topology, the authors review QSAR models focusing on validated targets for the treatment of ulcerative colitis, entirely or partially based on topological descriptors. Expert opinion: The application of molecular topology to ulcerative colitis drug discovery is still very limited, and many of the existing reports seem to be strictly theoretic, with no experimental validation or practical applications. Interestingly, mechanism-independent models based on phenotypic responses have recently been reported. Such models are in agreement with the recent interest raised by network pharmacology as a potential solution for complex disorders. These and other similar studies applying molecular topology suggest that some therapeutic categories may present a 'topological pattern' that goes beyond a specific mechanism of action.
Affiliation(s)
- Carolina L Bellera
- Medicinal Chemistry/Laboratory of Bioactive Research and Development, Department of Biological Sciences, Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Buenos Aires, Argentina
| | - Mauricio E Di Ianni
- Medicinal Chemistry/Laboratory of Bioactive Research and Development, Department of Biological Sciences, Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Buenos Aires, Argentina
| | - Alan Talevi
- Medicinal Chemistry/Laboratory of Bioactive Research and Development, Department of Biological Sciences, Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Buenos Aires, Argentina
| |
|
46
|
Molecular de-novo design through deep reinforcement learning. J Cheminform 2017; 9:48. [PMID: 29086083 PMCID: PMC5583141 DOI: 10.1186/s13321-017-0235-x] [Citation(s) in RCA: 487] [Impact Index Per Article: 69.6] [Received: 05/14/2017] [Accepted: 08/23/2017] [Indexed: 01/15/2023] Open
Abstract
This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.
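The abstract does not spell out the "augmented episodic likelihood". A minimal sketch follows, assuming it takes the form log P_aug(A) = log P_prior(A) + σ·S(A) for a generated sequence A, with the agent trained to minimise the squared gap between log P_aug(A) and its own log-likelihood of A. The function names, the scalar `sigma`, and the example numbers are illustrative assumptions, not taken from the entry above.

```python
def augmented_log_likelihood(prior_logp: float, score: float, sigma: float) -> float:
    """Prior log-likelihood of a sequence, shifted by a scaled desirability score.

    Assumed form: log P_aug = log P_prior + sigma * S, where S is a
    user-defined score for the generated structure (hypothetical here).
    """
    return prior_logp + sigma * score


def agent_loss(agent_logp: float, prior_logp: float, score: float, sigma: float) -> float:
    """Squared difference between the augmented and the agent log-likelihood."""
    target = augmented_log_likelihood(prior_logp, score, sigma)
    return (target - agent_logp) ** 2


# Illustrative numbers only: a sequence the prior finds unlikely (log P = -20)
# but which scores well (S = 0.5) gets a higher target likelihood of -15.
example_target = augmented_log_likelihood(prior_logp=-20.0, score=0.5, sigma=10.0)
example_loss = agent_loss(agent_logp=-18.0, prior_logp=-20.0, score=0.5, sigma=10.0)
```

Under this assumed form, a larger `sigma` lets the score pull the agent further from the prior, while a small `sigma` keeps generated sequences close to what the prior already considers likely.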
|
47
|
Abstract
Inverse quantitative structure-activity relationship (QSAR) modeling encompasses the generation of compound structures from values of descriptors corresponding to high activity predicted with a given QSAR model. Structure generation proceeds from descriptor coordinates optimized for activity prediction. Herein, we concentrate on the first phase of the inverse QSAR process and introduce a new methodology for coordinate optimization, termed differential evolution (DE), that originated from computer science and engineering. Using simulation and compound activity data, we demonstrate that DE in combination with support vector regression (SVR) yields effective and robust predictions of optimized coordinates satisfying model constraints and requirements. For different compound activity classes, optimized coordinates are obtained that exclusively map to regions of high activity in feature space, represent novel positions for structure generation, and are chemically meaningful.
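The coordinate-optimization step described above can be sketched with a plain differential evolution loop (the common DE/rand/1/bin variant). This is a sketch under stated assumptions, not the authors' implementation: `toy_activity` is a hypothetical stand-in for a trained regression model such as SVR, and the box bounds, population size, and the `f` and `cr` settings are arbitrary illustrative choices.

```python
import random


def differential_evolution(score, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Maximise `score` over box-bounded coordinates with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_fitness = score(trial)
            # Greedy selection: keep the trial only if it is at least as good.
            if trial_fitness >= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_activity(x):
    """Hypothetical predicted activity, peaking at descriptor values (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best_x, best_score = differential_evolution(toy_activity, [(-5.0, 5.0), (-5.0, 5.0)])
```

In an inverse QSAR setting, `score` would be the prediction of the fitted model and the bounds would reflect the descriptor ranges for which that model is considered reliable; structure generation from `best_x` is a separate, later step.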
Affiliation(s)
- Tomoyuki Miyao
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan; Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, D-53113, Germany
| | - Kimito Funatsu
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, D-53113, Germany
| |
|
48
|
Dearden JC. The History and Development of Quantitative Structure-Activity Relationships (QSARs). International Journal of Quantitative Structure-Property Relationships 2017. [DOI: 10.4018/ijqspr.2017070104] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Following the publication of the history and development of QSAR, it became apparent that a number of matters had not been covered. This addendum is an attempt to rectify that. A very early approach (ca. 60 B.C.) by Lucretius shows that he understood how molecular size and complexity affect liquid viscosity. Comments by Kant (1724-1804) emphasized the necessity of mathematics in science. A claim that the work of von Bibra and Harless in 1847 pre-dated that of Overton and H.H. Meyer is shown not to be correct. K.H. Meyer and Gottlieb-Billroth published in 1920 what is probably the first QSAR equation. Brown, who with his co-author Fraser is credited with the first definitive recognition in 1868-9 that biological activity is a function of molecular structure, is often cited as Crum Brown; in fact, Crum was his second given name. The QSAR work of the Soviet chemist N.V. Lazarev in the 1940s was far ahead of his time, showing numerous correlations of biological activities and physicochemical properties with molecular descriptors. The subject of inverse QSAR is discussed.
Affiliation(s)
- John C. Dearden
- School of Pharmacy & Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
| |
|
49
|
Miyao T, Funatsu K. Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space. Mol Inform 2017; 36. [DOI: 10.1002/minf.201700030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/13/2017] [Accepted: 04/04/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Tomoyuki Miyao
- Department of Chemical System Engineering; The University of Tokyo; 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656 Japan
| | - Kimito Funatsu
- Department of Chemical System Engineering; The University of Tokyo; 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656 Japan
| |
|
50
|
Bayesian molecular design with a chemical language model. J Comput Aided Mol Des 2017; 31:379-391. [PMID: 28281211 PMCID: PMC5393296 DOI: 10.1007/s10822-016-0008-z] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 10/25/2016] [Accepted: 12/31/2016] [Indexed: 11/05/2022]
Abstract
The aim of computational molecular design is the identification of promising hypothetical molecules with a predefined set of desired properties. We address the issue of accelerating material discovery with state-of-the-art machine learning techniques. The method involves two types of prediction: forward and backward. The objective of the forward prediction is to create a set of machine learning models on various properties of a given molecule. Inverting the trained forward models through Bayes' law, we derive a posterior distribution for the backward prediction, which is conditioned by a desired property requirement. By exploring high-probability regions of the posterior with a sequential Monte Carlo technique, molecules that exhibit the desired properties can be created computationally. One major difficulty in the computational creation of molecules is excluding chemically unfavorable structures. To circumvent this issue, we derive a chemical language model that acquires commonly occurring patterns of chemical fragments through natural language processing of ASCII strings of existing compounds, which follow the SMILES chemical language notation. In the backward prediction, the trained language model is used to refine chemical strings such that the properties of the resulting structures fall within the desired property region while chemically unfavorable structures are successfully removed. The present method is demonstrated through the design of small organic molecules with property requirements on the HOMO-LUMO gap and internal energy. The R package iqspr is available at the CRAN repository.
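The inversion through Bayes' law mentioned in this abstract can be written compactly. The notation is an assumption made here for illustration: S denotes a chemical structure (for example a SMILES string), Y its property vector, and U the desired property region; none of these symbols appear in the entry itself.

```latex
\[
p(S \mid Y \in U) \;\propto\; p(Y \in U \mid S)\, p(S)
\]
```

Read this way, the forward models supply the likelihood p(Y ∈ U | S), the chemical language model plays the role of the prior p(S) that down-weights chemically unfavorable strings, and the sequential Monte Carlo step samples structures from the high-probability regions of the left-hand side.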
|