151
Liu X, IJzerman AP, van Westen GJP. Computational Approaches for De Novo Drug Design: Past, Present, and Future. Methods Mol Biol 2021; 2190:139-165. [PMID: 32804364 DOI: 10.1007/978-1-0716-0826-5_6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022]
Abstract
Drug discovery is time- and resource-consuming. Computational approaches applied in de novo drug design therefore play an important role in improving efficiency and decreasing the cost of developing novel drugs. Over several decades, a variety of methods have been proposed and applied in practice. Traditionally, drug design problems have been treated as combinatorial optimization in a discrete chemical space, and optimization methods were exploited to search for new drug molecules that meet multiple objectives. With the accumulation of data and the development of machine learning methods, computational drug design has gradually shifted to a new paradigm, with particular interest in the potential application of deep learning. In this chapter, we give a brief description of these two different de novo approaches, compare their application scopes, and discuss their possible future development.
Affiliation(s)
- Xuhan Liu
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Adriaan P IJzerman
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Gerard J P van Westen
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands

152
Yu K, Visweswaran S, Batmanghelich K. Semi-supervised Hierarchical Drug Embedding in Hyperbolic Space. J Chem Inf Model 2020; 60:5647-5657. [PMID: 33140969 PMCID: PMC7943198 DOI: 10.1021/acs.jcim.0c00681] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
Learning accurate drug representations is essential for tasks such as computational drug repositioning and prediction of drug side effects. A drug hierarchy is a valuable source that encodes knowledge of relations among drugs in a tree-like structure, in which drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) the underlying chemical grammar inferred from the chemical structures of drugs and drug-like molecules (unsupervised) and (2) the hierarchical relations encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space, which is well suited to encoding hierarchical relations. Both quantitative and qualitative results indicate that the learned drug embedding can accurately reproduce the chemical structure and recapitulate the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that our drug embedding can predict new uses and discover new side effects of existing drugs, and that it significantly outperforms comparison methods in both tasks.
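The abstract relies on hyperbolic space being well suited to hierarchies. As a minimal sketch, assuming the Poincaré ball model (the abstract does not name a specific model of hyperbolic space) and plain Python lists as embedding vectors, the geodesic distance between two embedded points can be computed as below; `poincare_distance` is a hypothetical helper, not the authors' code.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical 2-D embeddings: a "root" near the origin and two "leaves" near the boundary.
root = [0.0, 0.0]
leaf_a = [0.9, 0.0]
leaf_b = [0.0, 0.9]
print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

The two leaves end up much farther from each other than either is from the root, and their distance is close to the path length through the root, which is the tree-like behaviour that makes this geometry attractive for encoding a drug hierarchy.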
Affiliation(s)
- Ke Yu
  - Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States
- Shyam Visweswaran
  - Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States
- Kayhan Batmanghelich
  - Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States

153
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences, Part II: Outlook. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Affiliation(s)
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

154
Gao H, Pauphilet J, Struble TJ, Coley CW, Jensen KF. Direct Optimization across Computer-Generated Reaction Networks Balances Materials Use and Feasibility of Synthesis Plans for Molecule Libraries. J Chem Inf Model 2020; 61:493-504. [PMID: 33331158 DOI: 10.1021/acs.jcim.0c01032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
The synthesis of thousands of candidate compounds in drug discovery and development offers opportunities for computer-aided synthesis planning to simplify the synthesis of molecule libraries by leveraging common starting materials and reaction conditions. We develop an optimization-based method to analyze large organic chemical reaction networks and design overlapping synthesis plans for entire molecule libraries so as to minimize the overall number of unique chemical compounds needed as either starting materials or reaction conditions. We consider multiple objectives, including the number of starting materials, the number of catalysts/solvents/reagents, and the likelihood of success of the overall synthesis plan, to select an optimal reaction network to access the target molecules. The library synthesis planning task is formulated as a network flow optimization problem, and we design an efficient decomposition scheme that reduces solution time by a factor of 5 and scales to instances with 48 target molecules and nearly 8000 intermediate reactions within hours. In four case studies of pharmaceutical compounds, the approach reduces the number of starting materials and catalysts/solvents/reagents needed by 32.2% and 66.0% on average, and by up to 63.2% and 80.0% in the best cases. The code implementation can be found at https://github.com/Coughy1991/Molecule_library_synthesis.
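The paper formulates library planning as a network flow optimization with a decomposition scheme; that formulation is not reproduced here. As a much simpler illustration of the core objective only (choosing one route per target so that the library shares as few unique starting materials as possible), the sketch below does an exhaustive search over hypothetical toy routes. It ignores catalysts/solvents/reagents and route success likelihood, assumes every target has at least one route, and scales exponentially with library size.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy data: each target has alternative routes, and each route is
# summarized only by the set of starting materials it consumes.
Routes = Dict[str, List[FrozenSet[str]]]


def best_shared_plan(routes: Routes) -> Tuple[Dict[str, FrozenSet[str]], FrozenSet[str]]:
    """Pick one route per target so the union of starting materials is smallest.

    Exhaustive search over all route combinations; only feasible for tiny inputs.
    """
    targets = sorted(routes)
    best_choice = None
    best_union = None
    for combo in product(*(routes[t] for t in targets)):
        union = frozenset().union(*combo)
        if best_union is None or len(union) < len(best_union):
            best_choice, best_union = dict(zip(targets, combo)), union
    return best_choice, best_union


routes = {
    "T1": [frozenset({"A", "B"}), frozenset({"C"})],
    "T2": [frozenset({"A", "D"}), frozenset({"C", "E"})],
}
plan, materials = best_shared_plan(routes)
print(plan, sorted(materials))
```

Choosing each target's individually cheapest route is not enough here: sharing material C across both targets gives the smallest combined set, which is the kind of overlap the library-level optimization exploits.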
Affiliation(s)
- Hanyu Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jean Pauphilet
  - London Business School, Regent's Park, London NW1 4SA, U.K.
- Thomas J Struble
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States

155
Kell DB, Samanta S, Swainston N. Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently. Biochem J 2020; 477:4559-4580. [PMID: 33290527 PMCID: PMC7733676 DOI: 10.1042/bcj20200781] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 12/15/2022]
Abstract
The number of 'small' molecules that may be of interest to chemical biologists (chemical space) is enormous, but the fraction that has ever been made is tiny. Most strategies are discriminative, i.e. they have involved 'forward' problems (have molecule, establish properties). However, we normally wish to solve the much harder generative or inverse problem (describe desired properties, find molecule). 'Deep' (machine) learning based on large-scale neural networks underpins technologies such as computer vision, natural language processing, driverless cars, and world-leading performance in games such as Go; it can also be applied to the solution of inverse problems in chemical biology. In particular, recent developments in deep learning permit the in silico generation of candidate molecular structures and the prediction of their properties, thereby allowing one to navigate (bio)chemical space intelligently. These methods are revolutionary but require an understanding of both (bio)chemistry and computer science to be exploited to best advantage. We give a high-level (non-mathematical) background to the deep learning revolution, and set out the crucial issue for chemical biology and informatics as a two-way mapping from the discrete nature of individual molecules to the continuous but high-dimensional latent representation that may best reflect chemical space. A variety of architectures can do this; we focus on a particular type known as variational autoencoders. We then provide some examples of recent successes of these kinds of approach, and a look towards the future.
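The abstract frames the key issue as a two-way mapping between discrete molecules and a continuous latent space, with a focus on variational autoencoders. In a standard VAE, the step into the latent space is usually written with the reparameterization trick plus a KL penalty toward a standard normal prior. The minimal pure-Python sketch below shows just those two pieces; the encoder and decoder networks and the molecular representation (SMILES, graphs, or otherwise) are omitted, and the `mu`/`log_var` values are hypothetical encoder outputs.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets an autodiff framework pass
    gradients through mu and log_var; here it is plain arithmetic.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.3, -1.2], [-0.5, 0.1]  # hypothetical encoder outputs for one molecule
z = reparameterize(mu, log_var, rng)  # a point in the continuous latent space
print(len(z), kl_to_standard_normal(mu, log_var))
```

The KL term pulls every molecule's latent distribution toward the same prior, which is what keeps the latent space smooth enough to sample from and decode back into new structures.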
Affiliation(s)
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, U.K.

156
Langevin M, Minoux H, Levesque M, Bianciotto M. Scaffold-Constrained Molecular Generation. J Chem Inf Model 2020; 60:5637-5646. [PMID: 33301333 DOI: 10.1021/acs.jcim.0c01015] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 01/08/2023]
Abstract
One of the major applications of generative models for drug discovery targets the lead-optimization phase. During the optimization of a lead series, it is common to have scaffold constraints imposed on the structure of the molecules designed. Without enforcing such constraints, the probability of generating molecules with the required scaffold is extremely low, which hinders the practicality of generative models for de novo drug design. To tackle this issue, we introduce a new algorithm, named SAMOA (Scaffold Constrained Molecular Generation), to perform scaffold-constrained in silico molecular design. We build on the well-known SMILES-based Recurrent Neural Network (RNN) generative model, with a modified sampling procedure to achieve scaffold-constrained generation. We directly benefit from the associated reinforcement learning methods, which allow us to design molecules optimized for different properties while exploring only the relevant chemical space. We showcase the method's ability to perform scaffold-constrained generation on various tasks: designing novel molecules around scaffolds extracted from SureChEMBL chemical series, generating novel active molecules on the Dopamine Receptor D2 (DRD2) target, and finally, designing predicted actives on the MMP-12 series, an industrial lead-optimization project.
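SAMOA modifies the sampling procedure of a SMILES RNN so that scaffold tokens stay fixed and only the open positions are generated. The sketch below illustrates that idea at the string level only, assuming `*` marks an open position and replacing the RNN with a hypothetical random fragment sampler (`make_toy_sampler`). In the actual method, generation proceeds token by token under the model's probabilities; chemical validity of the output is not checked here.

```python
import random
from typing import Callable, Sequence


def decorate_scaffold(scaffold: str, sample_fragment: Callable[[], str]) -> str:
    """Copy the scaffold verbatim and call the sampler only at each '*' open position."""
    pieces = []
    for char in scaffold:
        pieces.append(sample_fragment() if char == "*" else char)
    return "".join(pieces)


def make_toy_sampler(fragments: Sequence[str], rng: random.Random) -> Callable[[], str]:
    """Hypothetical stand-in for a trained SMILES RNN: picks a fragment at random."""
    return lambda: rng.choice(fragments)


rng = random.Random(42)
sampler = make_toy_sampler(["C", "O", "N", "Cl", "C(=O)O"], rng)
decorated = decorate_scaffold("c1ccc(*)cc1", sampler)  # para-substituted benzene scaffold
print(decorated)
```

Because the fixed part of the string is never sampled, every output carries the required scaffold by construction, which is the property the abstract says unconstrained generation rarely achieves.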
Affiliation(s)
- Maxime Langevin
  - PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
  - Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France
- Hervé Minoux
  - Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France
- Maximilien Levesque
  - PASTEUR, Département de chimie, École Normale Supérieure, PSL University, Sorbonne Université, CNRS, 75005 Paris, France
  - Aqemia, 75001 Paris, France
- Marc Bianciotto
  - Molecular Design Sciences - Integrated Drug Discovery, Sanofi R&D, 94400 Vitry-sur-Seine, France

157
Elbadawi M, Gaisford S, Basit AW. Advanced machine-learning techniques in drug discovery. Drug Discov Today 2020; 26:769-777. [PMID: 33290820 DOI: 10.1016/j.drudis.2020.12.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/24/2020] [Revised: 11/16/2020] [Accepted: 12/02/2020] [Indexed: 01/20/2023]
Abstract
The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As its use increases, its limitations are also becoming apparent, including the need for big data, data sparsity, and a lack of interpretability. It has also become apparent that the techniques are not truly autonomous, requiring retraining even after deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.
Affiliation(s)
- Moe Elbadawi
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Simon Gaisford
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
  - FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK
- Abdul W Basit
  - Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
  - FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK

158
Iovanac NC, Savoie BM. Improving the generative performance of chemical autoencoders through transfer learning. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/abae75] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Generative models are a sub-class of machine learning models that are capable of generating new samples with a target set of properties. In chemical and materials applications, these new samples might be drug targets, novel semiconductors, or catalysts constrained to exhibit an application-specific set of properties. Given their potential to yield high-value targets from otherwise intractable design spaces, generative models are currently under intense study with respect to how predictions can be improved through changes in model architecture and data representation. Here we explore the potential of multi-task transfer learning as a complementary approach to improving the validity and property specificity of molecules generated by such models. We have compared baseline generative models trained on a single property prediction task against models trained on additional ancillary prediction tasks and observe a general positive impact on the validity and specificity of the multi-task models. In particular, we observe that the validity of generated structures is strongly affected by whether or not the models have chemical property data, as opposed to only syntactic structural data, supplied during learning. We demonstrate this effect in both interpolative and extrapolative scenarios (i.e., where the generative targets are poorly represented in training data) for models trained to generate high-energy structures and models trained to generate structures with targeted bandgaps within certain ranges. In both instances, the inclusion of additional chemical property data improves the ability of models to generate valid, unique structures with increased property specificity. This approach requires only minor alterations to existing generative models, in many cases leveraging prediction frameworks already native to these models.
Additionally, the transfer learning strategy is complementary to ongoing efforts to improve model architectures and data representation and can foreseeably be stacked on top of these developments.
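The abstract contrasts training on a single property prediction task with training on additional ancillary prediction tasks. One common way to express such multi-task training, assumed here rather than taken from the paper, is a reconstruction loss plus a weighted sum of per-task regression losses. The task names, weights, and values below are hypothetical.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for one property prediction task."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(recon_loss: float,
                   preds: Dict[str, Sequence[float]],
                   targets: Dict[str, Sequence[float]],
                   weights: Dict[str, float]) -> float:
    """Reconstruction term plus a weighted property-prediction term per task."""
    return recon_loss + sum(weights[task] * mse(preds[task], targets[task]) for task in preds)


# Hypothetical batch: one primary property ("energy") and one ancillary property ("bandgap").
preds = {"energy": [1.0, 2.0], "bandgap": [0.5]}
targets = {"energy": [1.0, 3.0], "bandgap": [1.5]}
weights = {"energy": 0.5, "bandgap": 1.0}
total = multitask_loss(1.0, preds, targets, weights)
print(total)
```

Adding an ancillary task is then just another entry in the dictionaries, which is consistent with the abstract's point that the approach needs only minor changes to existing models.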

159
Siramshetty VB, Shah P, Kerns E, Nguyen K, Yu KR, Kabir M, Williams J, Neyra J, Southall N, Nguyễn ÐT, Xu X. Retrospective assessment of rat liver microsomal stability at NCATS: data and QSAR models. Sci Rep 2020; 10:20713. [PMID: 33244000 PMCID: PMC7693334 DOI: 10.1038/s41598-020-77327-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/06/2020] [Accepted: 11/04/2020] [Indexed: 11/09/2022]
Abstract
Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. Metabolic stability is usually assessed in microsomal fractions, and only the best compounds progress in the drug discovery process. A high-throughput single-time-point substrate depletion assay in rat liver microsomes (RLM) is employed at the National Center for Advancing Translational Sciences. Between 2012 and 2020, RLM stability data were generated for ~24,000 compounds from more than 250 projects that cover a wide range of pharmacological targets and cellular pathways. Although this is a crucial endpoint, little or no data exist in the public domain. In this study, computational models were developed for predicting RLM stability using different machine learning methods. In addition, a retrospective time-split validation was performed, and local models were built for projects that performed poorly with global models. Further analysis revealed inherent medicinal chemistry knowledge potentially useful to chemists in the pursuit of synthesizing metabolically stable compounds. We also deposited experimental data for ~2500 compounds in the PubChem bioassay database (AID: 1508591). The global prediction models are made publicly accessible (https://opendata.ncats.nih.gov/adme). To the best of our knowledge, this is the first publicly available RLM prediction model built using high-quality data generated at a single laboratory.
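A retrospective time-split validation trains on compounds measured earlier and evaluates on compounds measured later, which mimics prospective use more closely than a random split. A minimal sketch, assuming a hypothetical `(compound_id, assay_date, measured_value)` record layout and a single calendar cutoff (the abstract does not state the cutoff used at NCATS, and the records below are made up):

```python
from datetime import date
from typing import List, Tuple

Record = Tuple[str, date, float]  # (compound_id, assay_date, measured_value); hypothetical layout


def time_split(records: List[Record], cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Train on everything measured before the cutoff, test on everything on or after it."""
    ordered = sorted(records, key=lambda r: r[1])
    train = [r for r in ordered if r[1] < cutoff]
    test = [r for r in ordered if r[1] >= cutoff]
    return train, test


records = [
    ("cpd-001", date(2013, 5, 2), 45.0),
    ("cpd-002", date(2019, 3, 1), 12.0),
    ("cpd-003", date(2016, 8, 9), 30.5),
    ("cpd-004", date(2020, 1, 15), 8.2),
]
train_set, test_set = time_split(records, cutoff=date(2019, 1, 1))
print([r[0] for r in train_set], [r[0] for r in test_set])
```

A model that scores well under this split has at least been checked on chemistry it had not yet "seen" at training time, which a random split cannot guarantee when projects evolve over the years.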
Affiliation(s)
- Vishal B Siramshetty
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Pranav Shah
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Edward Kerns
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Kimloan Nguyen
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
  - NY State Public Health, DOHMH 42-09 28th St, Long Island City, NY, 11101, USA
- Kyeong Ri Yu
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
  - School of Medicine, Virginia Commonwealth University, 1201 E Marshall St, Richmond, VA, 23298, USA
- Md Kabir
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
  - The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, 10029, USA
- Jordan Williams
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Jorge Neyra
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Noel Southall
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Ðắc-Trung Nguyễn
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Xin Xu
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA

160
Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine Learning Methods in Drug Discovery. Molecules 2020; 25:E5277. [PMID: 33198233 PMCID: PMC7696134 DOI: 10.3390/molecules25225277] [Citation(s) in RCA: 127] [Impact Index Per Article: 31.8] [Received: 10/15/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022]
Abstract
Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and for novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of such virtual screening and the associated online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery, and associated techniques, will be discussed, and the applications and methods that produce promising results will be reviewed.
Affiliation(s)
- Lauv Patel
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Tripti Shukla
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, AR 72467, USA
- David W. Ussery
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Shanzhi Wang
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA

161
Blaschke T, Engkvist O, Bajorath J, Chen H. Memory-assisted reinforcement learning for diverse molecular de novo design. J Cheminform 2020; 12:68. [PMID: 33292554 PMCID: PMC7654024 DOI: 10.1186/s13321-020-00473-0] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/02/2020] [Accepted: 10/29/2020] [Indexed: 12/23/2022]
Abstract
In de novo molecular design, recurrent neural networks (RNNs) have been shown to be effective for sampling and generating novel chemical structures. Using a technique called reinforcement learning (RL), an RNN can be tuned with a scoring function to target a particular section of chemical space with optimized desirable properties. However, ligands generated by current RL methods tend to have relatively low diversity, and optimizing towards desired properties sometimes even results in duplicate structures. Here, we propose a new method to address the low-diversity issue in RL for molecular design. Memory-assisted RL is an extension of standard RL that introduces a so-called memory unit. As proof of concept, we applied our method to generate structures with a desired AlogP value. In a second case study, we applied our method to design ligands for the dopamine type 2 receptor and the 5-hydroxytryptamine type 1A receptor. For both receptors, a machine learning model was developed to predict whether generated molecules were active or not for the receptor. In both case studies, memory-assisted RL led to the generation of more compounds predicted to be active, with higher chemical diversity, thus achieving better coverage of the chemical space of known ligands than established RL methods.
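The abstract introduces the memory unit without detailing it. A simplified sketch, assuming generated molecules are grouped by a bucket key (for example, a scaffold string computed elsewhere) and that once a bucket is full, further molecules in it receive zero reward; the class name, the bucket key, `bucket_size`, and the `min_score` threshold are hypothetical parameters chosen for illustration, not the published implementation.

```python
from collections import defaultdict
from typing import Dict


class MemoryUnit:
    """Simplified memory: once a bucket holds `bucket_size` high-scoring molecules,
    further molecules that fall into the same bucket receive a score of zero."""

    def __init__(self, bucket_size: int, min_score: float) -> None:
        self.bucket_size = bucket_size
        self.min_score = min_score
        self.buckets: Dict[str, int] = defaultdict(int)

    def adjust(self, bucket_key: str, score: float) -> float:
        if score < self.min_score:
            return score  # low-scoring molecules are passed through and not memorized
        if self.buckets[bucket_key] >= self.bucket_size:
            return 0.0  # bucket full: no further reward for this region of chemical space
        self.buckets[bucket_key] += 1
        return score


memory = MemoryUnit(bucket_size=2, min_score=0.5)
# The same hypothetical scaffold key generated three times with a high raw score.
scores = [memory.adjust("c1ccccc1", 0.9) for _ in range(3)]
print(scores)
```

Zeroing the reward for an over-visited region pushes the RL agent to earn its reward elsewhere, which is how a memory of this kind can raise diversity without changing the underlying scoring function.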
Affiliation(s)
- Thomas Blaschke
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
- Ola Engkvist
  - Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, LIMES Program Unit Chemical Biology and Medicinal Chemistry B-IT, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, Bonn, 53115, Germany
- Hongming Chen
  - Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Science Park, Guangzhou, China

162
Mangiatordi GF, Intranuovo F, Delre P, Abatematteo FS, Abate C, Niso M, Creanza TM, Ancona N, Stefanachi A, Contino M. Cannabinoid Receptor Subtype 2 (CB2R) in a Multitarget Approach: Perspective of an Innovative Strategy in Cancer and Neurodegeneration. J Med Chem 2020; 63:14448-14469. [PMID: 33094613 DOI: 10.1021/acs.jmedchem.0c01357] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/07/2023]
Abstract
The cannabinoid receptor subtype 2 (CB2R) is an interesting new therapeutic target owing to its involvement in the first steps of neurodegeneration as well as in cancer onset and progression. Several studies, focused on different types of tumors, report promising anticancer activity induced by CB2R agonists due to their ability to reduce inflammation and cell proliferation. Moreover, in neuroinflammation, the stimulation of CB2R, which is overexpressed in microglial cells, exerts beneficial effects in neurodegenerative disorders. With the aim of overcoming current treatment limitations, new drugs can be developed that modulate, together with CB2R, other targets involved in such multifactorial disorders. Building on successful case studies of already developed multitarget strategies involving CB2R, in this Perspective we aim to prompt the scientific community to consider new promising target associations involving HDACs (histone deacetylases) and σ receptors by employing modern approaches based on molecular hybridization, computational polypharmacology, and machine learning algorithms.
Affiliation(s)
- Francesca Intranuovo
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Pietro Delre
  - CNR-Institute of Crystallography, Via Amendola 122/o, 70126 Bari, Italy
  - Dipartimento di Chimica, Università degli Studi di Bari Aldo Moro, 70125 Bari, Italy
- Francesca Serena Abatematteo
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Carmen Abate
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Mauro Niso
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Teresa Maria Creanza
  - CNR-Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
- Nicola Ancona
  - CNR-Institute of Intelligent Industrial Technologies and Systems for Advanced Manufacturing, Via Amendola 122/o, 70126 Bari, Italy
- Angela Stefanachi
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy
- Marialessandra Contino
  - Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Via Orabona 4, 70125 Bari, Italy

163
Amabilino S, Bratholm LA, Bennie SJ, O’Connor MB, Glowacki DR. Training atomic neural networks using fragment-based data generated in virtual reality. J Chem Phys 2020; 153:154105. [DOI: 10.1063/5.0015950] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/31/2023]
Affiliation(s)
- Silvia Amabilino
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
  - Intangible Realities Laboratory, University of Bristol, Bristol BS8 1UB, United Kingdom
- Lars A. Bratholm
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
  - Intangible Realities Laboratory, University of Bristol, Bristol BS8 1UB, United Kingdom
- Simon J. Bennie
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
  - Intangible Realities Laboratory, University of Bristol, Bristol BS8 1UB, United Kingdom
- Michael B. O’Connor
  - Intangible Realities Laboratory, University of Bristol, Bristol BS8 1UB, United Kingdom
  - Department of Computer Science, University of Bristol, Bristol BS8 1UB, United Kingdom
- David R. Glowacki
  - School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
  - Intangible Realities Laboratory, University of Bristol, Bristol BS8 1UB, United Kingdom
  - Department of Computer Science, University of Bristol, Bristol BS8 1UB, United Kingdom

164
Domenico A, Nicola G, Daniela T, Fulvio C, Nicola A, Orazio N. De Novo Drug Design of Targeted Chemical Libraries Based on Artificial Intelligence and Pair-Based Multiobjective Optimization. J Chem Inf Model 2020; 60:4582-4593. [PMID: 32845150 DOI: 10.1021/acs.jcim.0c00517] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Indexed: 12/21/2022]
Abstract
Artificial intelligence and multiobjective optimization represent promising solutions to bridge chemical and biological landscapes by addressing the automated de novo design of compounds as a result of a humanlike creative process. In the present study, we conceived a novel pair-based multiobjective approach implemented in an adapted SMILES generative algorithm based on recurrent neural networks for the automated de novo design of new molecules whose overall features are optimized by finding the best trade-offs among relevant physicochemical properties (MW, logP, HBA, HBD) and additional similarity-based constraints biasing specific biological targets. In this respect, we carried out the de novo design of chemical libraries targeting neuraminidase, acetylcholinesterase, and the main protease of severe acute respiratory syndrome coronavirus 2. Several quality metrics were employed to assess drug-likeness, chemical feasibility, diversity content, and validity. Molecular docking was finally carried out to better evaluate the scoring and posing of the de novo generated molecules with respect to X-ray cognate ligands of the corresponding molecular counterparts. Our results indicate that artificial intelligence and multiobjective optimization allow us to capture the latent links joining chemical and biological aspects, thus providing easy-to-use options for customizable design strategies, which are especially effective for both lead generation and lead optimization. The algorithm is freely downloadable at https://github.com/alberdom88/moo-denovo and all of the data are available as Supporting Information.
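The abstract describes finding the best trade-offs among physicochemical properties and similarity constraints but does not spell out the pair-based scheme. As a generic illustration of trade-off selection, not the authors' algorithm, the sketch below computes a Pareto front over objectives to be minimized. The objective values (for instance, normalized distances from hypothetical MW and logP targets) are made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Hypothetical candidates scored as (distance from MW target, distance from logP target).
candidates = [(0.1, 1.0), (0.5, 0.2), (0.6, 1.5), (0.1, 1.2)]
front = pareto_front(candidates)
print(front)
```

Candidates 0 and 1 each win on a different objective, so neither can be discarded without a preference between MW and logP; the remaining two are strictly worse than candidate 0 and drop out.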
Affiliation(s)
- Domenico Alberga
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Nicola Gambacorta
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Molecular Horizon srl, Via Montelino 32, 06084 Bettona, Italy
- Fulvio Ciriaco
- Dipartimento di Chimica, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Nicola Amoroso
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy
- Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via E. Orabona, 4, I-70126 Bari, Italy

165
Khemchandani Y, O'Hagan S, Samanta S, Swainston N, Roberts TJ, Bollegala D, Kell DB. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J Cheminform 2020; 12:53. [PMID: 33431037 PMCID: PMC7487898 DOI: 10.1186/s13321-020-00454-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 08/18/2020] [Indexed: 02/03/2023]
Abstract
We address the problem of generating novel molecules with desired interaction properties as a multi-objective optimization problem. Interaction binding models are learned from binding data using graph convolution networks (GCNs). Since the experimentally obtained property scores are recognised as having potentially gross errors, we adopted a robust loss for the model. Combinations of these terms, including drug likeness and synthetic accessibility, are then optimized using reinforcement learning based on a graph convolution policy approach. Some of the molecules generated, while legitimate chemically, can have excellent drug-likeness scores but appear unusual. We provide an example based on the binding potency of small molecules to dopamine transporters. We extend our method successfully to use a multi-objective reward function, in this case for generating novel molecules that bind with dopamine transporters but not with those for norepinephrine. Our method should be generally applicable to the generation in silico of molecules with desirable properties.
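The abstract names a "robust loss" without specifying which one. It also mentions a multi-objective reward favouring dopamine-transporter (DAT) binding over norepinephrine-transporter (NET) binding. The minimal sketch below uses the Huber loss as one common example of a robust loss; that choice is an assumption, not the paper's. The reward function, its name and its weights are likewise hypothetical. The code only shows the general shape of such a reward. It contains no graph convolution or policy network.

```python
from typing import Sequence


def huber_loss(pred: Sequence[float], target: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for residuals up to delta, linear beyond it,
    so a few grossly wrong labels pull the fit less than under squared error."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(pred)


def selectivity_reward(p_dat: float, p_net: float, drug_likeness: float,
                       w_sel: float = 1.0, w_dl: float = 0.5) -> float:
    """Hypothetical scalar reward: predicted DAT potency minus predicted NET potency,
    plus a weighted drug-likeness term. The weights are illustrative, not from the paper."""
    return w_sel * (p_dat - p_net) + w_dl * drug_likeness
```

With a residual of 3 and `delta=1.0`, the Huber term is 2.5, whereas half the squared error would be 4.5. This gap is why a robust loss is attractive when some experimental labels may be grossly wrong.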
Affiliation(s)
- Yash Khemchandani
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, 400 076, India
- Stephen O'Hagan
- Dept of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
- Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Timothy J Roberts
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Danushka Bollegala
- Dept of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool, L69 3BX, UK
- Douglas B Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK.
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 200, 2800 Kgs. Lyngby, Denmark.

166
Couvillion SP, Agrawal N, Colby SM, Brandvold KR, Metz TO. Who Is Metabolizing What? Discovering Novel Biomolecules in the Microbiome and the Organisms Who Make Them. Front Cell Infect Microbiol 2020; 10:388. [PMID: 32850487 PMCID: PMC7410922 DOI: 10.3389/fcimb.2020.00388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2020] [Accepted: 06/25/2020] [Indexed: 12/14/2022]
Abstract
Even as the field of microbiome research has made huge strides in mapping microbial community composition in a variety of environments and organisms, explaining the phenotypic influences on the host by microbial taxa, both known and unknown, and their specific functions remain major challenges. A pressing need is the ability to assign specific functions in terms of enzymes and small molecules to specific taxa or groups of taxa in the community. This knowledge will be crucial for advancing personalized therapies based on the targeted modulation of microbes or metabolites that have predictable outcomes to benefit the human host. This perspective article advocates for the combined use of standards-free metabolomics and activity-based protein profiling strategies to address this gap in functional knowledge in microbiome research via the identification of novel biomolecules and the attribution of their production to specific microbial taxa.
Affiliation(s)
- Sneha P. Couvillion
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States
- Neha Agrawal
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States
- Sean M. Colby
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States
- Kristoffer R. Brandvold
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States
- Elson S. Floyd College of Medicine, Washington State University, Spokane, WA, United States
- Thomas O. Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States

167
Samanta S, O’Hagan S, Swainston N, Roberts TJ, Kell DB. VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder. Molecules 2020; 25:E3446. [PMID: 32751155 PMCID: PMC7435890 DOI: 10.3390/molecules25153446] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/26/2020] [Revised: 07/21/2020] [Accepted: 07/28/2020] [Indexed: 01/13/2023]
Abstract
Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the conditional distribution p(z|x), where z is a latent vector and x is the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a novel metric for molecular similarity that is easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
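Once molecules are encoded, the abstract's "VAE vector distances" reduce to ordinary vector arithmetic. The minimal sketch below assumes the latent vectors have already been produced by some trained encoder, which is not shown. The short vectors are illustrative placeholders. Euclidean distance and the 1/(1+d) mapping to a similarity in (0, 1] are illustrative choices, not necessarily those used in VAE-Sim.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def latent_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Map latent distance to a similarity in (0, 1]; identical vectors score 1.0.
    The 1 / (1 + d) mapping is an assumption made for illustration."""
    return 1.0 / (1.0 + euclidean(u, v))


def rank_by_similarity(query: Sequence[float],
                       library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank library entries by latent similarity to the query, most similar first."""
    scores = [(name, latent_similarity(query, z)) for name, z in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the encoding is computed once per molecule, a similarity search costs only vector distances. This is consistent with the abstract's point that the metric is rapidly calculated.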
Affiliation(s)
- Soumitra Samanta
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Steve O’Hagan
- Department of Chemistry, The Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Timothy J. Roberts
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
- Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark

168
Cai C, Wang S, Xu Y, Zhang W, Tang K, Ouyang Q, Lai L, Pei J. Transfer Learning for Drug Discovery. J Med Chem 2020; 63:8683-8694. [PMID: 32672961 DOI: 10.1021/acs.jmedchem.9b02147] [Citation(s) in RCA: 144] [Impact Index Per Article: 36.0] [Indexed: 01/05/2023]
Abstract
The data sets available to train models for in silico drug discovery efforts are often small. Indeed, the sparse availability of labeled data is a major barrier to artificial-intelligence-assisted drug discovery. One solution to this problem is to develop algorithms that can cope with relatively heterogeneous and scarce data. Transfer learning is a type of machine learning that can leverage existing, generalizable knowledge from other related tasks to enable learning of a separate task with a small set of data. Deep transfer learning is the most commonly used type of transfer learning in the field of drug discovery. This Perspective provides an overview of transfer learning and related applications to drug discovery to date. Furthermore, it provides outlooks on the future development of transfer learning for drug discovery.
Affiliation(s)
- Chenjing Cai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
- Shiwei Wang
- PTN Graduate Program, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
- Youjun Xu
- BNLMS and Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, P. R. China
- Weilin Zhang
- Beijing Intelligent Pharma Technology Co., Ltd., Beijing 100083, P. R. China
- Ke Tang
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, P. R. China
- Qi Ouyang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
- The State Key Laboratory for Artificial Microstructures and Mesoscopic Physics, School of Physics, Peking University, Beijing 100871, P. R. China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China
- BNLMS and Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, P. R. China
- Jianfeng Pei
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, P. R. China

169
Amabilino S, Pogány P, Pickett SD, Green DVS. Guidelines for Recurrent Neural Network Transfer Learning-Based Molecular Generation of Focused Libraries. J Chem Inf Model 2020; 60:5699-5713. [DOI: 10.1021/acs.jcim.0c00343] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Affiliation(s)
- Silvia Amabilino
- School of Chemistry, University of Bristol, Cantock’s Close, Bristol BS8 1TS, United Kingdom
- Peter Pogány
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Stephen D. Pickett
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom
- Darren V. S. Green
- Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Herts SG1 2NY, United Kingdom

170
Verkhivker GM, Agajanian S, Hu G, Tao P. Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning. Front Mol Biosci 2020; 7:136. [PMID: 32733918 PMCID: PMC7363947 DOI: 10.3389/fmolb.2020.00136] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 05/01/2020] [Accepted: 06/08/2020] [Indexed: 12/12/2022]
Abstract
Allosteric regulation is a common mechanism employed by complex biomolecular systems for regulation of activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon where binding or disturbing of a distal site in a protein can functionally control its activity and is considered as the "second secret of life." The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years have been integration between emerging experimental and computational approaches and technologies to advance quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, and network modeling of allostery and information-theoretical approaches that can describe the thermodynamics and hierarchy of allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information along with diversity and complexity of allosteric mechanisms in therapeutically important protein families have provided a well-suited platform for development of data-driven research strategies. Data-centric integration of chemistry, biology and computer science using artificial intelligence technologies has gained significant momentum and is at the forefront of many cross-disciplinary efforts.
We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.
Affiliation(s)
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
- Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Peng Tao
- Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, TX, United States

171
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew Chem Int Ed Engl 2020; 59:23414-23436. [PMID: 31553509 DOI: 10.1002/anie.201909989] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 08/06/2019] [Indexed: 01/19/2023]
Abstract
This two-part Review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this Review defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress towards the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

172
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

173
Li X, Xu Y, Yao H, Lin K. Chemical space exploration based on recurrent neural networks: applications in discovering kinase inhibitors. J Cheminform 2020; 12:42. [PMID: 33430983 PMCID: PMC7278228 DOI: 10.1186/s13321-020-00446-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 03/26/2020] [Accepted: 06/04/2020] [Indexed: 01/10/2023]
Abstract
With the rise of artificial intelligence (AI) in drug discovery, de novo molecular generation provides new ways to explore chemical space. However, because de novo molecular generation methods rely on abundant known molecules, the generated molecules may lack novelty. Novelty is important in highly competitive areas of medicinal chemistry, such as the discovery of kinase inhibitors. In this study, de novo molecular generation based on recurrent neural networks was applied to discover a new chemical space of kinase inhibitors. During the application, its practicality was evaluated and new insights were gained. With the successful discovery of one potent Pim1 inhibitor and two lead compounds that inhibit CDK4, AI-based molecular generation shows potential in drug discovery and development.
Affiliation(s)
- Xuanyi Li
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yinqiu Xu
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Hequan Yao
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.
- Kejiang Lin
- Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China.

174
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

175
Ghanakota P, Bos PH, Konze KD, Staker J, Marques G, Marshall K, Leswing K, Abel R, Bhat S. Combining Cloud-Based Free-Energy Calculations, Synthetically Aware Enumerations, and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization. J Chem Inf Model 2020; 60:4311-4325. [PMID: 32484669 DOI: 10.1021/acs.jcim.0c00120] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/13/2023]
Affiliation(s)
- Phani Ghanakota
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Pieter H. Bos
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Kyle D. Konze
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Joshua Staker
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Gabriel Marques
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Kyle Marshall
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Karl Leswing
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Robert Abel
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States
- Sathesh Bhat
- Schrödinger, Inc., 120 West 45th Street, 17th floor, New York, New York 10036, United States

176
Arús-Pous J, Patronov A, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. SMILES-based deep generative scaffold decorator for de-novo drug design. J Cheminform 2020; 12:38. [PMID: 33431013 PMCID: PMC7260788 DOI: 10.1186/s13321-020-00441-8] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/04/2020] [Accepted: 05/16/2020] [Indexed: 12/21/2022]
Abstract
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
Affiliation(s)
- Josep Arús-Pous
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Atanas Patronov
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Esben Jannik Bjerrum
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Respiratory Inflammation, and Autoimmune (RIA), BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Hongming Chen
- Chemistry and Chemical Biology Centre, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou, China
- Ola Engkvist
- Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden

177
SYBA: Bayesian estimation of synthetic accessibility of organic compounds. J Cheminform 2020; 12:35. [PMID: 33431015 PMCID: PMC7238540 DOI: 10.1186/s13321-020-00439-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 01/31/2020] [Accepted: 05/09/2020] [Indexed: 12/11/2022]
Abstract
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, which was used as a baseline method, as well as with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon optimization of the SAScore threshold (which changes from 6.0 to 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
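The abstract describes the general idea of a Bernoulli naïve Bayes classifier that turns fragment frequencies in ES and HS sets into per-fragment score contributions. The minimal sketch below is a generic Bernoulli naïve Bayes log-odds scorer, not SYBA's exact scoring formula. It assumes each molecule has already been reduced to a set of fragment identifiers, for example by a fingerprinting step that is not shown. It also assumes Laplace smoothing with a hypothetical `alpha`, and the toy fragment names are placeholders. Positive scores favour "easy to synthesize".

```python
import math
from typing import Dict, List, Set


def fragment_probs(molecules: List[Set[str]], vocab: Set[str], alpha: float = 1.0) -> Dict[str, float]:
    """Laplace-smoothed probability that each fragment occurs in a molecule of one class."""
    n = len(molecules)
    return {f: (sum(f in m for m in molecules) + alpha) / (n + 2 * alpha) for f in vocab}


class BernoulliFragmentNB:
    """Generic Bernoulli naive Bayes over fragment presence/absence (ES vs HS)."""

    def __init__(self, easy: List[Set[str]], hard: List[Set[str]], alpha: float = 1.0):
        self.vocab: Set[str] = set().union(*easy, *hard)
        # Smoothing keeps every probability strictly between 0 and 1, so the logs are safe.
        self.p_es = fragment_probs(easy, self.vocab, alpha)
        self.p_hs = fragment_probs(hard, self.vocab, alpha)
        self.log_prior = math.log(len(easy) / len(hard))

    def contribution(self, frag: str) -> float:
        """Log-odds contribution of a fragment being present (positive favours ES)."""
        return math.log(self.p_es[frag] / self.p_hs[frag])

    def score(self, frags: Set[str]) -> float:
        """Log-odds of ES vs HS; fragments outside the training vocabulary are ignored."""
        s = self.log_prior
        for f in self.vocab:
            if f in frags:
                s += math.log(self.p_es[f] / self.p_hs[f])
            else:
                s += math.log((1 - self.p_es[f]) / (1 - self.p_hs[f]))
        return s
```

The per-fragment `contribution` values show why a fragment-additive score is interpretable. Each molecular part carries its own signed weight, which matches the abstract's point about analysing individual molecular parts.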

178
Imrie F, Bradley AR, van der Schaar M, Deane CM. Deep Generative Models for 3D Linker Design. J Chem Inf Model 2020; 60:1983-1995. [PMID: 32195587 PMCID: PMC7189367 DOI: 10.1021/acs.jcim.9b01120] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 12/03/2019] [Indexed: 12/18/2022]
Abstract
Rational compound design remains a challenging problem for both computational methods and medicinal chemists. Computational generative methods have begun to show promising results for the design problem. However, they have not yet used the power of three-dimensional (3D) structural information. We have developed a novel graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge. Our method ("DeLinker") takes two fragments or partial structures and designs a molecule incorporating both. The generation process is protein-context-dependent, utilizing the relative distance and orientation between the partial structures. This 3D information is vital to successful compound design, and we demonstrate its impact on the generation process and the limitations of omitting such information. In a large-scale evaluation, DeLinker designed 60% more molecules with high 3D similarity to the original molecule than a database baseline. When considering the more relevant problem of longer linkers with at least five atoms, the outperformance increased to 200%. We demonstrate the effectiveness and applicability of this approach on a diverse range of design problems: fragment linking, scaffold hopping, and proteolysis targeting chimera (PROTAC) design. As far as we are aware, this is the first molecular generative model to incorporate 3D structural information directly in the design process. The code is available at https://github.com/oxpig/DeLinker.
Affiliation(s)
- Fergus Imrie
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, U.K.
- Mihaela van der Schaar
- University of Cambridge, Cambridge CB2 1PZ, U.K.
- Alan Turing Institute, London NW1 2DB, U.K.
- Charlotte M. Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, U.K.

179
van Deursen R, Ertl P, Tetko IV, Godin G. GEN: highly efficient SMILES explorer using autodidactic generative examination networks. J Cheminform 2020; 12:22. [PMID: 33430998 PMCID: PMC7146994 DOI: 10.1186/s13321-020-00425-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/22/2019] [Accepted: 03/23/2020] [Indexed: 12/31/2022]
Abstract
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are based exclusively on LSTM and/or GRU units and are frequently trained on canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to training deep generative networks for SMILES generation. In our GENs, we used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of the generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early by an independent online examination mechanism that measures the quality of the generated set. Here, we used online statistical quality control (SQC) on the percentage of valid SMILES as the examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95–98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation by unrestricted SMILES randomization. Our trained models combine a high novelty rate (85–90%) with strong conservation of the property space (95–99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
Affiliation(s)
- Ruud van Deursen
- Firmenich SA, Research and Development, Rue des Jeunes 1, Les Acacias, 1227, Geneva, Switzerland.
- Peter Ertl
- Novartis Institutes for BioMedical Research, Novartis Campus, 4056, Basel, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, 85716, Unterschleißheim, Germany
- Guillaume Godin
- Firmenich SA, Research and Development, Rue des Jeunes 1, Les Acacias, 1227, Geneva, Switzerland.
180
McCarthy M, Lee KLK. Molecule Identification with Rotational Spectroscopy and Probabilistic Deep Learning. J Phys Chem A 2020; 124:3002-3017. [PMID: 32212702 DOI: 10.1021/acs.jpca.0c01376] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/06/2023]
Abstract
A proof-of-concept framework for identifying molecules of unknown elemental composition and structure using experimental rotational data and probabilistic deep learning is presented. Using a minimal set of input data determined experimentally, we describe four neural network architectures that yield information to assist in the identification of an unknown molecule. The first architecture translates spectroscopic parameters into Coulomb matrix eigenspectra as a method of recovering chemical and structural information encoded in the rotational spectrum. The eigenspectrum is subsequently used by three deep learning networks to constrain the range of stoichiometries, generate SMILES strings, and predict the most likely functional groups present in the molecule. In each model, we utilize dropout layers as an approximation to Bayesian sampling, which subsequently generates probabilistic predictions from otherwise deterministic models. These models are trained on a modestly sized theoretical dataset comprising ∼83 000 unique organic molecules (between 18 and 180 amu) optimized at the ωB97X-D/6-31+G(d) level of theory, where the theoretical uncertainties of the spectroscopic constants are well-understood and used to further augment training. Since chemical and structural properties depend strongly on molecular composition, we divided the dataset into four groups corresponding to pure hydrocarbons, oxygen-bearing species, nitrogen-bearing species, and both oxygen- and nitrogen-bearing species, training each type of network with one of these categories, thus creating "experts" within each domain of molecules. We demonstrate how these models can then be used for practical inference on four molecules and discuss both the strengths and shortcomings of our approach and the future directions these architectures can take.
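For readers unfamiliar with the Coulomb-matrix eigenspectrum used above as an intermediate representation, the sketch below computes it from a known geometry with the commonly used definition: diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j/|R_i - R_j| (coordinates typically in atomic units). It illustrates the target representation only; the authors' network predicts eigenspectra from spectroscopic parameters, which is not shown. The helper names and the ordering by decreasing eigenvalue magnitude are assumptions.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| elsewhere."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid division by zero; diagonal overwritten below
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def eigenspectrum(charges, coords):
    """Eigenvalues of the (symmetric) Coulomb matrix, sorted by decreasing magnitude."""
    eigvals = np.linalg.eigvalsh(coulomb_matrix(charges, coords))
    return eigvals[np.argsort(-np.abs(eigvals))]


# H2 with a bond length of about 1.4 bohr.
print(eigenspectrum([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]]))
```

Sorting the eigenvalues makes the descriptor independent of atom ordering, which is why an eigenspectrum rather than the raw matrix is a convenient fixed-form target.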
Affiliation(s)
- Michael McCarthy
- Center for Astrophysics | Harvard & Smithsonian, 60 Garden Street, Cambridge, Massachusetts 02138, United States
- Kin Long Kelvin Lee
- Center for Astrophysics | Harvard & Smithsonian, 60 Garden Street, Cambridge, Massachusetts 02138, United States
181
Abstract
INTRODUCTION: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of neural-network architectures, training methods, and procedures for generating new molecules means that expert knowledge is required to choose the most suitable approach.
AREAS COVERED: Three approaches to using deep learning in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation. Several neural-network architectures for building either discriminative or generative models are discussed, including deep multilayer neural networks, different kinds of convolutional neural networks, recurrent neural networks, and several types of autoencoders. Several learning frameworks are also considered, including adversarial learning and reinforcement learning, as are different representations for generating molecules, including SMILES, graphs, and several alternative string representations.
EXPERT OPINION: Two problems must be solved to make models built with deep neural networks, especially generative models, a valuable option in ligand-based drug discovery: the interpretability and explainability of deep-learning models, and the synthetic accessibility of novel compounds designed by deep-learning algorithms.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia; Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
182
Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform 2020; 12:17. [PMID: 33431004 PMCID: PMC7079452 DOI: 10.1186/s13321-020-00423-w] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 10/12/2019] [Accepted: 03/09/2020] [Indexed: 01/03/2023]
Abstract
We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture on top of these embeddings yields higher-quality, interpretable QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction is based on an internal consensus. Because both the augmentation and the transfer learning are based on embeddings, the method gives good results even for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also contains a standalone program for QSAR prediction that calculates individual atom contributions, thereby interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts an online implementation of the proposed method.
Affiliation(s)
- Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Guillaume Godin
- Firmenich International SA, Digital Lab, Geneva, Lausanne, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
183
184
Bilbrey JA, Marrero CO, Sassi M, Ritzmann AM, Henson NJ, Schram M. Tracking the Chemical Evolution of Iodine Species Using Recurrent Neural Networks. ACS Omega 2020; 5:4588-4594. [PMID: 32175505 PMCID: PMC7066558 DOI: 10.1021/acsomega.9b04104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/02/2019] [Accepted: 02/14/2020] [Indexed: 06/10/2023]
Abstract
We apply recurrent neural networks (RNNs) to predict the time evolution of the concentration profile of multiple species resulting from a set of interconnected chemical reactions. As a proof of concept of our approach, RNNs were trained on a synthetic dataset generated by solving the kinetic equations of a system of aqueous inorganic iodine reactions that can follow after nuclear reactor accidents. We examine the minimum dataset necessary to obtain accurate predictions and explore the ability of RNNs to interpolate and extrapolate when exposed to previously unseen data. We also investigate the limits of our RNN by evaluating the robustness of the training initialization on our dataset.
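As an illustration of the kind of synthetic training data described above, the sketch below generates concentration profiles for a toy consecutive first-order reaction A → B → C from its closed-form solution and slices them into input windows and next-step targets of the sort an RNN could be trained on. The reaction scheme, rate constants, and window length are hypothetical stand-ins; the study's aqueous iodine network is larger and was solved from its own kinetic equations.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float]


def consecutive_first_order(a0: float, k1: float, k2: float, times: Sequence[float]) -> List[Row]:
    """Analytic [A], [B], [C] for the toy scheme A -> B -> C (first order, k1 != k2)."""
    if k1 == k2:
        raise ValueError("this closed form requires k1 != k2")
    rows = []
    for t in times:
        a = a0 * math.exp(-k1 * t)
        b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
        c = a0 - a - b  # mass balance: total concentration is conserved
        rows.append((a, b, c))
    return rows


def sliding_windows(series: Sequence[Row], window: int):
    """Pair each run of `window` consecutive time steps with the step that follows it."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(list(series[i:i + window]))
        targets.append(series[i + window])
    return inputs, targets


profile = consecutive_first_order(a0=1.0, k1=0.5, k2=0.2, times=[0.1 * i for i in range(100)])
X, y = sliding_windows(profile, window=10)
print(len(X), len(y))
```

Varying the hypothetical rate constants and initial concentrations across many such profiles is one way to probe the interpolation and extrapolation questions the abstract raises.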
185
Yasonik J. Multiobjective de novo drug design with recurrent neural networks and nondominated sorting. J Cheminform 2020; 12:14. [PMID: 33430996 PMCID: PMC7026957 DOI: 10.1186/s13321-020-00419-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 07/15/2019] [Accepted: 02/10/2020] [Indexed: 01/28/2023]
Abstract
Research productivity in the pharmaceutical industry has declined significantly in recent decades, with higher costs, longer timelines, and lower success rates of drug candidates in clinical trials. This has prioritized the scalability and multiobjectivity of drug discovery and design. De novo drug design has emerged as a promising approach: molecules are generated from scratch, reducing the reliance on trial and error and on premade molecular repositories. However, optimizing for molecular traits remains challenging, impeding the implementation of de novo methods. In this work, we propose a de novo approach capable of optimizing multiple traits collectively. A recurrent neural network was used to generate molecules, which were then ranked on multiple properties by a nondominated sorting algorithm. The best of the generated molecules were selected and used to fine-tune the recurrent neural network through transfer learning, creating a cycle that mimics the traditional design–synthesis–test cycle. We demonstrate the efficacy of this approach through a proof of concept, simultaneously optimizing for constraints on molecular weight, octanol-water partition coefficient, number of rotatable bonds, hydrogen bond donors, and hydrogen bond acceptors. Analysis of the molecules generated after five iterations of the cycle revealed a 14-fold improvement in the quality of generated molecules, along with improvements in the accuracy of the recurrent neural network and the structural diversity of the generated molecules. Notably, this cycle requires neither large amounts of training data nor handwritten scoring functions. Altogether, this approach uniquely combines scalable generation with multiobjective optimization of molecules.
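The ranking step can be illustrated with the fast nondominated sorting procedure popularized by NSGA-II. In this sketch, each molecule is assumed to be a tuple of objective values to be minimized (for example, distances from target property ranges). How the study encodes its property constraints as objectives is not described here, so the score tuples below are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Group point indices into Pareto fronts; fronts[0] is the nondominated set."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # dominated[p]: indices that p dominates
    counts = [0] * n                    # counts[p]: number of points dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
        i += 1
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical molecules scored on two objectives to minimize.
scores = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4), (5, 5)]
print(nondominated_sort(scores))
```

Molecules from the first few fronts would then be the candidates for fine-tuning; in NSGA-II, ties within a front are broken by a diversity measure (crowding distance), which is omitted here.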
186
de Souza Neto LR, Moreira-Filho JT, Neves BJ, Maidana RLBR, Guimarães ACR, Furnham N, Andrade CH, Silva FP. In silico Strategies to Support Fragment-to-Lead Optimization in Drug Discovery. Front Chem 2020; 8:93. [PMID: 32133344 PMCID: PMC7040036 DOI: 10.3389/fchem.2020.00093] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 10/25/2019] [Accepted: 01/30/2020] [Indexed: 12/16/2022]
Abstract
Fragment-based drug (or lead) discovery (FBDD or FBLD) has developed over the last two decades into a successful key technology in the pharmaceutical industry for early-stage drug discovery and development. The FBDD strategy consists of screening low-molecular-weight compounds against clinically relevant macromolecular targets (usually proteins). These small molecular fragments can bind at one or more sites on the target and act as starting points for the development of lead compounds. During fragment development, features that translate into compounds with favorable physical, pharmacokinetic, and toxicity (ADMET: absorption, distribution, metabolism, excretion, and toxicity) properties can be integrated. Structure-enabled fragment screening campaigns combine screening by a range of biophysical techniques, such as differential scanning fluorimetry, surface plasmon resonance, and thermophoresis, followed by structural characterization of fragment binding using NMR or X-ray crystallography. Structural characterization is also used in the subsequent analysis for growing selected fragment screening hits. The latest iteration of the FBDD workflow employs a high-throughput methodology of massively parallel screening by X-ray crystallography of individually soaked fragments. In this review, we outline FBDD strategies and explore a variety of in silico approaches that support the follow-up fragment-to-lead optimization by growing, linking, or merging. These fragment expansion strategies include hot spot analysis, druggability prediction, SAR (structure-activity relationship) by catalog methods, application of machine learning/deep learning models for virtual screening, and several de novo design methods for proposing synthesizable new compounds. Finally, we highlight recent case studies in fragment-based drug discovery where in silico methods have successfully contributed to the development of lead compounds.
Affiliation(s)
- Lauro Ribeiro de Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- José Teófilo Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Bruno Junior Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Laboratory of Cheminformatics, Centro Universitário de Anápolis – UniEVANGÉLICA, Anápolis, Brazil
- Rocío Lucía Beatriz Riveros Maidana
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Ana Carolina Ramos Guimarães
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Carolina Horta Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Brazil
- Floriano Paes Silva
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
187
Grisoni F, Moret M, Lingwood R, Schneider G. Bidirectional Molecule Generation with Recurrent Neural Networks. J Chem Inf Model 2020; 60:1175-1183. [PMID: 31904964 DOI: 10.1021/acs.jcim.9b00943] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Indexed: 02/08/2023]
Abstract
Recurrent neural networks (RNNs) can generate de novo molecular designs using simplified molecular-input line-entry system (SMILES) string representations of chemical structures. RNN-based structure generation is usually performed unidirectionally, by growing SMILES strings from left to right. However, a small molecule has no natural start or end, and SMILES strings are intrinsically non-univocal representations of molecular graphs. These properties motivate bidirectional structure generation. Here, bidirectional generative RNNs for SMILES-based molecule design are introduced. To this end, two established bidirectional methods were implemented, and a new method for SMILES string generation and data augmentation was introduced: bidirectional molecule design by alternate learning (BIMODAL). These three bidirectional strategies were compared with the unidirectional forward RNN approach for SMILES string generation in terms of (i) novelty, (ii) scaffold diversity, and (iii) chemical-biological relevance of the computer-generated molecules. The results support bidirectional strategies for SMILES-based molecular de novo design, with BIMODAL outperforming the unidirectional forward RNN on most criteria under the tested conditions. The code of the methods and the pretrained models are available at https://github.com/ETHmodlab/BIMODAL.
Affiliation(s)
- Francesca Grisoni
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Michael Moret
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Robin Lingwood
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Vladimir-Prelog-Weg 4, 8093 Zurich, Switzerland
188
Griffiths RR, Hernández-Lobato JM. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chem Sci 2020; 11:577-586. [PMID: 32190274 PMCID: PMC7067240 DOI: 10.1039/c9sc04026a] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 08/12/2019] [Accepted: 11/15/2019] [Indexed: 12/15/2022]
Abstract
Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for solving this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
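A common way to formulate constrained Bayesian optimization is to multiply the expected improvement (EI) by the predicted probability that a candidate point satisfies the constraint. The sketch below assumes a Gaussian predictive mean and standard deviation for the objective and a separately predicted feasibility probability (for example, from a classifier estimating whether a latent point decodes to a valid molecule). It is a generic formulation, not necessarily the exact acquisition function used in the paper.

```python
import math


def _norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for maximization when the objective at a point is predicted as N(mu, sigma**2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * _norm_cdf(z) + sigma * _norm_pdf(z)


def constrained_ei(mu: float, sigma: float, best: float, p_feasible: float) -> float:
    """EI weighted by the predicted probability that the point satisfies the constraint."""
    return expected_improvement(mu, sigma, best) * p_feasible


# A promising but probably invalid latent point vs. a modest, likely valid one.
print(constrained_ei(mu=2.0, sigma=1.0, best=1.0, p_feasible=0.05))
print(constrained_ei(mu=1.2, sigma=0.5, best=1.0, p_feasible=0.9))
```

In this toy comparison the second point wins despite its lower predicted objective, which is the intended effect: points far from the training data, where decoding tends to fail, are down-weighted.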
Affiliation(s)
- Ryan-Rhys Griffiths
- Cavendish Laboratory, Department of Physics, University of Cambridge, UK
- José Miguel Hernández-Lobato
- Department of Engineering, University of Cambridge, UK
- Alan Turing Institute, London, UK
- Microsoft Research, Cambridge, UK
189
Maziarka Ł, Pocha A, Kaczmarczyk J, Rataj K, Danel T, Warchoł M. Mol-CycleGAN: a generative model for molecular optimization. J Cheminform 2020; 12:2. [PMID: 33431006 PMCID: PMC6950853 DOI: 10.1186/s13321-019-0404-1] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 08/22/2019] [Accepted: 12/16/2019] [Indexed: 01/08/2023]
Abstract
Designing a molecule with desired properties is one of the biggest challenges in drug development, as it requires optimizing chemical compound structures with respect to many complex properties. To improve the compound design process, we introduce Mol-CycleGAN, a CycleGAN-based model that generates optimized compounds with high structural similarity to the original ones. Given a molecule, our model generates a structurally similar one with an optimized value of the considered property. We evaluate the performance of the model on selected optimization objectives related to structural properties (presence of halogen groups, number of aromatic rings) and to a physicochemical property (penalized logP). In the task of optimizing the penalized logP of drug-like molecules, our model significantly outperforms previous results.
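CycleGAN-style models are trained with a cycle-consistency term alongside the adversarial losses: mapping a sample to the other domain and back should recover it, which is what keeps the optimized molecule structurally close to the original. The sketch below computes that term for two hypothetical mappings `G` (domain X to Y) and `F` (Y to X), assuming molecules are represented as fixed-length numeric vectors. The molecular encoder, the adversarial losses, and the loss weighting used by Mol-CycleGAN are not described in this abstract and are not shown.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Mapping, F: Mapping, xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """L1 cycle loss: F(G(x)) should recover x and G(F(y)) should recover y."""
    forward = sum(mean_abs_error(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(mean_abs_error(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def shift_up(v: Vector) -> Vector:
    """Hypothetical mapping X -> Y: add 1 to every coordinate."""
    return [t + 1.0 for t in v]


def shift_down(v: Vector) -> Vector:
    """Hypothetical mapping Y -> X: subtract 1 from every coordinate."""
    return [t - 1.0 for t in v]


xs = [[0.5, 2.0], [1.0, -1.0]]
ys = [[3.0, 0.0]]
print(cycle_consistency_loss(shift_up, shift_down, xs, ys))  # a perfect cycle gives 0.0
```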
Affiliation(s)
- Łukasz Maziarka
- Ardigen, Podole 76, 30-394 Cracow, Poland
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Cracow, Poland
- Agnieszka Pocha
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Cracow, Poland
- Tomasz Danel
- Ardigen, Podole 76, 30-394 Cracow, Poland
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Cracow, Poland
190
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022]
Abstract
Organic molecules and polymers have a broad range of applications in the biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally driven, guided by experience, intuition, and conceptual insights. Although they have been successfully applied to discover many important materials, these methods face significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advances in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool for driving breakthroughs in many areas of materials science and engineering. To date, ML-assisted approaches have allowed quantitative structure-property/activity relationships for material property prediction to be established more accurately and efficiently. In addition, ML-enabled molecular generation and inverse molecular design can revolutionize materials design and accelerate it as never before. In this perspective, we review recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in the biomedical, chemical, and materials science fields. We further discuss the challenges that must be solved to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for predicting material properties, serving as a tutorial for researchers who have little prior experience with ML and want to apply it in various applications. Finally, it discusses the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future for meeting the tremendous demand for new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China
- Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
- Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
- Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
191
Kato Y, Hamada S, Goto H. Validation Study of QSAR/DNN Models Using the Competition Datasets. Mol Inform 2020; 39:e1900154. [PMID: 31802634 PMCID: PMC7050538 DOI: 10.1002/minf.201900154] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/13/2019] [Accepted: 12/02/2019] [Indexed: 11/30/2022]
Abstract
Since the QSAR/DNN model outperformed other conventional methods in the Kaggle QSAR competition, many artificial neural network (ANN) methods have been applied to drug and materials discovery. The emergence of artificial intelligence (AI), which combines various general-purpose ANN platforms with large-scale open-access chemical databases, has attracted great interest and expectations across a wide range of molecular sciences. In this study, we investigate various DNN settings in order to reach a level of predictive performance comparable to that of the competition's champion team, even with a general-purpose ANN platform, and introduce the Meister setting for constructing a good QSAR/DNN model. We used the most commonly available DNN model and constructed many QSAR/DNN models trained with various DNN settings on the 15 datasets employed in the competition. As a result, we confirmed that we can construct a QSAR/DNN model that reaches the same level of R² performance as the champion team. The difference from the DNN setting recommended by the champion team was a smaller mini-batch size. We also show that the R² performance for each target depends on the molecular activity type, which is related to the complexity of the biological mechanisms and chemical processes observed in molecular activity measurements.
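For context on the R² values, the sketch below computes R² as the squared Pearson correlation between measured and predicted activities and averages it over datasets, which is how the metric of the Merck Kaggle QSAR competition is commonly described. Treat that definition as an assumption and check it against the competition rules before comparing numbers; the example data are hypothetical.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def squared_pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and predicted values."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def mean_r2(datasets: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the per-dataset squared correlation (one value per target/dataset)."""
    return mean(squared_pearson(t, p) for t, p in datasets)


# Two hypothetical datasets: a perfect linear fit and a partially shuffled one.
print(mean_r2([([1, 2, 3], [2, 4, 6]), ([1, 2, 3, 4], [1, 3, 2, 4])]))
```

Because the squared correlation is unchanged by any linear rescaling of the predictions, it can differ from the coefficient of determination, 1 - SS_res/SS_tot, which penalizes systematic offsets.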
Affiliation(s)
- Yoshiki Kato
- Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
- Shinji Hamada
- Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
- Hitoshi Goto
- Department of Computer Science and Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
192
Fare C, Turcani L, Pyzer-Knapp EO. Powerful, transferable representations for molecules through intelligent task selection in deep multitask networks. Phys Chem Chem Phys 2020; 22:13041-13048. [DOI: 10.1039/d0cp02319a] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
We develop and test a framework for selecting appropriate chemical datasets to create molecular representations tailored for specific tasks.
Affiliation(s)
- Clyde Fare
- IBM Research UK, Sci-Tech Daresbury, Warrington, UK
193
Hong SH, Ryu S, Lim J, Kim WY. Molecular Generative Model Based on an Adversarially Regularized Autoencoder. J Chem Inf Model 2019; 60:29-36. [PMID: 31820983 DOI: 10.1021/acs.jcim.9b00694] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 01/18/2023]
Abstract
Deep generative models are attracting great attention as a promising new approach to molecular design. The various models reported so far are based on either a variational autoencoder (VAE) or a generative adversarial network (GAN), but they have limitations such as low validity and uniqueness. Here, we propose a new type of model based on an adversarially regularized autoencoder (ARAE). Like a VAE, it uses latent variables, but the distribution of the latent variables is estimated by adversarial training, as in a GAN. The latter is intended to avoid both the insufficiently flexible approximation of the posterior distribution in VAEs and the difficulty of handling discrete variables in GANs. Our benchmark study showed that ARAE indeed outperformed conventional models in terms of validity, uniqueness, and novelty per generated molecule. We also demonstrated successful conditional generation of drug-like molecules with ARAE, controlling both single and multiple properties. As a potential real-world application, we generated epidermal growth factor receptor inhibitors that share the scaffolds of known active molecules while simultaneously satisfying drug-like conditions.
194
Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA, Fisher J, Jansen JM, Duca JS, Rush TS, Zentgraf M, Hill JE, Krutoholow E, Kohler M, Blaney J, Funatsu K, Luebkemann C, Schneider G. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2022]
195
Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019; 19:353-364. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Accepted: 10/28/2019] [Indexed: 12/17/2022]
196
Lim J, Hwang SY, Moon S, Kim S, Kim WY. Scaffold-based molecular design with a graph generative model. Chem Sci 2019; 11:1153-1164. [PMID: 34084372 PMCID: PMC8146476 DOI: 10.1039/c9sc04503a] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 09/06/2019] [Accepted: 12/03/2019] [Indexed: 01/02/2023]
Abstract
Searching for new molecules in areas like drug discovery often starts from the core structures of known molecules. This calls for a strategy of designing derivative compounds that retain a particular scaffold as a substructure. Accordingly, our present work proposes a graph generative model targeted at scaffold-based molecular design. Our model accepts a molecular scaffold as input and extends it by sequentially adding atoms and bonds. The generated molecules are therefore guaranteed to contain the scaffold, and their properties can be controlled by conditioning the generation process on desired properties. The learned rule for extending molecules generalizes well to arbitrary kinds of scaffolds, including those unseen during learning. In conditional generation, our model can simultaneously control multiple chemical properties despite the search space being constrained by the fixed substructure. As a demonstration, we applied our model to designing inhibitors of the epidermal growth factor receptor and showed that our model can employ a simple semi-supervised extension to broaden its applicability to situations where only a small amount of data is available.
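The scaffold guarantee described above follows from the generation scheme itself: if the model only ever appends atoms and bonds to the input graph, the scaffold cannot be lost. A minimal sketch of that property, assuming a toy graph type (`MolGraph`, `extend_scaffold` and `contains_scaffold` are hypothetical names) and using random choices as a stand-in for the learned, property-conditioned policy; valences and chemistry are deliberately ignored, so this is not the authors' model.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: element symbols plus undirected bonds."""
    atoms: list[str]
    bonds: set[tuple[int, int]] = field(default_factory=set)  # stored as (i, j) with i < j

    def add_atom(self, symbol: str, attach_to: int) -> int:
        self.atoms.append(symbol)
        new_index = len(self.atoms) - 1
        self.bonds.add((attach_to, new_index))  # attach_to < new_index always holds
        return new_index

    def add_bond(self, i: int, j: int) -> None:
        if i != j:
            self.bonds.add((min(i, j), max(i, j)))


def extend_scaffold(scaffold: MolGraph, n_steps: int, rng: random.Random) -> MolGraph:
    """Grow a copy of the scaffold one atom at a time; nothing is ever removed."""
    mol = MolGraph(list(scaffold.atoms), set(scaffold.bonds))
    for _ in range(n_steps):
        # Random choices stand in for the learned, property-conditioned policy.
        new_atom = mol.add_atom(rng.choice(["C", "N", "O"]), rng.randrange(len(mol.atoms)))
        if rng.random() < 0.2:
            # Occasionally close a ring between the new atom and an existing atom.
            mol.add_bond(new_atom, rng.randrange(len(mol.atoms)))
    return mol


def contains_scaffold(mol: MolGraph, scaffold: MolGraph) -> bool:
    # Scaffold atoms keep their original indices because extension only appends.
    n = len(scaffold.atoms)
    return mol.atoms[:n] == scaffold.atoms and scaffold.bonds <= mol.bonds


benzene = MolGraph(["C"] * 6, {(i, i + 1) for i in range(5)} | {(0, 5)})
product = extend_scaffold(benzene, n_steps=10, rng=random.Random(0))
print(contains_scaffold(product, benzene))  # True
```

Because the scaffold's atom indices and bonds are never modified, the containment check reduces to a prefix comparison and a subset test rather than a general subgraph search.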
Affiliation(s)
- Jaechang Lim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Sang-Yeon Hwang
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Seokhyun Moon
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Seungsu Kim
  - School of Computing, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Woo Youn Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
  - KI for Artificial Intelligence, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
197
Fukunaga I, Sawada R, Shibata T, Kaitoh K, Sakai Y, Yamanishi Y. Prediction of the Health Effects of Food Peptides and Elucidation of the Mode-of-action Using Multi-task Graph Convolutional Neural Network. Mol Inform 2019; 39:e1900134. [PMID: 31778042 DOI: 10.1002/minf.201900134] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/27/2019] [Accepted: 11/13/2019] [Indexed: 12/29/2022]
Abstract
Food proteins work not only as nutrients but also as modulators of the physiological functions of the human body. These physiological functions are regulated mainly by peptides encrypted in food protein sequences (food peptides). In this study, we propose a novel deep learning-based method to predict the health effects of food peptides and elucidate their mode of action. In the algorithm, we estimate potential target proteins of food peptides using a multi-task graph convolutional neural network and predict their health effects using information about therapeutic targets for diseases. We constructed predictive models based on 21,103 peptide-protein interactions involving 10,950 peptides and 2,533 proteins, and applied the models to food peptides (e.g., lactotripeptide, isoleucyltyrosine, and sardine peptide) defined in food for specified health use. The models suggested potential effects such as lowering blood pressure, lowering blood glucose levels, and anti-cancer activity for several food peptides. The interactions of food peptides with target proteins were confirmed by docking simulations.
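The target-estimation step above relies on a multi-task graph convolutional network. As a hedged illustration only, the forward pass below assumes the widely used propagation rule of Kipf and Welling, a single graph-convolution layer, mean pooling over nodes, and one sigmoid output per candidate target protein; the layer count, node features, pooling, and all names (`predict_targets`, `gcn_layer`, and so on) are assumptions, the weights are random rather than trained, and the paper's subsequent mapping from predicted targets to disease-related health effects is not shown.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj: Matrix) -> Matrix:
    # A_hat = D^-1/2 (A + I) D^-1/2: add self-loops, then normalize symmetrically.
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    # H' = ReLU(A_hat H W): each node mixes its neighbours' features, then projects.
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def predict_targets(adj: Matrix, x: Matrix, w1: Matrix, heads: Matrix) -> list[float]:
    """One sigmoid output per task, i.e. per candidate target protein."""
    h = gcn_layer(normalized_adjacency(adj), x, w1)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over nodes -> graph vector
    logits = [sum(p * w for p, w in zip(pooled, head)) for head in heads]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy peptide graph: 3 nodes in a chain, 2 input features, 4 hidden units, 3 targets.
rng = random.Random(0)
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
heads = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
print(predict_targets(adj, x, w1, heads))  # three probabilities in (0, 1)
```

The multi-task design shares the graph encoder across all proteins and adds only one small output head per target, which is what lets a single model score a peptide against many candidate proteins at once.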
Affiliation(s)
- Itsuki Fukunaga
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Ryusuke Sawada
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Tomokazu Shibata
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Kazuma Kaitoh
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Yukie Sakai
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
198
Cova TFGG, Pais AACC. Deep Learning for Deep Chemistry: Optimizing the Prediction of Chemical Patterns. Front Chem 2019; 7:809. [PMID: 32039134 PMCID: PMC6988795 DOI: 10.3389/fchem.2019.00809] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 07/16/2019] [Accepted: 11/11/2019] [Indexed: 12/14/2022]
Abstract
Computational Chemistry is currently a synergistic assembly of ab initio calculations, simulation, machine learning (ML) and optimization strategies for describing, solving and predicting chemical data and related phenomena. These include accelerated literature searches, analysis and prediction of physical and quantum chemical properties, transition states, chemical structures, chemical reactions, and also new catalysts and drug candidates. The generalization of scalability to larger chemical problems, rather than specialization, is now the main principle for transforming chemical tasks on multiple fronts, for which systematic and cost-effective solutions have benefited from ML approaches, including those based on deep learning (e.g. quantum chemistry, molecular screening, synthetic route design, catalysis, drug discovery). The latter class of ML algorithms is capable of combining raw input into layers of intermediate features, enabling bench-to-bytes designs with the potential to transform several chemical domains. In this review, the most exciting developments concerning the use of ML in a range of different chemical scenarios are described. A range of chemical problems, together with their rationalization, that have hitherto been inaccessible due to the lack of suitable analysis tools are thus detailed, evidencing the breadth of potential applications of these emerging multidimensional approaches. Focus is given to the models, algorithms and methods proposed to facilitate research on compound design and synthesis, materials design, prediction of binding, molecular activity, and soft matter behavior.
The information produced by pairing Chemistry and ML, through data-driven analyses, neural network predictions and monitoring of chemical systems, allows (i) improving the ability to understand the complexity of chemical data, (ii) streamlining and designing experiments, (iii) discovering new molecular targets and materials, and (iv) planning or rethinking forthcoming chemical challenges. In fact, optimization directly encompasses all these tasks.
Affiliation(s)
- Tânia F. G. G. Cova
  - Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Alberto A. C. C. Pais
  - Coimbra Chemistry Centre, CQC, Department of Chemistry, Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
199
Kwon Y, Yoo J, Choi YS, Son WJ, Lee D, Kang S. Efficient learning of non-autoregressive graph variational autoencoders for molecular graph generation. J Cheminform 2019; 11:70. [PMID: 33430985 PMCID: PMC6873411 DOI: 10.1186/s13321-019-0396-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/29/2019] [Accepted: 11/13/2019] [Indexed: 11/10/2022]
Abstract
With the advancements in deep learning, deep generative models combined with graph neural networks have been successfully employed for data-driven molecular graph generation. Early methods based on the non-autoregressive approach have been effective in generating molecular graphs quickly and efficiently but have suffered from low performance. In this paper, we present an improved learning method involving a graph variational autoencoder for efficient molecular graph generation in a non-autoregressive manner. We introduce three additional learning objectives and incorporate them into the training of the model: approximate graph matching, reinforcement learning, and auxiliary property prediction. We demonstrate the effectiveness of the proposed method by evaluating it for molecular graph generation tasks using QM9 and ZINC datasets. The model generates molecular graphs with high chemical validity and diversity compared with existing non-autoregressive methods. It can also conditionally generate molecular graphs satisfying various target conditions.
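The abstract distinguishes non-autoregressive generation, which emits a whole molecular graph at once, from sequential approaches. The decoder below is not the authors' architecture (the abstract does not specify it); it swaps in the inner-product edge decoder familiar from graph autoencoders purely to show what decoding every edge independently from node latent vectors looks like. Node-type prediction and the three additional training objectives named in the abstract (approximate graph matching, reinforcement learning, auxiliary property prediction) are omitted, and `decode_one_shot` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_one_shot(z: list[list[float]], threshold: float = 0.5):
    """Decode all edges of an N-node graph from node latents in one pass.

    p_ij = sigmoid(z_i . z_j) depends only on the two node latents, never on
    previously decoded edges, so all pairs could be computed in parallel
    (the loop here is only for clarity).
    """
    n = len(z)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sigmoid(sum(a * b for a, b in zip(z[i], z[j])))
            probs[i][j] = probs[j][i] = p  # undirected graph: keep the matrix symmetric
    edges = {(i, j) for i in range(n) for j in range(i + 1, n) if probs[i][j] > threshold}
    return probs, edges


# Nodes 0 and 1 share a latent direction; node 2 is orthogonal to both.
z = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
probs, edges = decode_one_shot(z)
print(edges)  # {(0, 1)}
```

The speed advantage mentioned in the abstract comes from this independence: an autoregressive decoder must wait for earlier atoms and bonds before emitting the next one, whereas a one-shot decoder produces the full adjacency matrix in a single step.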
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
  - Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea
- Jiho Yoo
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Won-Joon Son
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Dongseon Lee
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon, Republic of Korea
- Seokho Kang
  - Department of Systems Management Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon, Republic of Korea
200
Peng SP, Zhao Y. Convolutional Neural Networks for the Design and Analysis of Non-Fullerene Acceptors. J Chem Inf Model 2019; 59:4993-5001. [DOI: 10.1021/acs.jcim.9b00732] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Affiliation(s)
- Shi-Ping Peng
  - State Key Laboratory for Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Fujian Provincial Key Lab of Theoretical and Computational Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yi Zhao
  - State Key Laboratory for Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Fujian Provincial Key Lab of Theoretical and Computational Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China