1
Suzuki T, Ma D, Yasuo N, Sekijima M. Mothra: Multiobjective de novo Molecular Generation Using Monte Carlo Tree Search. J Chem Inf Model 2024; 64:7291-7302. [PMID: 39317969 DOI: 10.1021/acs.jcim.4c00759] [Indexed: 09/26/2024]
Abstract
In the field of drug discovery, identifying compounds that satisfy multiple criteria, such as target protein affinity, pharmacokinetics, and membrane permeability, is challenging because of the vast chemical space. Until now, multiobjective optimization via generative models has often involved linear combinations of different reward functions. Linear combinations solve multiobjective optimization problems by turning multiobjective optimization into a single-objective task and causing problems with weighting for each objective. Herein, we propose a scalable multiobjective molecular generative model developed using deep learning techniques. This model integrates the capabilities of recurrent neural networks for molecular generation and Pareto multiobjective Monte Carlo tree search to determine the optimal search direction. Through this integration, our model can generate compounds using enhanced evaluation functions that include important aspects like target protein affinity, drug similarity, and toxicity. The proposed model addresses the limitations of previous linear combination methods, and its effectiveness is demonstrated via extensive experimentation. The improvements achieved in the evaluation metrics underscore the potential utility of our approach toward drug discovery applications. In addition, we provide the source code for our model such that researchers can easily access and use our framework in their own investigations. The source code and pretrained model for Mothra, developed in this study, along with the Docker image for the Pareto front explorer and compound picker, designed to streamline the selection and visualization of optimal chemical compounds, are released under the GNU General Public License v3.0 and available at https://github.com/sekijima-lab/Mothra.
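The contrast this abstract draws between a weighted linear combination and Pareto-based selection can be illustrated with a minimal sketch. It is not the Mothra implementation: the candidate scores, the function names (`dominates`, `pareto_front`, `best_by_weighted_sum`), and the assumption that both objectives are maximized are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are maximized, by assumption)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def best_by_weighted_sum(points: List[Scores], weights: Sequence[float]) -> Scores:
    """Collapse the objectives into one scalar and return the single best point."""
    return max(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical (affinity, drug-likeness) scores for four candidate molecules.
candidates = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]

front = pareto_front(candidates)
affinity_heavy = best_by_weighted_sum(candidates, (0.8, 0.2))
druglike_heavy = best_by_weighted_sum(candidates, (0.2, 0.8))

print("Pareto front:", front)
print("Weighted sum, weights (0.8, 0.2):", affinity_heavy)
print("Weighted sum, weights (0.2, 0.8):", druglike_heavy)
```

In this toy set the weighted sum returns a different single candidate depending on the weights, whereas the Pareto front retains all three non-dominated trade-offs. The balanced candidate (0.5, 0.5) is on the front but is never selected by any non-negative weighting, because one of the two extreme candidates always scores at least 0.55.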
Affiliation(s)
- Takamasa Suzuki
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Dian Ma
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
- Nobuaki Yasuo
  - Tokyo Tech Academy for Convergence of Materials and Informatics (TAC-MI), Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Masakazu Sekijima
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Kanagawa 226-8501, Japan
2
Quinn TR, Giblin KA, Thomson C, Boerth JA, Bommakanti G, Braybrooke E, Chan C, Chinn AJ, Code E, Cui C, Fan Y, Grimster NP, Kohara K, Lamb ML, Ma L, Mfuh AM, Robb GR, Robbins KJ, Schimpl M, Tang H, Ware J, Wrigley GL, Xue L, Zhang Y, Zhu H, Hughes SJ. Accelerated Discovery of Carbamate Cbl-b Inhibitors Using Generative AI Models and Structure-Based Drug Design. J Med Chem 2024; 67:14210-14233. [PMID: 39132828 DOI: 10.1021/acs.jmedchem.4c01034] [Indexed: 08/13/2024]
Abstract
Casitas B-lymphoma proto-oncogene-b (Cbl-b) is a RING finger E3 ligase that has an important role in effector T cell function, acting as a negative regulator of T cell, natural killer (NK) cell, and B cell activation. A discovery effort toward Cbl-b inhibitors was pursued in which a generative AI design engine, REINVENT, was combined with a medicinal chemistry structure-based design to discover novel inhibitors of Cbl-b. Key to the success of this effort was the evolution of the "Design" phase of the Design-Make-Test-Analyze cycle to involve iterative rounds of an in silico structure-based drug design, strongly guided by physics-based affinity prediction and machine learning DMPK predictive models, prior to selection for synthesis. This led to the accelerated discovery of a potent series of carbamate Cbl-b inhibitors.
Affiliation(s)
- Taylor R Quinn
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Kathryn A Giblin
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Clare Thomson
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Jeffrey A Boerth
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Gayathri Bommakanti
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Erin Braybrooke
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Christina Chan
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Alex J Chinn
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Erin Code
  - Discovery Sciences, R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Caifeng Cui
  - Pharmaron Beijing Co., Ltd., 6 Taihe Road, BDA, Beijing 100176, P. R. China
- Yukai Fan
  - Pharmaron Beijing Co., Ltd., 6 Taihe Road, BDA, Beijing 100176, P. R. China
- Neil P Grimster
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Keishi Kohara
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Michelle L Lamb
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Lina Ma
  - Pharmaron Beijing Co., Ltd., 6 Taihe Road, BDA, Beijing 100176, P. R. China
- Adelphe M Mfuh
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Graeme R Robb
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Kevin J Robbins
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Marianne Schimpl
  - Discovery Sciences, R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Haoran Tang
  - Discovery Sciences, R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Jamie Ware
  - Discovery Sciences, R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Gail L Wrigley
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
- Lin Xue
  - Pharmaron Beijing Co., Ltd., 6 Taihe Road, BDA, Beijing 100176, P. R. China
- Yun Zhang
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 35 Gatehouse Drive, Waltham, Massachusetts 02451, United States
- Huimin Zhu
  - Pharmaron Beijing Co., Ltd., 6 Taihe Road, BDA, Beijing 100176, P. R. China
- Samantha J Hughes
  - Early TDE Discovery, Oncology R&D, AstraZeneca, 1 Francis Crick Avenue, Cambridge CB2 0AA, U.K.
3
Gao X, Baimacheva N, Aires-de-Sousa J. Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators. Molecules 2024; 29:3969. [PMID: 39203047 PMCID: PMC11357237 DOI: 10.3390/molecules29163969] [Received: 07/16/2024] [Revised: 08/04/2024] [Accepted: 08/06/2024] [Indexed: 09/03/2024]
Abstract
A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was summed to the LSVs of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% of valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.
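The latent-space arithmetic described in this abstract can be sketched with a toy stand-in for the encoder. Everything below is an assumption made for illustration: the real model is a recurrent variational heteroencoder, whereas `encode` here is a hypothetical linear map over character counts, and the sign convention (modified minus original) is chosen only so that the delta can be added as an operator.

```python
import numpy as np

# Hypothetical toy vocabulary and fixed random projection (not the paper's model).
VOCAB = "CNOFH"
rng = np.random.default_rng(0)
PROJECTION = rng.normal(size=(len(VOCAB), 8))


def encode(smiles: str) -> np.ndarray:
    """Toy latent space vector (LSV): character counts times a fixed matrix.
    This ignores molecular structure entirely; it only makes the arithmetic visible."""
    counts = np.array([smiles.count(ch) for ch in VOCAB], dtype=float)
    return counts @ PROJECTION


# Delta latent space vector for an H->F substitution, taken here from
# methane ("C") and fluoromethane ("CF").
dlsv_h_to_f = encode("CF") - encode("C")

# Apply the delta as an operator to a molecule without fluorine (ethanol, "CCO").
shifted = encode("CCO") + dlsv_h_to_f

print("DLSV shape:", dlsv_h_to_f.shape)
print("Matches toy LSV of 'CCOF':", np.allclose(shifted, encode("CCOF")))
```

Because the toy encoder is linear, the shifted vector coincides exactly with the toy encoding of a fluorinated string. A learned, nonlinear encoder offers no such guarantee, which is consistent with the abstract reporting that only part of the decoded SMILES incorporate fluorine without other changes.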
Affiliation(s)
- Xinyue Gao
  - Faculty of Sciences, Université Paris Cité, 75013 Paris, France
- Natalia Baimacheva
  - Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081 Strasbourg, France
- Joao Aires-de-Sousa
  - LAQV and REQUIMTE, Chemistry Department, NOVA School of Science and Technology, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
4
Urbina F, Jones T, Harris JS, Snyder SH, Lane TR, Ekins S. Predicting the Hallucinogenic Potential of Molecules Using Artificial Intelligence. ACS Chem Neurosci 2024; 15:3078-3089. [PMID: 39092989 PMCID: PMC11338697 DOI: 10.1021/acschemneuro.4c00405] [Indexed: 08/04/2024]
Abstract
The development of new drugs addressing serious mental health and other disorders should avoid the psychedelic experience. Analogs of psychedelic drugs can have clinical utility and are termed "psychoplastogens". These represent promising candidates for treating opioid use disorder to reduce drug dependence, with rarely reported serious adverse effects. This drug abuse cessation is linked to the induction of neuritogenesis and increased neuroplasticity, a hallmark of psychedelic molecules, such as lysergic acid diethylamide. Some, but not all, psychoplastogens may act through the G-protein coupled receptor (GPCR) 5HT2A, whereas others may display very different polypharmacology, making prediction of hallucinogenic potential challenging. In the process of developing tools to help design new psychoplastogens, we have used artificial intelligence in the form of machine learning classification models for predicting psychedelic effects using a published in vitro data set from PsychLight (support vector classification (SVC), area under the curve (AUC) 0.74) and in vivo human data derived from books from Shulgin and Shulgin (SVC, AUC 0.72) with nested five-fold cross validation. We have also explored conformal predictors with ECFP6 and electrostatic descriptors in an effort to optimize them. These models have been used to predict known 5HT2A agonists to assess their potential to act as psychedelics and induce hallucinations for PsychLight (SVC, AUC 0.97) and Shulgin and Shulgin (random forest, AUC 0.71). We have tested these models with head twitch data from the mouse. This predictive capability is desirable to reliably design new psychoplastogens that lack in vivo hallucinogenic potential and help assess existing and future molecules for this potential. These efforts also provide useful insights into understanding the psychedelic structure activity relationship.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thane Jones
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Joshua S. Harris
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Scott H. Snyder
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Thomas R. Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
5
Cremer J, Le T, Noé F, Clevert DA, Schütt KT. PILOT: equivariant diffusion for pocket-conditioned de novo ligand generation with multi-objective guidance via importance sampling. Chem Sci 2024:d4sc03523b. [PMID: 39211741 PMCID: PMC11348832 DOI: 10.1039/d4sc03523b] [Received: 05/29/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
The generation of ligands that both are tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in silico approach for the de novo generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with a large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics such as increased binding affinity for a given protein pocket but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted IC50 values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
Affiliation(s)
- Julian Cremer
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D, Berlin, Germany
  - Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, Spain
- Tuan Le
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Germany
  - Microsoft Research AI4Science, Microsoft, Berlin, Germany
- Djork-Arné Clevert
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D, Berlin, Germany
- Kristof T Schütt
  - Machine Learning & Computational Sciences, Pfizer Worldwide R&D, Berlin, Germany
6
Xia X, Liu Y, Zheng C, Zhang X, Wu Q, Gao X, Zeng X, Su Y. Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space. J Chem Inf Model 2024; 64:5161-5174. [PMID: 38870455 PMCID: PMC11235097 DOI: 10.1021/acs.jcim.4c00031] [Received: 01/05/2024] [Revised: 05/08/2024] [Accepted: 05/13/2024] [Indexed: 06/15/2024]
Abstract
Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
Affiliation(s)
- Xin Xia
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Chunhou Zheng
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xingyi Zhang
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Qingwen Wu
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Xin Gao
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Yansen Su
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei 230601, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei 230088, Anhui, China
7
Dutschmann TM, Schlenker V, Baumann K. Chemoinformatic regression methods and their applicability domain. Mol Inform 2024; 43:e202400018. [PMID: 38803302 DOI: 10.1002/minf.202400018] [Received: 01/16/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 05/29/2024]
Abstract
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Valerie Schlenker
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106, Braunschweig, Germany
8
Aksamit N, Hou J, Li Y, Ombuki-Berman B. Integrating transformers and many-objective optimization for drug design. BMC Bioinformatics 2024; 25:208. [PMID: 38849719 PMCID: PMC11161990 DOI: 10.1186/s12859-024-05822-6] [Received: 04/07/2024] [Accepted: 05/30/2024] [Indexed: 06/09/2024]
Abstract
BACKGROUND: Drug design is a challenging and important task that requires the generation of novel and effective molecules that can bind to specific protein targets. Artificial intelligence algorithms have recently shown promising potential to expedite the drug design process. However, existing methods adopt multi-objective approaches, which limits the number of objectives. RESULTS: In this paper, we expand this thread of research from the many-objective perspective, by proposing a novel framework that integrates a latent Transformer-based model for molecular generation, with a drug design system that incorporates absorption, distribution, metabolism, excretion, and toxicity prediction, molecular docking, and many-objective metaheuristics. We compared the performance of two latent Transformer models (ReLSO and FragNet) on a molecular generation task and show that ReLSO outperforms FragNet in terms of reconstruction and latent space organization. We then explored six different many-objective metaheuristics based on evolutionary algorithms and particle swarm optimization on a drug design task involving potential drug candidates to human lysophosphatidic acid receptor 1, a cancer-related protein target. CONCLUSION: We show that the multi-objective evolutionary algorithm based on dominance and decomposition performs the best in terms of finding molecules that satisfy many objectives, such as high binding affinity, low toxicity, and high drug-likeness. Our framework demonstrates the potential of combining Transformers and many-objective computational intelligence for drug design.
Affiliation(s)
- Nicholas Aksamit
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Jinqiang Hou
  - Department of Chemistry, Lakehead University, 955 Oliver Road, Thunder Bay, ON, P7B 5E1, Canada
  - Thunder Bay Regional Health Research Institute, 980 Oliver Road, Thunder Bay, ON, P7B 6V4, Canada
- Yifeng Li
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
  - Department of Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
- Beatrice Ombuki-Berman
  - Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada
9
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
10
Unke OT, Stöhr M, Ganscha S, Unterthiner T, Maennel H, Kashubin S, Ahlin D, Gastegger M, Medrano Sandonas L, Berryman JT, Tkatchenko A, Müller KR. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. SCIENCE ADVANCES 2024; 10:eadn4397. [PMID: 38579003 DOI: 10.1126/sciadv.adn4397] [Received: 12/10/2023] [Accepted: 02/29/2024] [Indexed: 04/07/2024]
Abstract
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
Affiliation(s)
- Oliver T Unke
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Martin Stöhr
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Stefan Ganscha
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Thomas Unterthiner
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Hartmut Maennel
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Sergii Kashubin
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Daniel Ahlin
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
  - BASLEARN - TU Berlin/BASF Joint Lab for Machine Learning, Technische Universität Berlin, 10587 Berlin, Germany
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joshua T Berryman
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
  - Google DeepMind, Tucholskystraße 2, 10117 Berlin, Germany and Brandschenkestrasse 110, 8002 Zürich, Switzerland
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
11
Kneiding H, Nova A, Balcells D. Directional multiobjective optimization of metal complexes at the billion-system scale. NATURE COMPUTATIONAL SCIENCE 2024; 4:263-273. [PMID: 38553635 DOI: 10.1038/s43588-024-00616-5] [Received: 10/01/2023] [Accepted: 02/29/2024] [Indexed: 04/14/2024]
Abstract
The discovery of transition metal complexes (TMCs) with optimal properties requires large ligand libraries and efficient multiobjective optimization algorithms. Here we provide the tmQMg-L library, containing 30k diverse and synthesizable ligands with robustly assigned charges and metal coordination modes. tmQMg-L enabled the generation of 1.37 million palladium TMCs, which were used to develop and benchmark the Pareto-Lighthouse multiobjective genetic algorithm (PL-MOGA). With fine control over aim and scope, this algorithm maximized both the polarizability and highest occupied molecular orbital-lowest unoccupied molecular orbital gap of the TMCs within selected regions of the Pareto front, without requiring prior knowledge on the objective limits. Instead of genetic operations on small ligand fragments, the PL-MOGA did whole-ligand mutation and crossover operations, which in chemical spaces containing billions of systems, yielded thousands of highly diverse TMCs in an interpretable manner.
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
- Ainara Nova
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
  - Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo, Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
12
Zhang Y, Tong Y, Xia X, Wu Q, Su Y. A domain-label-guided translation model for molecular optimization. Methods 2024; 224:71-78. [PMID: 38395182 DOI: 10.1016/j.ymeth.2024.02.005] [Received: 12/21/2023] [Revised: 02/11/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024]
Abstract
Molecular optimization, which aims to improve molecular properties by modifying complex molecular structures, is a crucial and challenging task in drug discovery. In recent years, translation models provide a promising way to transform low-property molecules to high-property molecules, which enables molecular optimization to achieve remarkable progress. However, most existing models require matched molecular pairs, which are prone to be limited by the datasets. Although some models do not require matched molecular pairs, their performance is usually sacrificed due to the lack of useful supervising information. To address this issue, a domain-label-guided translation model is proposed in this paper, namely DLTM. In the model, the domain label information of molecules is exploited as a control condition to obtain different embedding representations, enabling the model to generate diverse molecules. Besides, the model adopts a classifier network to identify the property categories of transformed molecules, guiding the model to generate molecules with desired properties. The performance of DLTM is verified on two optimization tasks, namely the quantitative estimation of drug-likeness and penalized logP. Experimental results show that the proposed DLTM is superior to the compared baseline models.
Affiliation(s)
- Yajie Zhang
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China.
- Yongqi Tong
  - School of Computer Science and Technology, Anhui University, Hefei, 230601, China.
- Xin Xia
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; School of Artificial Intelligence, Anhui University, Hefei, 230601, China.
- Qingwen Wu
  - Affiliated Hospital of Jining Medical University, Jining, 272007, China.
- Yansen Su
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, 230088, China; School of Artificial Intelligence, Anhui University, Hefei, 230601, China.

13
Zhang C, Xie L, Lu X, Mao R, Xu L, Xu X. Developing an Improved Cycle Architecture for AI-Based Generation of New Structures Aimed at Drug Discovery. Molecules 2024; 29:1499. [PMID: 38611779 PMCID: PMC11013495 DOI: 10.3390/molecules29071499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery involves a crucial step of optimizing molecules with the desired structural groups. In the domain of computer-aided drug discovery, deep learning has emerged as a prominent technique in molecular modeling. Deep generative models play a crucial role in generating novel molecules during molecular optimization. However, many existing molecular generative models are limited because they process input information only in the forward direction. To overcome this limitation, we propose an improved generative model called BD-CycleGAN, which incorporates BiLSTM (bidirectional long short-term memory) and Mol-CycleGAN (molecular cycle generative adversarial network) to preserve the information of the molecular input. To evaluate the proposed model, we analyze the structural distribution and evaluation metrics of the generated molecules during structural transformation. The results demonstrate that the BD-CycleGAN model achieves a higher success rate and exhibits increased diversity in molecular generation. Furthermore, we demonstrate its application in molecular docking, where it successfully increases the docking score of the generated molecules. The proposed BD-CycleGAN architecture harnesses deep learning to facilitate the generation of molecules with desired structural features, thus offering promising advancements in drug discovery.
Affiliation(s)
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)
- Xiaojun Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (C.Z.); (L.X.); (X.L.); (R.M.)

14
Shen L, Fang J, Liu L, Yang F, Jenkins JL, Kutchukian PS, Wang H. Pocket Crafter: a 3D generative modeling based workflow for the rapid generation of hit molecules in drug discovery. J Cheminform 2024; 16:33. [PMID: 38515171 PMCID: PMC10958880 DOI: 10.1186/s13321-024-00829-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 03/16/2024] [Indexed: 03/23/2024]
Abstract
We present a user-friendly molecular generative pipeline called Pocket Crafter, specifically designed to facilitate hit finding in the drug discovery process. The workflow uses the three-dimensional (3D) generative modeling method Pocket2Mol for the de novo, structure-aware design of molecules for targeted protein structures, followed by filters for physicochemical properties and drug-likeness, structure-activity relationship analysis, and clustering to generate top virtual hit scaffolds. In our WDR5 case study, we acquired a focused set of 2029 compounds after a targeted search within the Novartis archived library based on the virtual scaffolds. Subsequently, we experimentally profiled these compounds, resulting in a novel chemical scaffold series that demonstrated activity in biochemical and biophysical assays. Pocket Crafter successfully prototyped an effective end-to-end 3D generative chemistry-based workflow for the exploration of new chemical scaffolds, which represents a promising approach in early drug discovery for hit identification.
Affiliation(s)
- Lingling Shen
  - Novartis Biomedical Research, Cambridge, MA, 02139, USA.
- Jian Fang
  - Novartis Biomedical Research, Cambridge, MA, 02139, USA
- Lulu Liu
  - Novartis Biomedical Research, Cambridge, MA, 02139, USA
- Fei Yang
  - Novartis Biomedical Research, Cambridge, MA, 02139, USA
- He Wang
  - Novartis Biomedical Research, Cambridge, MA, 02139, USA.

15
Claussen ER, Renfrew PD, Müller CL, Drew K. Scaffold Matcher: A CMA-ES based algorithm for identifying hotspot aligned peptidomimetic scaffolds. Proteins 2024; 92:343-355. [PMID: 37874196 PMCID: PMC10873094 DOI: 10.1002/prot.26619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 10/06/2023] [Indexed: 10/25/2023]
Abstract
The design of protein interaction inhibitors is a promising approach to address aberrant protein interactions that cause disease. One strategy in designing inhibitors is to use peptidomimetic scaffolds that mimic the natural interaction interface. A central challenge in using peptidomimetics as protein interaction inhibitors, however, is determining how best the molecular scaffold aligns to the residues of the interface it is attempting to mimic. Here we present the Scaffold Matcher algorithm, which aligns a given molecular scaffold onto hotspot residues from a protein interaction interface. To optimize the degrees of freedom of the molecular scaffold, we implement the covariance matrix adaptation evolution strategy (CMA-ES), a state-of-the-art derivative-free optimization algorithm, in Rosetta. To evaluate the performance of CMA-ES, we used 26 peptides from the FlexPepDock Benchmark and compared it with three other algorithms in Rosetta: Rosetta's default minimizer, a Monte Carlo protocol of small backbone perturbations, and a genetic algorithm. We test the algorithms on their ability to align a molecular scaffold to a series of hotspot residues (i.e., constraints) along native peptides. Of the four methods, CMA-ES was able to find the lowest energy conformation for all 26 benchmark peptides. Additionally, as a proof of concept, we apply the Scaffold Matcher algorithm with CMA-ES to align a peptidomimetic oligooxopiperazine scaffold to the hotspot residues of the substrate of the main protease of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our implementation of CMA-ES in Rosetta provides an alternative optimization method for macromolecular modeling problems with rough energy landscapes. Finally, our Scaffold Matcher algorithm allows for the identification of initial conformations of interaction inhibitors that can be further designed and optimized as high-affinity reagents.
Affiliation(s)
- Erin R. Claussen
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
- P. Douglas Renfrew
  - Center for Computational Biology, Flatiron Institute, New York, NY, 10010, USA
- Christian L. Müller
  - Ludwig-Maximilians-Universität München
  - Helmholtz Munich, München
  - Center for Computational Mathematics, Flatiron Institute, New York
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA

16
Kerstjens A, De Winter H. Molecule auto-correction to facilitate molecular design. J Comput Aided Mol Des 2024; 38:10. [PMID: 38363377 PMCID: PMC10873457 DOI: 10.1007/s10822-024-00549-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 01/11/2024] [Indexed: 02/17/2024]
Abstract
Ensuring that computationally designed molecules are chemically reasonable is at best cumbersome. We present a molecule correction algorithm that morphs invalid molecular graphs into structurally related valid analogs. The algorithm is implemented as a tree search, guided by a set of policies to minimize its cost. We showcase how the algorithm can be applied to molecular design, either as a post-processing step or as an integral part of molecule generators.
Affiliation(s)
- Alan Kerstjens
  - Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium
- Hans De Winter
  - Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium.

17
Kyro GW, Morgunov A, Brent RI, Batista VS. ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation. J Chem Inf Model 2024; 64:653-665. [PMID: 38287889 DOI: 10.1021/acs.jcim.3c01456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology and demonstrate its applicability to targeted molecular generation. When applied to c-Abl kinase, a protein with FDA-approved small-molecule inhibitors, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence and even reproduces two of them exactly. We also show that the methodology is effective for a protein without any commercially available small-molecule inhibitors, the HNH domain of the CRISPR-associated protein 9 (Cas9) enzyme. To facilitate implementation and reproducibility, we made all of our software available through the open-source ChemSpaceAL Python package.
Affiliation(s)
- Gregory W Kyro
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Anton Morgunov
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Rafael I Brent
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States
- Victor S Batista
  - Department of Chemistry, Yale University, New Haven, Connecticut 06511-8499, United States

18
Führer F, Gruber A, Diedam H, Göller AH, Menz S, Schneckener S. A deep neural network: mechanistic hybrid model to predict pharmacokinetics in rat. J Comput Aided Mol Des 2024; 38:7. [PMID: 38294570 DOI: 10.1007/s10822-023-00547-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 12/21/2023] [Indexed: 02/01/2024]
Abstract
An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. Predicting the systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows drug or agrochemical development to focus on compounds with a favorable kinetic profile. However, such predictions are challenging because availability results from the complex interplay between molecular properties, biology, and physiology, and because training data are scarce. In this work we improve the hybrid model developed earlier (Schneckener in J Chem Inf Model 59:4893-4905, 2019). We reduce the median fold change error for the total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set and by improving the neural network architecture as well as the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, such as sex and dosage form. In contrast to a pure machine learning model, our model is able to predict new endpoints on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24 h, while the model has only been trained on the total exposure.
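The abstract reports median fold change errors but does not define the metric. The sketch below assumes the common convention in which the fold change for one compound is the larger of predicted/observed and observed/predicted, so that a value of 1.0 means a perfect prediction. The function name and the example numbers are illustrative and not taken from the paper.

```python
import math
import statistics


def median_fold_change_error(predicted, observed):
    """Median of max(pred/obs, obs/pred) over all compounds.

    Assumed definition (not stated in the abstract): over- and
    under-prediction by the same factor count equally.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    fold_changes = []
    for pred, obs in zip(predicted, observed):
        if pred <= 0 or obs <= 0:
            raise ValueError("exposures must be strictly positive")
        # |log(pred) - log(obs)| is symmetric in over- and under-prediction.
        fold_changes.append(math.exp(abs(math.log(pred) - math.log(obs))))
    return statistics.median(fold_changes)


# Hypothetical exposures: per-compound fold changes are 2, 1 and 4.
example = median_fold_change_error([2.0, 1.0, 8.0], [1.0, 1.0, 2.0])
```

Under this reading, the reported drop from 2.85 to 2.35 would mean that the typical oral-exposure prediction moved from within a factor of 2.85 of the measurement to within a factor of 2.35.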
Affiliation(s)
- Florian Führer
  - Engineering & Technology, Applied Mathematics, Bayer AG, 51368, Leverkusen, Germany.
- Andrea Gruber
  - Pharmaceuticals, R&D, Preclinical Modeling & Simulation, Bayer AG, 13353, Berlin, Germany
- Holger Diedam
  - Crop Science, Product Supply, SC Simulation & Analysis, Bayer AG, 40789, Monheim, Germany
- Andreas H Göller
  - Pharmaceuticals, R&D, Molecular Design, Bayer AG, 42096, Wuppertal, Germany
- Stephan Menz
  - Pharmaceuticals, R&D, Preclinical Modeling & Simulation, Bayer AG, 13353, Berlin, Germany

19
Yi JC, Yang ZY, Zhao WT, Yang ZJ, Zhang XC, Wu CK, Lu AP, Cao DS. ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization. Brief Bioinform 2024; 25:bbae008. [PMID: 38385872 PMCID: PMC10883642 DOI: 10.1093/bib/bbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/17/2023] [Accepted: 01/02/2024] [Indexed: 02/23/2024]
Abstract
Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. As a multi-parameter objective, the optimization of ADMET properties is extremely challenging owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) is developed for the optimization of multiple ADMET endpoints without loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: Simplified Molecular Input Line Entry System (SMILES) Encoder, Descriptor Decoder and Molecular Optimizer. The SMILES Encoder generates a molecular representation as a 512-dimensional vector, and the Descriptor Decoder translates this representation back to the corresponding molecular structure with high accuracy. Based on this reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can be used to effectively optimize undesirable ADMET properties without loss of bioactivity, which essentially accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly (ADP-ribose) polymerase-1 inhibitor is provided as a case study to explore the utility of ChemMORT.
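The particle swarm step described in this abstract can be illustrated with a generic, single-objective sketch. This is not ChemMORT's implementation: the 4-dimensional search space, the quadratic `toy_score` function and all parameter values are assumptions chosen so the example runs on its own, whereas the platform searches a 512-dimensional learned representation against several ADMET objectives under constraints.

```python
import random


def particle_swarm_minimize(objective, dim, n_particles=30, n_iters=200,
                            bounds=(-5.0, 5.0), inertia=0.7,
                            cognitive=1.5, social=1.5, seed=0):
    """Textbook global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_val = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=personal_best_val.__getitem__)
    global_best = personal_best[best_idx][:]
    global_best_val = personal_best_val[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Keep the particle inside the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_best_val[i]:
                personal_best[i] = positions[i][:]
                personal_best_val[i] = value
                if value < global_best_val:
                    global_best = positions[i][:]
                    global_best_val = value
    return global_best, global_best_val


# Hypothetical stand-in for a property model: squared distance to a target.
TARGET = [1.0, -2.0, 0.5, 3.0]


def toy_score(vector):
    return sum((v - t) ** 2 for v, t in zip(vector, TARGET))


best_vector, best_value = particle_swarm_minimize(toy_score, dim=len(TARGET))
```

In a setting like the one the abstract describes, the objective would instead score a decoded molecule with property predictors, and a multi-objective variant would track a set of non-dominated solutions rather than a single global best.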
Affiliation(s)
- Jia-Cai Yi
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Wen-Tao Zhao
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Xiao-Chen Zhang
  - School of Computer Science, National University of Defense Technology, Changsha 410073, Hunan, P. R. China
- Cheng-Kun Wu
  - State Key Laboratory of High-Performance Computing, Changsha 410073, Hunan, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China

20
Talevi A. Computer-Aided Drug Discovery and Design: Recent Advances and Future Prospects. Methods Mol Biol 2024; 2714:1-20. [PMID: 37676590 DOI: 10.1007/978-1-0716-3441-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Computer-aided drug discovery and design involve the use of information technologies to identify and develop, on a rational basis, chemical compounds that align with a set of desired physicochemical and biological properties. In its most common form, it involves the identification and/or modification of an active scaffold (or the combination of known active scaffolds), although de novo drug design from scratch is also possible. Traditionally, the drug discovery and design processes have focused on the molecular determinants of the interactions between drug candidates and their known or intended pharmacological target(s). Nevertheless, in modern times, drug discovery and design are conceived as a particularly complex multiparameter optimization task, due to the complicated, often conflicting, property requirements. This chapter provides an updated overview of in silico approaches for identifying active scaffolds and guiding the subsequent optimization process. Recent groundbreaking advances in the field are also analyzed: the integration of state-of-the-art machine learning approaches in every step of the drug discovery process (from prediction of target structure to customized molecular docking scoring functions), integration of multilevel omics data, and the use of a diversity of computational approaches to assist target validation and assess plausible binding pockets.
Affiliation(s)
- Alan Talevi
  - Laboratory of Bioactive Compound Research and Development (LIDeB), Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata, Argentina.
  - Argentinean National Council of Scientific and Technical Research (CONICET), La Plata, Argentina.

21
Liu DF, Zhang YX, Dong WZ, Feng QK, Zhong SL, Dang ZM. High-Temperature Polymer Dielectrics Designed Using an Invertible Molecular Graph Generative Model. J Chem Inf Model 2023; 63:7669-7675. [PMID: 38061777 DOI: 10.1021/acs.jcim.3c01572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]
Abstract
Generating new molecules with the desired physical or chemical properties is the key challenge of computational material design. Deep learning techniques are being actively applied in the field of data-driven material informatics and provide a promising way to accelerate the discovery of innovative materials. In this work, we utilize an invertible graph generative model to generate hypothetical promising high-temperature polymer dielectrics. A molecular graph generative model based on the invertible normalizing flow is trained on a data set containing 250k polymer molecular graphs (mostly generated by an RNN-based generative model) to learn the invertible transformations between latent distributions and molecular graph structures. When generating molecular graphs, a sample vector is drawn from the latent space, and then an adjacency tensor and node attribute matrix are generated through two invertible flows in two steps and assembled into a molecular graph. The model has the merits of exact likelihood training and an efficient one-shot generation process. The learned latent space is used to generate polymers with a high glass-transition temperature (Tg) and a wide band gap (Eg) for the application of high-temperature energy storage film capacitors. This work contributes to the efficient design of high-temperature polymer dielectrics by using deep generative models.
Affiliation(s)
- Di-Fan Liu
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Yong-Xin Zhang
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Wen-Zhuo Dong
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Qi-Kun Feng
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Shao-Long Zhong
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China
- Zhi-Min Dang
  - State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, People's Republic of China

22
Xu M, Chen H. Tree-Invent: A Novel Multipurpose Molecular Generative Model Constrained with a Topological Tree. J Chem Inf Model 2023; 63:7067-7082. [PMID: 37962855 DOI: 10.1021/acs.jcim.3c01626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
De novo molecular design plays an important role in drug discovery. Here, a novel generative model, Tree-Invent, was proposed to integrate topological constraints in the generation of a molecular graph. In this model, a molecular graph is represented as a topological tree in which a ring system, a nonring atom, and a chemical bond are regarded as the ring node, single node, and edge, respectively. The molecule generation is driven by three independent submodels for carrying out operations of node addition, ring generation, and node connection. One unique feature of the generative model is that the topological tree structure can be specified as a constraint for structure generation, which provides more precise control of structure generation. Combined with reinforcement learning, the Tree-Invent model could efficiently explore targeted chemical space. Moreover, the Tree-Invent model is flexible enough to be used in versatile molecule design settings such as scaffold decoration, scaffold hopping, and linker generation.
Affiliation(s)
- Mingyuan Xu
  - Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou, Guangdong 510005, China
- Hongming Chen
  - Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou, Guangdong 510005, China

23
Ilnicka A, Schneider G. Designing molecules with autoencoder networks. Nature Computational Science 2023; 3:922-933. [PMID: 38177601 DOI: 10.1038/s43588-023-00548-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 10/03/2023] [Indexed: 01/06/2024]
Abstract
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Affiliation(s)
- Agnieszka Ilnicka
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland.

24
Beckers M, Sturm N, Sirockin F, Fechner N, Stiefl N. Prediction of Small-Molecule Developability Using Large-Scale In Silico ADMET Models. J Med Chem 2023; 66:14047-14060. [PMID: 37815201 DOI: 10.1021/acs.jmedchem.3c01083] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 10/11/2023]
Abstract
Early in silico assessment of the potential of a series of compounds to deliver a drug is one of the major challenges in computer-assisted drug design. The goal is to identify the right chemical series of compounds out of a large chemical space to then subsequently prioritize the molecules with the highest potential to become a drug. Although multiple approaches to assess compounds have been developed over decades, the quality of these predictors is often not good enough and compounds that agree with the respective estimates are not necessarily druglike. Here, we report a novel deep learning approach that leverages large-scale predictions of ∼100 ADMET assays to assess the potential of a compound to become a relevant drug candidate. The resulting score, which we termed bPK score, substantially outperforms previous approaches and showed strong discriminative performance on data sets where previous approaches did not.
Affiliation(s)
- Maximilian Beckers
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Postfach, 4002 Basel, Switzerland
- Noé Sturm
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Postfach, 4002 Basel, Switzerland
- Finton Sirockin
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Postfach, 4002 Basel, Switzerland
- Nikolas Fechner
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Postfach, 4002 Basel, Switzerland
- Nikolaus Stiefl
  - Novartis Institutes for BioMedical Research, Novartis Pharma AG, Postfach, 4002 Basel, Switzerland

25
Liu N, Jin H, Zhang L, Liu Z. Plug-in Models: A Promising Direction for Molecular Generation. Health Data Science 2023; 3:0092. [PMID: 38487202 PMCID: PMC10880158 DOI: 10.34133/hds.0092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/26/2023] [Indexed: 03/17/2024]
Affiliation(s)
- Ningfeng Liu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Hongwei Jin
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China

26
Wei L, Fu N, Song Y, Wang Q, Hu J. Probabilistic generative transformer language models for generative design of molecules. J Cheminform 2023; 15:88. [PMID: 37749655 PMCID: PMC10518939 DOI: 10.1186/s13321-023-00759-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 09/10/2023] [Indexed: 09/27/2023]
Abstract
Self-supervised neural language models have recently found wide application in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design require a large dataset and have a black-box architecture, which makes it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning the "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf scores compared to other baselines. The probabilistic generation steps show potential for tinkering with molecule design because they can recommend how to modify existing molecules, with explanations, guided by the learned implicit molecular chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer.
Affiliation(s)
- Lai Wei
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Nihang Fu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Yuqi Song
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA
- Qian Wang
  - Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, 29201, USA
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA.

27
Parrot M, Tajmouati H, da Silva VBR, Atwood BR, Fourcade R, Gaston-Mathé Y, Do Huu N, Perron Q. Integrating synthetic accessibility with AI-based generative drug design. J Cheminform 2023; 15:83. [PMID: 37726842 PMCID: PMC10507964 DOI: 10.1186/s13321-023-00742-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 08/03/2023] [Indexed: 09/21/2023]
Abstract
Generative models are frequently used for de novo design in drug discovery projects to propose new molecules. However, the question of whether or not the generated molecules can be synthesized is not systematically taken into account during generation, even though being able to synthesize the generated molecules is a fundamental requirement for such methods to be useful in practice. Methods have been developed to estimate molecule "synthesizability", but, so far, there is no consensus on whether or not a molecule is synthesizable. In this paper we introduce the Retro-Score (RScore), which computes a synthetic accessibility score of molecules by performing a full retrosynthetic analysis through our data-driven synthetic planning software Spaya and its dedicated API, Spaya-API (https://spaya.ai). We start by comparing several synthetic accessibility scores to a binary "chemist score", as estimated by chemists on a set of generated molecules, as a first experimental validation that the RScore is a reliable synthetic accessibility score. We then describe a pipeline to generate molecules that satisfy a list of targets while still being easy to synthesize. We extend this idea by performing experiments comparing molecular generator outputs across a range of constraints and conditions. We show that the RScore can be learned by a neural network, which leads to a new score: RSPred. We demonstrate that using the RScore or RSPred as a constraint during molecular generation enables our molecular generators to produce more synthesizable solutions, with higher diversity. The open-source Python code containing all the scores and the experiments can be found at https://github.com/iktos/generation-under-synthetic-constraint.
Affiliation(s)
- Maud Parrot
- Iktos, 65 rue de Prony, 75017, Paris, France
|
28
|
Feng H, Wang R, Zhan CG, Wei GW. Multiobjective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex. J Med Chem 2023; 66:12479-12498. [PMID: 37623046 PMCID: PMC11037444 DOI: 10.1021/acs.jmedchem.3c01053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/26/2023]
Abstract
Opioid use disorder (OUD) has emerged as a significant global public health issue, necessitating the discovery of new medications. In this study, we propose a deep generative model that combines a stochastic differential equation (SDE)-based diffusion model with a pretrained autoencoder. The molecular generator enables efficient generation of molecules that target multiple opioid receptors, including mu, kappa, and delta. Additionally, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of the generated molecules to identify druglike compounds. We develop a molecular optimization approach to enhance the pharmacokinetic properties of some lead compounds. Advanced binding affinity predictors were built using molecular fingerprints, including autoencoder embeddings, transformer embeddings, and topological Laplacians. Our process yields druglike molecules that can be used in highly focused experimental studies to further evaluate their pharmacological effects. Our machine learning platform serves as a valuable tool for designing effective molecules to address OUD.
Affiliation(s)
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Chang-Guo Zhan
- Department of Pharmaceutical Sciences, University of Kentucky, Lexington, Kentucky 40506, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
|
29
|
Jiang M, Rocktäschel T, Grefenstette E. General intelligence requires rethinking exploration. Royal Society Open Science 2023; 10:230539. [PMID: 37351488 PMCID: PMC10282580 DOI: 10.1098/rsos.230539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 05/30/2023] [Indexed: 06/24/2023]
Abstract
We are at the cusp of a transition from 'learning from data' to 'learning what data to learn from' as a central focus of artificial intelligence (AI) research. While the first-order learning problem is not completely solved, large models under unified architectures, such as transformers, have shifted the learning bottleneck from how to effectively train models to how to effectively acquire and use task-relevant data. This problem, which we frame as exploration, is a universal aspect of learning in open-ended domains like the real world. Although the study of exploration in AI is largely limited to the field of reinforcement learning, we argue that exploration is essential to all learning systems, including supervised learning. We propose the problem of generalized exploration to conceptually unify exploration-driven learning between supervised learning and reinforcement learning, allowing us to highlight key similarities across learning settings and open research challenges. Importantly, generalized exploration is a necessary objective for maintaining open-ended learning processes, which in continually learning to discover and solve new problems, provides a promising path to more general intelligence.
Affiliation(s)
- Minqi Jiang
- AI Centre, Department of Computer Science, University College London, London, UK
- Tim Rocktäschel
- AI Centre, Department of Computer Science, University College London, London, UK
- Edward Grefenstette
- AI Centre, Department of Computer Science, University College London, London, UK
|
30
|
Anstine D, Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J Am Chem Soc 2023; 145:8736-8750. [PMID: 37052978 PMCID: PMC10141264 DOI: 10.1021/jacs.2c13467] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Received: 12/17/2022] [Indexed: 04/14/2023]
Abstract
Traditional computational approaches to design chemical species are limited by the need to compute properties for a vast number of candidates, e.g., by discriminative modeling. Therefore, inverse design methods aim to start from the desired property and optimize a corresponding chemical structure. From a machine learning viewpoint, the inverse design problem can be addressed through so-called generative modeling. Mathematically, discriminative models are defined by learning the probability distribution function of properties given the molecular or material structure. In contrast, a generative model seeks to exploit the joint probability of a chemical species with target characteristics. The overarching idea of generative modeling is to implement a system that produces novel compounds that are expected to have a desired set of chemical features, effectively sidestepping issues found in the forward design process. In this contribution, we overview and critically analyze popular generative algorithms like generative adversarial networks, variational autoencoders, flow, and diffusion models. We highlight key differences between each of the models, provide insights into recent success stories, and discuss outstanding challenges for realizing generative modeling discovered solutions in chemical applications.
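The contrast drawn in this abstract can be written compactly. The notation below is an assumption introduced for illustration ($x$ for a chemical structure, $y$ for its properties, $y^{*}$ for the desired property values) and is not taken from the paper; the conditional-sampling line is one common reading of inverse design rather than a formula the authors state.

```latex
\begin{align*}
\text{discriminative (forward) model:}\quad & p(y \mid x) \\
\text{generative model (joint):}\quad & p(x, y) = p(x \mid y)\, p(y) = p(y \mid x)\, p(x) \\
\text{inverse design (sampling):}\quad & x \sim p(x \mid y = y^{*}) = \frac{p(x, y^{*})}{p(y^{*})}
\end{align*}
```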
Affiliation(s)
- Dylan M. Anstine
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
31
|
Choi J, Seo S, Choi S, Piao S, Park C, Ryu SJ, Kim BJ, Park S. ReBADD-SE: Multi-objective molecular optimisation using SELFIES fragment and off-policy self-critical sequence training. Comput Biol Med 2023; 157:106721. [PMID: 36913852 DOI: 10.1016/j.compbiomed.2023.106721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/11/2023] [Accepted: 02/26/2023] [Indexed: 03/02/2023]
Abstract
The discovery of drugs to selectively remove disease-related cells is challenging in computer-aided drug design. Many studies have proposed multi-objective molecular generation methods and demonstrated their superiority using the public benchmark dataset for kinase inhibitor generation tasks. However, the dataset does not contain many molecules that violate Lipinski's rule of five. Thus, it remains unclear whether existing methods are effective in generating molecules violating the rule, such as navitoclax. To address this, we analysed the limitations of existing methods and propose a multi-objective molecular generation method with a novel parsing algorithm for molecular string representation and a modified reinforcement learning method for the efficient training of multi-objective molecular optimisation. The proposed model had success rates of 84% in GSK3b+JNK3 inhibitor generation and 99% in Bcl-2 family inhibitor generation tasks.
Affiliation(s)
- Jonghwan Choi
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Sangmin Seo
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Seungyeon Choi
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Shengmin Piao
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
- Chihyun Park
- Department of Computer Science and Engineering, Kangwon National University, Chuncheon-si, 24341, Kangwon-do, Republic of Korea; UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Sung Jin Ryu
- UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Byung Ju Kim
- UBLBio Corporation, Yeongtong-ro 237, Suwon, 16679, Gyeonggi-do, Republic of Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Yonsei-ro 50, Seodaemun-gu, 03722, Seoul, Republic of Korea
|
32
|
Fromer JC, Coley CW. Computer-aided multi-objective optimization in small molecule discovery. Patterns (New York, N.Y.) 2023; 4:100678. [PMID: 36873904 PMCID: PMC9982302 DOI: 10.1016/j.patter.2023.100678] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 02/12/2023]
Abstract
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties. Multi-objective molecular design is commonly addressed by combining properties of interest into a single objective function using scalarization, which imposes assumptions about relative importance and uncovers little about the trade-offs between objectives. In contrast to scalarization, Pareto optimization does not require knowledge of relative importance and reveals the trade-offs between objectives. However, it introduces additional considerations in algorithm design. In this review, we describe pool-based and de novo generative approaches to multi-objective molecular discovery with a focus on Pareto optimization algorithms. We show how pool-based molecular discovery is a relatively direct extension of multi-objective Bayesian optimization and how the plethora of different generative models extend from single-objective to multi-objective optimization in similar ways using non-dominated sorting in the reward function (reinforcement learning) or to select molecules for retraining (distribution learning) or propagation (genetic algorithms). Finally, we discuss some remaining challenges and opportunities in the field, emphasizing the opportunity to adopt Bayesian optimization techniques into multi-objective de novo design.
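The difference between scalarization and Pareto optimization described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the candidate scores are invented, the two objectives (labelled affinity and drug-likeness) are hypothetical, both are maximized, and the function names `dominates`, `pareto_front`, and `scalarize` are not from the review. The sketch uses a simple quadratic-time non-dominated filter, not any specific algorithm the authors describe.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated points (O(n^2) filter)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def scalarize(point: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum scalarization: collapses all objectives into one number."""
    return sum(w * v for w, v in zip(weights, point))


# Hypothetical (affinity, drug-likeness) scores for five candidate molecules.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.4, 0.3)]

# Pareto optimization keeps every trade-off that no other candidate beats.
front = pareto_front(candidates)

# Scalarization returns a single winner that depends on the chosen weights.
best_equal = max(range(len(candidates)),
                 key=lambda i: scalarize(candidates[i], (0.5, 0.5)))
best_affinity = max(range(len(candidates)),
                    key=lambda i: scalarize(candidates[i], (0.9, 0.1)))

print("Pareto front indices:", front)
print("Best under equal weights:", best_equal)
print("Best under affinity-heavy weights:", best_affinity)
```

In this toy data set the weighted sum picks a different single candidate as the weights change, while the non-dominated filter returns all three trade-off candidates at once without requiring weights.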
Affiliation(s)
- Jenna C Fromer
- Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Connor W Coley
- Department of Chemical Engineering, MIT, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
|
34
|
Sundin I, Voronov A, Xiao H, Papadopoulos K, Bjerrum EJ, Heinonen M, Patronov A, Kaski S, Engkvist O. Human-in-the-loop assisted de novo molecular design. J Cheminform 2022; 14:86. [PMID: 36578043 PMCID: PMC9795720 DOI: 10.1186/s13321-022-00667-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022] Open
Abstract
A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
Affiliation(s)
- Iiris Sundin
- Department of Computer Science, Aalto University, Espoo, Finland
- Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Haoping Xiao
- Department of Computer Science, Aalto University, Espoo, Finland
- Kostas Papadopoulos
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Esben Jannik Bjerrum
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Markus Heinonen
- Department of Computer Science, Aalto University, Espoo, Finland
- Atanas Patronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Present Address: Odyssey Therapeutics, Cambridge, MA, USA
- Samuel Kaski
- Department of Computer Science, Aalto University, Espoo, Finland
- Department of Computer Science, University of Manchester, Manchester, UK
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
|
35
|
Hu F, Wang D, Huang H, Hu Y, Yin P. Bridging the Gap between Target-Based and Cell-Based Drug Discovery with a Graph Generative Multitask Model. J Chem Inf Model 2022; 62:6046-6056. [PMID: 36401569 DOI: 10.1021/acs.jcim.2c01180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The development of new drugs is crucial for protecting humans from disease. In the past several decades, target-based screening has been one of the most popular methods for developing new drugs. This method efficiently screens potential inhibitors of a target protein in vitro, but it frequently fails in vivo due to insufficient activity of the selected drugs. There is a need for accurate computational methods to bridge this gap. Here, we present a novel graph multi-task deep learning model to identify compounds with both target inhibitory and cell active (MATIC) properties. On a carefully curated SARS-CoV-2 data set, the proposed MATIC model shows advantages compared with the traditional method in screening effective compounds in vivo. Following this, we investigated the interpretability of the model and discovered that the learned features for target inhibition (in vitro) or cell active (in vivo) tasks are different with molecular property correlations and atom functional attention. Based on these findings, we utilized a Monte Carlo-based reinforcement learning generative model to generate novel multiproperty compounds with both in vitro and in vivo efficacy, thus bridging the gap between target-based and cell-based drug discovery. The tool is freely accessible at https://github.com/SIAT-code/MATIC.
Affiliation(s)
- Fan Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Dongqi Wang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Huazhen Huang
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yishen Hu
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Peng Yin
- Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Collapse
|
36
|
Bajorath J, Chávez-Hernández AL, Duran-Frigola M, Fernández-de Gortari E, Gasteiger J, López-López E, Maggiora GM, Medina-Franco JL, Méndez-Lucio O, Mestres J, Miranda-Quintana RA, Oprea TI, Plisson F, Prieto-Martínez FD, Rodríguez-Pérez R, Rondón-Villarreal P, Saldívar-Gonzalez FI, Sánchez-Cruz N, Valli M. Chemoinformatics and artificial intelligence colloquium: progress and challenges in developing bioactive compounds. J Cheminform 2022; 14:82. [PMID: 36461094 PMCID: PMC9716667 DOI: 10.1186/s13321-022-00661-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022] Open
Abstract
We report the main conclusions of the first Chemoinformatics and Artificial Intelligence Colloquium, Mexico City, June 15-17, 2022. Fifteen lectures were presented during a virtual public event with speakers from industry, academia, and not-for-profit organizations. The event drew twelve hundred and ninety students and academics from more than 60 countries. During the meeting, applications, challenges, and opportunities in drug discovery, de novo drug design, ADME-Tox (absorption, distribution, metabolism, excretion and toxicity) property predictions, organic chemistry, peptides, and antibiotic resistance were discussed. The program, along with the recordings of all sessions, is freely available at https://www.difacquim.com/english/events/2022-colloquium/.
Affiliation(s)
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53113, Bonn, Germany
- Ana L Chávez-Hernández
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, 04510, Mexico City, Mexico
- Miquel Duran-Frigola
- Ersilia Open Source Initiative, Cambridge, UK
- Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Eli Fernández-de Gortari
- Nanosafety Laboratory, International Iberian Nanotechnology Laboratory, 4715-330, Braga, Portugal
- Johann Gasteiger
- Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Erlangen, Germany
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, 04510, Mexico City, Mexico
- Department of Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV), 07360, Mexico City, Mexico
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, 04510, Mexico City, Mexico
- Jordi Mestres
- Chemotargets SL, Baldiri Reixac 4, Parc Cientific de Barcelona (PCB), 08028, Barcelona, Catalonia, Spain
- Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Parc de Recerca Biomedica (PRBB), 08003, Barcelona, Catalonia, Spain
- Tudor I Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at Gothenburg University, 40530, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Roivant Discovery Sciences, Inc., 451 D Street, Boston, MA, 02210, USA
- Fabien Plisson
- Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, 36824, Irapuato, Gto, Mexico
- Paola Rondón-Villarreal
- Universidad de Santander, Facultad de Ciencias Médicas y de la Salud, Instituto de Investigación Masira, Calle 70 No. 55-210, 680003, Santander, Bucaramanga, Colombia
- Fernanda I Saldívar-Gonzalez
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, 04510, Mexico City, Mexico
- Norberto Sánchez-Cruz
- Chemotargets SL, Baldiri Reixac 4, Parc Cientific de Barcelona (PCB), 08028, Barcelona, Catalonia, Spain
- Instituto de Química, Unidad Mérida, Universidad Nacional Autónoma de México, Carretera Mérida-Tetiz Km. 4.5, Yucatán, 97357, Ucú, Mexico
- Marilia Valli
- Nuclei of Bioassays, Biosynthesis and Ecophysiology of Natural Products (NuBBE), Department of Organic Chemistry, Institute of Chemistry, São Paulo State University-UNESP, Araraquara, Brazil
|
37
|
Abouchekeir S, Vu A, Mukaidaisi M, Grantham K, Tchagang A, Li Y. Adversarial deep evolutionary learning for drug design. Biosystems 2022; 222:104790. [DOI: 10.1016/j.biosystems.2022.104790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 09/21/2022] [Accepted: 09/28/2022] [Indexed: 11/27/2022]
|
38
|
Urbina F, Ekins S. The Commoditization of AI for Molecule Design. Artificial Intelligence in the Life Sciences 2022; 2:100031. [PMID: 36211981 PMCID: PMC9541920 DOI: 10.1016/j.ailsci.2022.100031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent, as the future of molecule design across different industries will increasingly require machine learning models for design and optimization of molecules as they become "designed by AI". AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective will briefly describe our personal opinions of how machine learning has evolved and is being applied to model different molecule properties that cross industries in their utility, and will ultimately suggest the potential for tight integration of AI into equipment and automated experimental pipelines. It will also describe how many groups have implemented generative models, covering different architectures, for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we will peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
Affiliation(s)
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
|
39
|
Nguyen MT, Nguyen T, Tran T. Learning to discover medicines. International Journal of Data Science and Analytics 2022; 16:1-16. [PMID: 36440369 PMCID: PMC9676887 DOI: 10.1007/s41060-022-00371-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 11/05/2022] [Indexed: 11/19/2022]
Abstract
Discovering new medicines is the hallmark of the human endeavor to live a better and longer life. Yet the pace of discovery has slowed down as we need to venture into more wildly unexplored biomedical space to find one that matches today's high standard. Modern AI, enabled by powerful computing, large biomedical databases, and breakthroughs in deep learning, offers a new hope to break this loop as AI is rapidly maturing, ready to make a huge impact in the area. In this paper, we review recent advances in AI methodologies that aim to crack this challenge. We organize the vast and rapidly growing literature on AI for drug discovery into three relatively stable sub-areas: (a) representation learning over molecular sequences and geometric graphs; (b) data-driven reasoning where we predict molecular properties and their binding, optimize existing compounds, generate de novo molecules, and plan the synthesis of target molecules; and (c) knowledge-based reasoning where we discuss the construction and reasoning over biomedical knowledge graphs. We will also identify open challenges and chart possible research directions for the years to come.
Affiliation(s)
- Minh-Tri Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
- Thin Nguyen
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
- Truyen Tran
- Applied Artificial Intelligence Institute, Deakin University, Burwood, VIC, Australia
|
40
|
Yoshizawa T, Ishida S, Sato T, Ohta M, Honma T, Terayama K. Selective Inhibitor Design for Kinase Homologs Using Multiobjective Monte Carlo Tree Search. J Chem Inf Model 2022; 62:5351-5360. [DOI: 10.1021/acs.jcim.2c00787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Tatsuya Yoshizawa
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
- Shoichi Ishida
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
- Tomohiro Sato
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Masateru Ohta
- HPC- and AI-driven Drug Development Platform Division, Center for Computational Science, RIKEN, Yokohama 230-0045, Japan
- Teruki Honma
- RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Kei Terayama
- Graduate School of Medical Life Science, Yokohama City University, Tsurumi-ku, Yokohama 230-0045, Japan
|
41
|
Bon M, Bilsland A, Bower J, McAulay K. Fragment-based drug discovery - the importance of high-quality molecule libraries. Mol Oncol 2022; 16:3761-3777. [PMID: 35749608 PMCID: PMC9627785 DOI: 10.1002/1878-0261.13277] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 01/28/2022] [Revised: 05/16/2022] [Accepted: 06/23/2022] [Indexed: 12/24/2022] Open
Abstract
Fragment-based drug discovery (FBDD) is now established as a complementary approach to high-throughput screening (HTS). Contrary to HTS, where large libraries of drug-like molecules are screened, FBDD screens involve smaller and less complex molecules which, despite a low affinity to protein targets, display more 'atom-efficient' binding interactions than larger molecules. Fragment hits can, therefore, serve as a more efficient start point for subsequent optimisation, particularly for hard-to-drug targets. Since the number of possible molecules increases exponentially with molecular size, small fragment libraries allow for a proportionately greater coverage of their respective 'chemical space' compared with larger HTS libraries comprising larger molecules. However, good library design is essential to ensure optimal chemical and pharmacophore diversity, molecular complexity, and physicochemical characteristics. In this review, we describe our views on fragment library design, and on what constitutes a good fragment from a medicinal and computational chemistry perspective. We highlight emerging chemical and computational technologies in FBDD and discuss strategies for optimising fragment hits. The impact of novel FBDD approaches is already being felt, with the recent approval of the covalent KRAS G12C inhibitor sotorasib highlighting the utility of FBDD against targets that were long considered undruggable.
Affiliation(s)
- Marta Bon
- Cancer Research Horizons, Cancer Research UK Beatson Institute, Glasgow, UK
- Alan Bilsland
- Cancer Research Horizons, Cancer Research UK Beatson Institute, Glasgow, UK
- Justin Bower
- Cancer Research Horizons, Cancer Research UK Beatson Institute, Glasgow, UK
- Kirsten McAulay
- Cancer Research Horizons, Cancer Research UK Beatson Institute, Glasgow, UK
|
42
|
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM, Lameiro RF, Lemm D, Lo A, Moosavi SM, Nápoles-Duarte JM, Nigam A, Pollice R, Rajan K, Schatzschneider U, Schwaller P, Skreta M, Smit B, Strieth-Kalthoff F, Sun C, Tom G, Falk von Rudorff G, Wang A, White AD, Young A, Yu R, Aspuru-Guzik A. SELFIES and the future of molecular string representations. Patterns (New York, N.Y.) 2022; 3:100588. [PMID: 36277819 PMCID: PMC9583042 DOI: 10.1016/j.patter.2022.100588] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 11/06/2022]
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
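The claim that most symbol combinations are invalid can be illustrated with a small sketch. The checker below is a hypothetical, deliberately simplified test (balanced branch parentheses and paired ring-closure digits over a toy alphabet); it is only a necessary condition and not a real SMILES parser, and the function names, alphabet, and string length are assumptions made for illustration.

```python
import random

# Toy alphabet (an assumption for illustration): a few atoms, branch
# parentheses, a double bond, and one ring-closure digit.
TOY_ALPHABET = "CNOc()=1"


def passes_basic_syntax(smiles: str) -> bool:
    """Return True if the string passes two cheap syntactic checks.

    Checks: sensible first/last symbol, balanced parentheses, and every
    ring-closure digit opened and closed. Passing is necessary but not
    sufficient for a chemically valid SMILES string.
    """
    if not smiles or smiles[0] in "()=123456789" or smiles[-1] in "(=":
        return False
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: first use opens, second closes
    return depth == 0 and not open_rings


def fraction_passing(n_strings: int, length: int, seed: int = 0) -> float:
    """Fraction of uniformly random toy strings that pass the checks."""
    rng = random.Random(seed)
    hits = sum(
        passes_basic_syntax(
            "".join(rng.choice(TOY_ALPHABET) for _ in range(length))
        )
        for _ in range(n_strings)
    )
    return hits / n_strings


if __name__ == "__main__":
    print(passes_basic_syntax("c1ccccc1"))  # benzene passes the toy checks
    print(fraction_passing(2000, 10))
```

Even this lenient filter rejects a large share of random strings, which is the failure mode a representation that is valid by construction is meant to avoid.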
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
| | - Qianxiang Ai
- Department of Chemistry, Fordham University, The Bronx, NY, USA
| | - Senja Barthel
- Department of Mathematics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nessa Carson
- Syngenta Jealott’s Hill International Research Centre, Bracknell, Berkshire, UK
| | - Angelo Frei
- Department of Chemistry, Imperial College London, Molecular Sciences Research Hub, White City Campus, Wood Lane, London, UK
| | - Nathan C. Frey
- Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- IBM Research Europe, Zürich, Switzerland
| | | | - Kevin Maik Jablonka
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Rafael F. Lameiro
- Medicinal and Biological Chemistry Group, São Carlos Institute of Chemistry, University of São Paulo, São Paulo, Brazil
| | - Dominik Lemm
- Faculty of Physics, University of Vienna, Vienna, Austria
| | - Alston Lo
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Seyed Mohamad Moosavi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
| | | | - AkshatKumar Nigam
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Robert Pollice
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller Universität Jena, Jena, Germany
| | - Ulrich Schatzschneider
- Institut für Anorganische Chemie, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
| | - Philippe Schwaller
- IBM Research Europe, Zürich, Switzerland
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- National Centre of Competence in Research (NCCR) Catalysis, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Berend Smit
- Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Sion, Valais, Switzerland
| | - Felix Strieth-Kalthoff
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Chong Sun
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Gary Tom
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | | | - Andrew Wang
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Solar Fuels Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
| | - Adamo Young
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
| | - Rose Yu
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Materials Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, ON, Canada
|
43
|
Thomas M, O’Boyle NM, Bender A, de Graaf C. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform 2022; 14:68. [PMID: 36192789 PMCID: PMC9531503 DOI: 10.1186/s13321-022-00646-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022]
Abstract
A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (version 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT while still delivering appealing chemistry as output. Diversity filters were used, and their parameters were tuned to overcome observed failure modes that take advantage of certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioning via reinforcement learning, compared to the current state-of-the-art. This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.
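One way to read the "hybrid between REINVENT and Hill-Climb" is sketched below: rank a sampled batch by score, keep only the top fraction (the Hill-Climb part), and apply a REINVENT-style squared difference between an augmented likelihood and the agent likelihood to that subset. The abstract does not give the loss, so the exact form, the `Sample` container, and the `sigma` and `topk` defaults are all assumptions for illustration rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """One sampled molecule (hypothetical container for this sketch)."""
    smiles: str
    prior_loglik: float   # log-likelihood under the fixed prior model
    agent_loglik: float   # log-likelihood under the agent being trained
    score: float          # scoring-function output, assumed to lie in [0, 1]


def augmented_hill_climb_loss(
    batch: List[Sample], sigma: float = 60.0, topk: float = 0.5
) -> Tuple[float, List[Sample]]:
    """Mean REINVENT-style loss over the top-scoring fraction of a batch.

    Assumed loss per molecule: (prior + sigma * score - agent) ** 2,
    evaluated only on the `topk` best-scoring samples.
    """
    ranked = sorted(batch, key=lambda s: s.score, reverse=True)
    n_keep = max(1, int(len(ranked) * topk))
    kept = ranked[:n_keep]
    losses = [
        (s.prior_loglik + sigma * s.score - s.agent_loglik) ** 2 for s in kept
    ]
    return sum(losses) / len(losses), kept


if __name__ == "__main__":
    batch = [
        Sample("A", -10.0, -10.0, 0.1),
        Sample("B", -10.0, -10.0, 0.9),
        Sample("C", -10.0, -10.0, 0.5),
        Sample("D", -10.0, -10.0, 0.7),
    ]
    loss, kept = augmented_hill_climb_loss(batch, sigma=10.0, topk=0.5)
    print(loss, [s.smiles for s in kept])
```

In a real training loop the log-likelihoods would come from the prior and agent networks and the loss would be backpropagated through the agent; here plain floats stand in for both.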
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
| | - Noel M. O’Boyle
- Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW UK
| | - Chris de Graaf
- Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG UK
|
44
|
Interpretable Machine Learning Models for Molecular Design of Tyrosine Kinase Inhibitors Using Variational Autoencoders and Perturbation-Based Approach of Chemical Space Exploration. Int J Mol Sci 2022; 23:ijms231911262. [PMID: 36232566 PMCID: PMC9569663 DOI: 10.3390/ijms231911262] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/17/2022]
Abstract
In the current study, we introduce an integrative machine learning strategy for the autonomous molecular design of protein kinase inhibitors using variational autoencoders and a novel cluster-based perturbation approach for exploration of the chemical latent space. The proposed strategy combines autoencoder-based embedding of small molecules with a cluster-based perturbation approach for efficient navigation of the latent space and a feature-based kinase inhibition likelihood classifier that guides optimization of the molecular properties and targeted molecular design. In the proposed generative approach, molecules sharing similar structures tend to cluster in the latent space, and interpolating between two molecules in the latent space enables smooth changes in the molecular structures and properties. The results demonstrated that the proposed strategy can efficiently explore the latent space of small molecules and kinase inhibitors along interpretable directions to guide the generation of novel family-specific kinase molecules that display a significant scaffold diversity and optimal biochemical properties. Through assessment of the latent-based and chemical feature-based binary and multiclass classifiers, we developed a robust probabilistic evaluator of kinase inhibition likelihood that is specifically tailored to guide the molecular design of novel SRC kinase molecules. The generated molecules originating from LCK and ABL1 kinase inhibitors yielded ~40% of novel and valid SRC kinase compounds with high kinase inhibition likelihood probability values (p > 0.75) and high similarity (Tanimoto coefficient > 0.6) to the known SRC inhibitors. 
By combining the molecular perturbation design with the kinase inhibition likelihood analysis and similarity assessments, we showed that the proposed molecular design strategy can produce novel valid molecules and transform known inhibitors of different kinase families into potential chemical probes of the SRC kinase with excellent physicochemical profiles and high similarity to the known SRC kinase drugs. The results of our study suggest that task-specific manipulation of a biased latent space may be an important direction for more effective task-oriented and target-specific autonomous chemical design models.
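The two selection thresholds quoted in the abstract (inhibition likelihood p > 0.75 and Tanimoto coefficient > 0.6) can be made concrete with a short sketch. Fingerprints are represented here as sets of on-bit indices; the function names, the candidate tuple layout, and the example data are hypothetical, and the fingerprint type used in the study is not stated in the abstract.

```python
from typing import Iterable, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two on-bit index sets."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(a | b)


def select_candidates(
    candidates: Iterable[Tuple[str, Set[int], float]],
    reference: Set[int],
    min_similarity: float = 0.6,
    min_probability: float = 0.75,
) -> List[str]:
    """Keep names whose probability and similarity both exceed the cutoffs.

    Each candidate is (name, fingerprint, predicted inhibition probability).
    """
    return [
        name
        for name, fingerprint, probability in candidates
        if probability > min_probability
        and tanimoto(fingerprint, reference) > min_similarity
    ]


if __name__ == "__main__":
    reference = {1, 2, 3, 4}          # made-up reference fingerprint
    candidates = [
        ("a", {1, 2, 3}, 0.90),       # similarity 3/4, passes both cutoffs
        ("b", {1, 2, 3}, 0.50),       # fails the probability cutoff
        ("c", {1, 9}, 0.99),          # similarity 1/5, fails similarity
    ]
    print(select_candidates(candidates, reference))
```

A single reference fingerprint is used for brevity; comparing against a set of known inhibitors would typically take the maximum similarity over that set.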
|
45
|
Li C, Wang C, Sun M, Zeng Y, Yuan Y, Gou Q, Wang G, Guo Y, Pu X. Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime. J Chem Inf Model 2022; 62:4873-4887. [PMID: 35998331 DOI: 10.1021/acs.jcim.2c00997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
Motivated by the challenge of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid the dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structure knowledge. Then the pretrained RNN is fine-tuned by focusing on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in the detonation velocity. All the source codes and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.
Affiliation(s)
- Chuan Li
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Chenghui Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Ming Sun
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Yan Zeng
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yuan Yuan
- College of Management, Southwest University for Nationalities, Chengdu 610041, China
| | - Qiaolin Gou
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Guangchuan Wang
- College of Computer Science, Sichuan University, Chengdu 610064, China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
|
46
|
Nigam A, Pollice R, Aspuru-Guzik A. Parallel tempered genetic algorithm guided by deep neural networks for inverse molecular design. DIGITAL DISCOVERY 2022; 1:390-404. [PMID: 36091415 PMCID: PMC9358752 DOI: 10.1039/d2dd00003b] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/19/2022] [Accepted: 05/03/2022] [Indexed: 12/30/2022]
Abstract
Inverse molecular design involves algorithms that sample molecules with specific target properties from a multitude of candidates and can be posed as an optimization problem. High-dimensional optimization tasks in the natural sciences are commonly tackled via population-based metaheuristic optimization algorithms such as evolutionary algorithms. However, often unavoidable expensive property evaluation can limit the widespread use of such approaches as the associated cost can become prohibitive. Herein, we present JANUS, a genetic algorithm inspired by parallel tempering. It propagates two populations, one for exploration and another for exploitation, improving optimization by reducing property evaluations. JANUS is augmented by a deep neural network that approximates molecular properties and relies on active learning for enhanced molecular sampling. It uses the SELFIES representation and the STONED algorithm for the efficient generation of structures, and outperforms other generative models in common inverse molecular design tasks achieving state-of-the-art target metrics across multiple benchmarks. As neither most of the benchmarks nor the structure generator in JANUS account for synthesizability, a significant fraction of the proposed molecules is synthetically infeasible demonstrating that this aspect needs to be considered when evaluating the performance of molecular generative models.
Affiliation(s)
- AkshatKumar Nigam
- Department of Computer Science, Stanford University USA
- Department of Computer Science, University of Toronto Canada
- Department of Chemistry, University of Toronto Canada
| | - Robert Pollice
- Department of Computer Science, University of Toronto Canada
- Department of Chemistry, University of Toronto Canada
| | - Alán Aspuru-Guzik
- Department of Computer Science, University of Toronto Canada
- Department of Chemistry, University of Toronto Canada
- Vector Institute for Artificial Intelligence Toronto Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR) 661 University Ave Toronto Ontario M5G Canada
|
47
|
Zhang J, Chen H. De Novo Molecule Design Using Molecular Generative Models Constrained by Ligand-Protein Interactions. J Chem Inf Model 2022; 62:3291-3306. [PMID: 35793555 DOI: 10.1021/acs.jcim.2c00177] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
In recent years, molecular deep generative models have attracted much attention for their application in de novo drug design. The data-driven molecular deep generative model approximates the high dimensional distribution of the chemical space through learning from a large number of molecular structural data. So far, most of the molecular generative models rely on purely 2D ligand information in structure generation. Here, we propose a novel molecular deep generative model which adopts a recurrent neural network architecture coupled with a ligand-protein interaction fingerprint as constraints. The fingerprint was constructed on ligand docking poses and represents the 3D binding mode of ligands in the protein pocket. In the current work, generative models constrained with interaction fingerprints were trained and compared with normal RNN models. It has been shown that models trained with constraints of ligand-protein interaction fingerprint have a clear tendency to generate compounds that maintain similar binding modes. Our results demonstrate the potential application of the interaction fingerprint-constrained generative model for targeted molecule generation and guided exploration of the drug-like chemical space.
Affiliation(s)
- Jie Zhang
- Guangdong Provincial Key Laboratory of Laboratory Animals, Guangdong Laboratory Animals Monitoring Institute, Guangzhou 510663, P. R. China; State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, P. R. China; Bioland Laboratory (Guangzhou Regenerative Medicine and Health─Guangdong Laboratory), Guangzhou 510530, P. R. China
| | - Hongming Chen
- Bioland Laboratory (Guangzhou Regenerative Medicine and Health─Guangdong Laboratory), Guangzhou 510530, P. R. China; Guangzhou International Bio Island, Guangzhou Laboratory, No. 9 XinDaoHuanBei Road, Guangzhou 510005, China
|
48
|
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Limei Cheng
- Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
| | - Jaclyn Frishcosy
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Huzumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tom Schluckbier
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xiaoqi Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
|
49
|
Yang Y, Wu Z, Yao X, Kang Y, Hou T, Hsieh CY, Liu H. Exploring Low-Toxicity Chemical Space with Deep Learning for Molecular Generation. J Chem Inf Model 2022; 62:3191-3199. [PMID: 35713712 DOI: 10.1021/acs.jcim.2c00671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Creating a wide range of new compounds that not only have ideal pharmacological properties but also easily pass long-term toxicity evaluation is still a challenging task in current drug discovery. In this study, we developed a conditional generative model by combining a semisupervised variational autoencoder (SSVAE) with an MGA toxicity predictor. Our aim is to generate molecules with low toxicity, good drug-like properties, and structural diversity. For multiobjective optimization, we have developed a method with hierarchical constraints on the toxicity space of small molecules to generate drug-like small molecules, which can also minimize the effect on the diversity of generated results. The evaluation results of the metrics indicate that the developed model has good effectiveness, novelty, and diversity. The generated molecules by this model are mainly distributed in low-toxicity regions, which suggests that our model can efficiently constrain the generation of toxic structures. In contrast to simply filtering toxic ones after generation, the low-toxicity molecular generative model can generate molecules with structural diversity. Our strategy can be used in target-based drug discovery to improve the quality of generated molecules with low-toxicity, drug-like, and highly active properties.
Affiliation(s)
- Yuwei Yang
- School of Pharmacy, Lanzhou University, Lanzhou 730000, China
| | - Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Xiaojun Yao
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou 730000, China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Tencent, Shenzhen 518000, China
| | - Huanxiang Liu
- School of Pharmacy, Lanzhou University, Lanzhou 730000, China; Faculty of Applied Science, Macao Polytechnic University, Macao, SAR 999078, China
|
50
|
Urbina F, Lowden CT, Culberson JC, Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS OMEGA 2022; 7:18699-18713. [PMID: 35694522 PMCID: PMC9178760 DOI: 10.1021/acsomega.2c01404] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 05/04/2023]
Abstract
Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of the physicochemical properties to steer generative design, yet they are often not capable of addressing a wide variety of potential problems, and they tend to converge into similar molecular space when combined with a scoring function for the desired properties. In addition, the generated compounds may not be synthetically feasible, which limits their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn representing three components: a new hill-climb algorithm, which makes use of SMILES-based recurrent neural network (RNN) generative models, analog generation software, and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We now describe the development, benchmarking, and testing of this suite of tools and propose how they might be used to optimize molecules or prioritize promising lead compounds, illustrated with RNN examples from multiple test cases.
Affiliation(s)
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
| | - Christopher T. Lowden
- Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
| | - J. Christopher Culberson
- Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
|