1
Sridharan B, Sinha A, Bardhan J, Modee R, Ehara M, Priyakumar UD. Deep reinforcement learning in chemistry: A review. J Comput Chem 2024; 45:1886-1898. [PMID: 38698628 DOI: 10.1002/jcc.27354] [Received: 12/11/2023] [Revised: 03/17/2024] [Accepted: 03/20/2024] [Indexed: 05/05/2024]
Abstract
Reinforcement learning (RL) has been applied to various domains in computational chemistry and has found widespread success. In this review, we first motivate the application of RL to chemistry and list some broad application domains, for example, molecule generation, geometry optimization, and retrosynthetic pathway search. We set up some of the formalism associated with reinforcement learning that should help the reader translate their chemistry problems into a form where RL can be used to solve them. We then discuss the solution formulations and algorithms proposed in recent literature for these problems, the advantages of one over the other, together with the necessary details of the RL algorithms they employ. This article should help the reader understand the state of RL applications in chemistry, learn about some relevant actively researched open problems, gain insight into how RL can be used to approach them, and hopefully inspire innovative RL applications in chemistry.
Affiliation(s)
- Bhuvanesh Sridharan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Animesh Sinha
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Jai Bardhan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Masahiro Ehara
  - Research Center for Computational Science, Institute for Molecular Science, Okazaki, Japan
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
2
Gormley AJ. Machine learning in drug delivery. J Control Release 2024; 373:23-30. [PMID: 38909704 DOI: 10.1016/j.jconrel.2024.06.045] [Received: 05/01/2024] [Revised: 06/17/2024] [Accepted: 06/19/2024] [Indexed: 06/25/2024]
Abstract
For decades, drug delivery scientists have been performing trial-and-error experimentation to manually sample parameter spaces and optimize release profiles through rational design. To enable this approach, scientists spend much of their career learning nuanced drug-material interactions that drive system behavior. In relatively simple systems, rational design criteria allow us to fine-tune release profiles and enable efficacious therapies. However, as materials and drugs become increasingly sophisticated and their interactions have non-linear and compounding effects, the field is suffering from the Curse of Dimensionality, which prevents us from comprehending complex structure-function relationships. In the past, we have embraced this complexity by implementing high-throughput screens to increase the probability of finding ideal compositions. However, this brute force method was inefficient and led many to abandon these fishing expeditions. Fortunately, methods in data science including artificial intelligence/machine learning (AI/ML) are providing ideal analytical tools to model this complex data and ascertain quantitative structure-function relationships. In this Oration, I speak to the potential value of data science in drug delivery with particular focus on polymeric delivery systems. Here, I do not suggest that AI/ML will simply replace mechanistic understanding of complex systems. Rather, I propose that AI/ML should be yet another useful tool in the lab to navigate complex parameter spaces. The recent hype around AI/ML is breathtaking and potentially overinflated, but the value of these methods is poised to revolutionize how we perform science. Therefore, I encourage readers to consider adopting these skills and applying data science methods to their own problems. If done successfully, I believe we will all realize a paradigm shift in our approach to drug delivery.
Affiliation(s)
- Adam J Gormley
  - Associate Professor, Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
3
Retchin M, Wang Y, Takaba K, Chodera JD. DrugGym: A testbed for the economics of autonomous drug discovery. bioRxiv 2024:2024.05.28.596296. [PMID: 38854082 PMCID: PMC11160604 DOI: 10.1101/2024.05.28.596296] [Indexed: 06/11/2024]
Abstract
Drug discovery is stochastic. The effectiveness of candidate compounds in satisfying design objectives is unknown ahead of time, and the tools used for prioritization (predictive models and assays) are inaccurate and noisy. In a typical discovery campaign, thousands of compounds may be synthesized and tested before design objectives are achieved, with many others ideated but deprioritized. These challenges are well-documented, but assessing potential remedies has been difficult. We introduce DrugGym, a framework for modeling the stochastic process of drug discovery. Emulating biochemical assays with realistic surrogate models, we simulate the progression from weak hits to sub-micromolar leads with viable ADME. We use this testbed to examine how different ideation, scoring, and decision-making strategies impact statistical measures of utility, such as the probability of program success within predefined budgets and the expected costs to achieve target candidate profile (TCP) goals. We also assess the influence of affinity model inaccuracy, chemical creativity, batch size, and multi-step reasoning. Our findings suggest that reducing affinity model inaccuracy from 2 to 0.5 pIC50 units improves budget-constrained success rates tenfold. DrugGym represents a realistic testbed for machine learning methods applied to the hit-to-lead phase. Source code is available at www.drug-gym.org.
Affiliation(s)
- Michael Retchin
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
- Yuanqing Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, NY 10004
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
  - Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- John D. Chodera
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
4
Jiang J, Wu J, Luo J, Yang X, Huang Z. MOBCA: Multi-Objective Besiege and Conquer Algorithm. Biomimetics (Basel) 2024; 9:316. [PMID: 38921196 PMCID: PMC11201474 DOI: 10.3390/biomimetics9060316] [Received: 04/17/2024] [Revised: 05/17/2024] [Accepted: 05/22/2024] [Indexed: 06/27/2024]
Abstract
The besiege and conquer algorithm (BCA) has shown excellent performance in single-objective optimization problems. However, no published work has applied the BCA to multi-objective optimization problems. Therefore, this paper proposes a new multi-objective besiege and conquer algorithm to solve multi-objective optimization problems. The grid mechanism, archiving mechanism, and leader selection mechanism are integrated into the BCA to estimate the Pareto optimal solutions and approach the Pareto optimal frontier. The proposed algorithm is tested against MOPSO, MOEA/D, and NSGA-III on the IMOP and ZDT benchmark functions. The experimental results show that the proposed algorithm can obtain competitive results in terms of the accuracy of the Pareto optimal solutions.
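The Pareto optimality notion that this abstract relies on can be illustrated with a minimal sketch. It is not the MOBCA algorithm and does not reproduce its grid, archive, or leader selection mechanisms; the function names `dominates` and `pareto_front` and the sample points are hypothetical, and all objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (all objectives minimized).

    a dominates b when a is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values, for illustration only.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = pareto_front(candidates)
print(front)
```

In this example `(3.0, 4.0)` is dominated by `(2.0, 3.0)` and `(5.0, 5.0)` by `(1.0, 5.0)`, so only three points remain on the front. An archive in a multi-objective optimizer typically maintains such a non-dominated set as new candidates arrive.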
Affiliation(s)
- Jianhua Jiang
  - Center for Artificial Intelligence, Jilin University of Finance and Economics, Changchun 130117, China
  - Jilin Province Key Laboratory of Fintech, Jilin University of Finance and Economics, Changchun 130117, China
- Jiaqi Wu
  - Center for Artificial Intelligence, Jilin University of Finance and Economics, Changchun 130117, China
  - Jilin Province Key Laboratory of Fintech, Jilin University of Finance and Economics, Changchun 130117, China
- Jinmeng Luo
  - Center for Artificial Intelligence, Jilin University of Finance and Economics, Changchun 130117, China
  - Jilin Province Key Laboratory of Fintech, Jilin University of Finance and Economics, Changchun 130117, China
- Xi Yang
  - Center for Artificial Intelligence, Jilin University of Finance and Economics, Changchun 130117, China
  - Jilin Province Key Laboratory of Fintech, Jilin University of Finance and Economics, Changchun 130117, China
- Zulu Huang
  - College of Foreign Languages, Jilin Agricultural University, Changchun 130118, China
5
Chandraghatgi R, Ji HF, Rosen GL, Sokhansanj BA. Streamlining Computational Fragment-Based Drug Discovery through Evolutionary Optimization Informed by Ligand-Based Virtual Prescreening. J Chem Inf Model 2024; 64:3826-3840. [PMID: 38696451 PMCID: PMC11197033 DOI: 10.1021/acs.jcim.4c00234] [Received: 02/09/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/04/2024]
Abstract
Recent advances in computational methods provide the promise of dramatically accelerating drug discovery. While mathematical modeling and machine learning have become vital in predicting drug-target interactions and properties, there is untapped potential in computational drug discovery due to the vast and complex chemical space. This paper builds on our recently published computational fragment-based drug discovery (FBDD) method called fragment databases from screened ligand drug discovery (FDSL-DD). FDSL-DD uses in silico screening to identify ligands from a vast library, fragmenting them while attaching specific attributes based on predicted binding affinity and interaction with the target subdomain. In this paper, we further propose a two-stage optimization method that utilizes the information from prescreening to optimize computational ligand synthesis. We hypothesize that using prescreening information for optimization shrinks the search space and focuses on promising regions, thereby improving the optimization for candidate ligands. The first optimization stage assembles these fragments into larger compounds using genetic algorithms, followed by a second stage of iterative refinement to produce compounds with enhanced bioactivity. To demonstrate broad applicability, the methodology is demonstrated on three diverse protein targets found in human solid cancers, bacterial antimicrobial resistance, and the SARS-CoV-2 virus. Combined, the proposed FDSL-DD and a two-stage optimization approach yield high-affinity ligand candidates more efficiently than other state-of-the-art computational FBDD methods. We further show that a multiobjective optimization method accounting for drug-likeness can still produce potential candidate ligands with a high binding affinity. Overall, the results demonstrate that integrating detailed chemical information with a constrained search framework can markedly optimize the initial drug discovery process, offering a more precise and efficient route to developing new therapeutics.
Affiliation(s)
- Rohan Chandraghatgi
  - Department of Biology, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Hai-Feng Ji
  - Department of Chemistry, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Gail L. Rosen
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
- Bahrad A. Sokhansanj
  - Department of Electrical & Computer Engineering, Drexel University, Philadelphia, Pennsylvania 19104, United States
6
Shen X, Zeng T, Chen N, Li J, Wu R. NIMO: A Natural Product-Inspired Molecular Generative Model Based on Conditional Transformer. Molecules 2024; 29:1867. [PMID: 38675687 PMCID: PMC11053988 DOI: 10.3390/molecules29081867] [Received: 03/05/2024] [Revised: 04/11/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024]
Abstract
Natural products (NPs) have diverse biological activity and significant medicinal value. The structural diversity of NPs is the mainstay of drug discovery. Expanding the chemical space of NPs is an urgent need. Inspired by the concept of fragment-assembled pseudo-natural products, we developed a computational tool called NIMO, which is based on the transformer neural network model. NIMO employs two tailor-made motif extraction methods to map a molecular graph into a semantic motif sequence. All these generated motif sequences are used to train our molecular generative models. Various NIMO models were trained under different task scenarios by recognizing syntactic patterns and structure-property relationships. We further explored the performance of NIMO in structure-guided, activity-oriented, and pocket-based molecule generation tasks. Our results show that NIMO had excellent performance for molecule generation from scratch and structure optimization from a scaffold.
Affiliation(s)
- Xiaojuan Shen
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Tao Zeng
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Nianhang Chen
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Jiabo Li
  - ChemXAI Inc., 53 Barry Lane, Syosset, NY 11791, USA
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China
7
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250100, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
8
Biehn SE, Goncalves LM, Lehmann J, Marty JD, Mueller C, Ramirez SA, Tillier F, Sage CR. BioPrint meets the AI age: development of artificial intelligence-based ADMET models for the drug-discovery platform SAFIRE. Future Med Chem 2024; 16:587-599. [PMID: 38372202 DOI: 10.4155/fmc-2024-0007] [Received: 01/08/2024] [Accepted: 02/08/2024] [Indexed: 02/20/2024]
Abstract
Background: To prioritize compounds with a higher likelihood of success, artificial intelligence models can be used to predict absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of molecules quickly and efficiently. Methods: Models were trained with BioPrint database proprietary data along with public datasets to predict various ADMET end points for the SAFIRE platform. Results: SAFIRE models performed at or above 75% accuracy and 0.4 Matthews correlation coefficient with validation sets. Training with both proprietary and public data improved model performance and expanded the chemical space on which the models were trained. The platform features scoring functionality to guide user decision-making. Conclusion: High-quality datasets along with chemical space considerations yielded ADMET models performing favorably with utility in the drug discovery process.
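The two validation metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the confusion counts are hypothetical and are not taken from the SAFIRE study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical validation-set counts, for illustration only.
tp, tn, fp, fn = 40, 35, 15, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {acc:.2f}, MCC = {mcc:.2f}")
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, so it stays informative when the active and inactive classes are imbalanced, which is common for ADMET end points.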
Affiliation(s)
- Sarah E Biehn
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Juerg Lehmann
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Jessica D Marty
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Christoph Mueller
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Samuel A Ramirez
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Fabien Tillier
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Carleton R Sage
  - Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
9
Xu W, Diesen E, He T, Reuter K, Margraf JT. Discovering High Entropy Alloy Electrocatalysts in Vast Composition Spaces with Multiobjective Optimization. J Am Chem Soc 2024; 146:7698-7707. [PMID: 38466356 PMCID: PMC10958507 DOI: 10.1021/jacs.3c14486] [Received: 12/20/2023] [Revised: 02/21/2024] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
High entropy alloys (HEAs) are a highly promising class of materials for electrocatalysis as their unique active site distributions break the scaling relations that limit the activity of conventional transition metal catalysts. Existing Bayesian optimization (BO)-based virtual screening approaches focus on catalytic activity as the sole objective and correspondingly tend to identify promising materials that are unlikely to be entropically stabilized. Here, we overcome this limitation with a multiobjective BO framework for HEAs that simultaneously targets activity, cost-effectiveness, and entropic stabilization. With diversity-guided batch selection further boosting its data efficiency, the framework readily identifies numerous promising candidates for the oxygen reduction reaction that strike the balance between all three objectives in hitherto uncharted HEA design spaces comprising up to 10 elements.
Affiliation(s)
- Wenbin Xu
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin D-14195, Germany
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Elias Diesen
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin D-14195, Germany
- Tianwei He
  - Yunnan Key Laboratory for Micro/Nano Materials & Technology, National Center for International Research on Photoelectric and Energy Materials, School of Materials and Energy, Yunnan University, Kunming 650091, China
- Karsten Reuter
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin D-14195, Germany
- Johannes T. Margraf
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin D-14195, Germany
  - Bavarian Center for Battery Technology (BayBatt), University of Bayreuth, Bayreuth D-95447, Germany
10
Gallarati S, van Gerwen P, Laplaza R, Brey L, Makaveev A, Corminboeuf C. A genetic optimization strategy with generality in asymmetric organocatalysis as a primary target. Chem Sci 2024; 15:3640-3660. [PMID: 38455002 PMCID: PMC10915838 DOI: 10.1039/d3sc06208b] [Received: 11/20/2023] [Accepted: 01/30/2024] [Indexed: 03/09/2024]
Abstract
A catalyst possessing a broad substrate scope, in terms of both turnover and enantioselectivity, is sometimes called "general". Despite their great utility in asymmetric synthesis, truly general catalysts are difficult or expensive to discover via traditional high-throughput screening and are, therefore, rare. Existing computational tools accelerate the evaluation of reaction conditions from a pre-defined set of experiments to identify the most general ones, but cannot generate entirely new catalysts with enhanced substrate breadth. For these reasons, we report an inverse design strategy based on the open-source genetic algorithm NaviCatGA and on the OSCAR database of organocatalysts to simultaneously probe the catalyst and substrate scope and optimize generality as a primary target. We apply this strategy to the Pictet-Spengler condensation, for which we curate a database of 820 reactions, used to train statistical models of selectivity and activity. Starting from OSCAR, we define a combinatorial space of millions of catalyst possibilities, and perform evolutionary experiments on a diverse substrate scope that is representative of the whole chemical space of tetrahydro-β-carboline products. While privileged catalysts emerge, we show how genetic optimization can address the broader question of generality in asymmetric synthesis, extracting structure-performance relationships from the challenging areas of chemical space.
Affiliation(s)
- Simone Gallarati
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Lucien Brey
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Alexander Makaveev
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research - Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
11
Kaufman B, Williams EC, Underkoffler C, Pederson R, Mardirossian N, Watson I, Parkhill J. COATI: Multimodal Contrastive Pretraining for Representing and Traversing Chemical Space. J Chem Inf Model 2024; 64:1145-1157. [PMID: 38316665 DOI: 10.1021/acs.jcim.3c01753] [Indexed: 02/07/2024]
Abstract
Creating a successful small molecule drug is a challenging multiparameter optimization problem in an effectively infinite space of possible molecules. Generative models have emerged as powerful tools for traversing data manifolds composed of images, sounds, and text and offer an opportunity to dramatically improve the drug discovery and design process. To create generative optimization methods that are more useful than brute-force molecular generation and filtering via virtual screening, we propose that four integrated features are necessary: large, quantitative data sets of molecular structure and activity, an invertible vector representation of realistic accessible molecules, smooth and differentiable regressors that quantify uncertainty, and algorithms to simultaneously optimize properties of interest. Over the course of 12 months, Terray Therapeutics has collected a data set of 2 billion quantitative binding measurements of small molecules to therapeutic targets, which directly motivates multiparameter generative optimization of molecules conditioned on these data. To this end, we present contrastive optimization for accelerated therapeutic inference (COATI), a pretrained, multimodal encoder-decoder model of druglike chemical space. COATI is constructed without any human biasing of features, using contrastive learning from text and 3D representations of molecules to allow for downstream use with structural models. We demonstrate that COATI possesses many of the desired properties of universal molecular embedding: fixed-dimension, invertibility, autoencoding, accurate regression, and low computation cost. Finally, we present a novel metadynamics algorithm for generative optimization using a small subset of our proprietary data collected for a model protein, carbonic anhydrase, designing molecules that satisfy the multiparameter optimization task of potency, solubility, and drug likeness. This work sets the stage for fully integrated generative molecular design and optimization for small molecules.
Affiliation(s)
- Benjamin Kaufman
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- Edward C Williams
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- Carl Underkoffler
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- Ryan Pederson
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- Narbe Mardirossian
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- Ian Watson
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
- John Parkhill
  - Terray Therapeutics, Inc., 800 Royal Oaks Dr, Monrovia, California 91016, United States
12
Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O. Reinvent 4: Modern AI-driven generative molecule design. J Cheminform 2024; 16:20. [PMID: 38383444 PMCID: PMC10882833 DOI: 10.1186/s13321-024-00812-5] [Received: 11/23/2023] [Accepted: 02/09/2024] [Indexed: 02/23/2024]
Abstract
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.
Affiliation(s)
- Hannes H Loeffler
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
| | - Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
| | - Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
| | - Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
|
13
|
Kerstjens A, De Winter H. Molecule auto-correction to facilitate molecular design. J Comput Aided Mol Des 2024; 38:10. [PMID: 38363377 PMCID: PMC10873457 DOI: 10.1007/s10822-024-00549-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 01/11/2024] [Indexed: 02/17/2024]
Abstract
Ensuring that computationally designed molecules are chemically reasonable is at best cumbersome. We present a molecule correction algorithm that morphs invalid molecular graphs into structurally related valid analogs. The algorithm is implemented as a tree search, guided by a set of policies to minimize its cost. We showcase how the algorithm can be applied to molecular design, either as a post-processing step or as an integral part of molecule generators.
Affiliation(s)
- Alan Kerstjens
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium
| | - Hans De Winter
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium.
|
14
|
Hasselgren C, Oprea TI. Artificial Intelligence for Drug Discovery: Are We There Yet? Annu Rev Pharmacol Toxicol 2024; 64:527-550. [PMID: 37738505 DOI: 10.1146/annurev-pharmtox-040323-040828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/24/2023]
Abstract
Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small-molecule drugs. AI technologies, such as generative chemistry, machine learning, and multiproperty optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
Affiliation(s)
- Catrin Hasselgren
- Safety Assessment, Genentech, Inc., South San Francisco, California, USA
| | - Tudor I Oprea
- Expert Systems Inc., San Diego, California, USA;
- Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, New Mexico, USA
|
15
|
Back S, Aspuru-Guzik A, Ceriotti M, Gryn'ova G, Grzybowski B, Gu GH, Hein J, Hippalgaonkar K, Hormázabal R, Jung Y, Kim S, Kim WY, Moosavi SM, Noh J, Park C, Schrier J, Schwaller P, Tsuda K, Vegge T, von Lilienfeld OA, Walsh A. Accelerated chemical science with AI. DIGITAL DISCOVERY 2024; 3:23-33. [PMID: 38239898 PMCID: PMC10793638 DOI: 10.1039/d3dd00213f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]
Abstract
In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using Open AI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.
Affiliation(s)
- Seoin Back
- Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University Seoul Republic of Korea
| | - Alán Aspuru-Guzik
- Departments of Chemistry, Computer Science, University of Toronto St. George Campus Toronto ON Canada
- Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling (COSMO), École Polytechnique Fédérale de Lausanne Lausanne Switzerland
| | - Ganna Gryn'ova
- Heidelberg Institute for Theoretical Studies (HITS gGmbH) 69118 Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University 69120 Heidelberg Germany
| | - Bartosz Grzybowski
- Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS) Ulsan Republic of Korea
- Institute of Organic Chemistry, Polish Academy of Sciences Warsaw Poland
- Department of Chemistry, Ulsan National Institute of Science and Technology Ulsan Republic of Korea
| | - Geun Ho Gu
- Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH) Naju 58330 Republic of Korea
| | - Jason Hein
- Department of Chemistry, University of British Columbia Vancouver BC V6T 1Z1 Canada
| | - Kedar Hippalgaonkar
- School of Materials Science and Engineering, Nanyang Technological University 50 Nanyang Avenue Singapore 639798 Singapore
- Institute of Materials Research and Engineering, Agency for Science Technology and Research 2 Fusionopolis Way, 08-03 Singapore 138634 Singapore
| | | | - Yousung Jung
- Department of Chemical and Biomolecular Engineering, KAIST Daejeon Republic of Korea
- School of Chemical and Biological Engineering, Interdisciplinary Program in Artificial Intelligence, Seoul National University 1 Gwanak-ro, Gwanak-gu Seoul 08826 Republic of Korea
| | - Seonah Kim
- Department of Chemistry, Colorado State University 1301 Center Avenue Fort Collins CO 80523 USA
| | - Woo Youn Kim
- Department of Chemistry, KAIST Daejeon Republic of Korea
| | - Seyed Mohamad Moosavi
- Chemical Engineering & Applied Chemistry, University of Toronto Toronto Ontario M5S 3E5 Canada
| | - Juhwan Noh
- Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology Daejeon 34114 Republic of Korea
| | | | - Joshua Schrier
- Department of Chemistry, Fordham University The Bronx NY 10458 USA
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC) & National Centre of Competence in Research (NCCR) Catalysis, École Polytechnique Fédérale de Lausanne Lausanne Switzerland
| | - Koji Tsuda
- Graduate School of Frontier Sciences, The University of Tokyo Kashiwa Chiba 277-8561 Japan
- Center for Basic Research on Materials, National Institute for Materials Science Tsukuba Ibaraki 305-0044 Japan
- RIKEN Center for Advanced Intelligence Project Tokyo 103-0027 Japan
| | - Tejs Vegge
- Department of Energy Conversion and Storage, Technical University of Denmark 301 Anker Engelunds vej, Kongens Lyngby Copenhagen 2800 Denmark
| | - O Anatole von Lilienfeld
- Acceleration Consortium and Vector Institute for Artificial Intelligence Toronto ON M5S 1M1 Canada
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St George Campus Toronto ON Canada
- Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data 10587 Berlin Germany
| | - Aron Walsh
- Department of Materials, Imperial College London London SW7 2AZ UK
- Department of Physics, Ewha Women's University Seoul Republic of Korea
|
16
|
Dangat Y, Freindorf M, Kraka E. Mechanistic Insights into S-Depalmitolyse Activity of Cln5 Protein Linked to Neurodegeneration and Batten Disease: A QM/MM Study. J Am Chem Soc 2024; 146:145-158. [PMID: 38055807 DOI: 10.1021/jacs.3c06397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Ceroid lipofuscinosis neuronal protein 5 (Cln5) is encoded by the CLN5 gene. The genetic variants of this gene are associated with the CLN5 form of Batten disease. Recently, the first crystal structure of Cln5 was reported. Cln5 shows cysteine palmitoyl thioesterase S-depalmitoylation activity, which was explored via fluorescent emission spectroscopy utilizing the fluorescent probe DDP-5. In this work, the mechanism of the reaction between Cln5 and DDP-5 was studied computationally by applying a QM/MM methodology at the ωB97X-D/6-31G(d,p):AMBER level. The results of our study clearly demonstrate the critical role of the catalytic triad Cys280-His166-Glu183 in S-depalmitoylation activity. This is evidenced through a comparison of the pathways catalyzed by the Cys280-His166-Glu183 triad and those with only Cys280 involved. The computed reaction barriers are in agreement with the catalytic efficiency. The calculated Gibbs free-energy profile suggests that S-depalmitoylation is a rate-limiting step compared to the preceding S-palmitoylation, with barriers of 26.1 and 25.3 kcal/mol, respectively. The energetics were complemented by monitoring the fluctuations in the electron density distribution through NBO charges and bond strength alterations via local mode stretching force constants during the catalytic pathways. This comprehensive protocol led to a more holistic picture of the reaction mechanism at the atomic level. It forms the foundation for future studies on the effects of gene mutations on both the S-palmitoylation and S-depalmitoylation steps, providing valuable data for the further development of enzyme replacement therapy, which is currently the only FDA-approved therapy for childhood neurodegenerative diseases, including Batten disease.
Affiliation(s)
- Yuvraj Dangat
- Department of Chemistry, Southern Methodist University, 3215 Daniel Avenue, Dallas, Texas 75275-0314, United States
| | - Marek Freindorf
- Department of Chemistry, Southern Methodist University, 3215 Daniel Avenue, Dallas, Texas 75275-0314, United States
| | - Elfi Kraka
- Department of Chemistry, Southern Methodist University, 3215 Daniel Avenue, Dallas, Texas 75275-0314, United States
|
17
|
Koscher BA, Canty RB, McDonald MA, Greenman KP, McGill CJ, Bilodeau CL, Jin W, Wu H, Vermeire FH, Jin B, Hart T, Kulesza T, Li SC, Jaakkola TS, Barzilay R, Gómez-Bombarelli R, Green WH, Jensen KF. Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back. Science 2023; 382:eadi1407. [PMID: 38127734 DOI: 10.1126/science.adi1407] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/05/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]
Abstract
A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.
Affiliation(s)
- Brent A Koscher
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Richard B Canty
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Matthew A McDonald
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Kevin P Greenman
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Charles J McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Camille L Bilodeau
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Wengong Jin
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Brooke Jin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Travis Hart
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Timothy Kulesza
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Tommi S Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
|
18
|
Karandashev K, Weinreich J, Heinen S, Arismendi Arrieta DJ, von Rudorff GF, Hermansson K, von Lilienfeld OA. Evolutionary Monte Carlo of QM Properties in Chemical Space: Electrolyte Design. J Chem Theory Comput 2023; 19:8861-8870. [PMID: 38009856 PMCID: PMC10720348 DOI: 10.1021/acs.jctc.3c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/29/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 10^6 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
Affiliation(s)
| | - Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
| | - Stefan Heinen
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
| | | | - Guido Falk von Rudorff
- Department of Chemistry, University Kassel, Heinrich-Plett-Str.40, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
| | - Kersti Hermansson
- Department of Chemistry-Ångström Laboratory, Uppsala University, Box 538, SE-75121 Uppsala, Sweden
| | - O. Anatole von Lilienfeld
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, M5S 1A1 Ontario, Canada
- Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
|
19
|
Du H, Jiang D, Zhang O, Wu Z, Gao J, Zhang X, Wang X, Deng Y, Kang Y, Li D, Pan P, Hsieh CY, Hou T. A flexible data-free framework for structure-based de novo drug design with reinforcement learning. Chem Sci 2023; 14:12166-12181. [PMID: 37969589 PMCID: PMC10631243 DOI: 10.1039/d3sc04091g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 10/11/2023] [Indexed: 11/17/2023]
Abstract
Contemporary structure-based molecular generative methods have demonstrated their potential to model the geometric and energetic complementarity between ligands and receptors, thereby facilitating the design of molecules with favorable binding affinity and target specificity. Despite the introduction of deep generative models for molecular generation, the atom-wise generation paradigm that partially contradicts chemical intuition limits the validity and synthetic accessibility of the generated molecules. Additionally, the dependence of deep learning models on large-scale structural data has hindered their adaptability across different targets. To overcome these challenges, we present a novel search-based framework, 3D-MCTS, for structure-based de novo drug design. Distinct from prevailing atom-centric methods, 3D-MCTS employs a fragment-based molecular editing strategy. The fragments decomposed from small-molecule drugs are recombined under predefined retrosynthetic rules, offering improved drug-likeness and synthesizability, overcoming the inherent limitations of atom-based approaches. Leveraging multi-threaded parallel simulations combined with a real-time energy constraint-based pruning strategy, 3D-MCTS achieves remarkable efficiency. At a fixed computational cost, it outperforms other state-of-the-art (SOTA) methods by producing molecules with enhanced binding affinity. Furthermore, its fragment-based approach ensures the generation of more dependable binding conformations, exhibiting a success rate 43.6% higher than that of other SOTAs. This advantage becomes even more pronounced when handling targets that significantly deviate from the training dataset. 3D-MCTS is capable of achieving thirty times more hits with high binding affinity than traditional virtual screening methods, which demonstrates the superior ability of 3D-MCTS to explore chemical space. 
Moreover, the flexibility of our framework makes it easy to incorporate domain knowledge during the process, thereby enabling the generation of molecules with desirable pharmacophores and enhanced binding affinity. The adaptability of 3D-MCTS is further showcased in metalloprotein applications, highlighting its potential across various drug design scenarios.
Affiliation(s)
- Hongyan Du
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dejun Jiang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Junbo Gao
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Xiaorui Wang
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
- Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology Macao 999078 China
| | - Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd Hangzhou 310018 Zhejiang China
| | - Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University Hangzhou 310058 Zhejiang China
|
20
|
Casetti N, Alfonso-Ramos JE, Coley CW, Stuyver T. Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery. Chemistry 2023; 29:e202301957. [PMID: 37526059 DOI: 10.1002/chem.202301957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/02/2023]
Abstract
Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to high-throughput screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules, in silico. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.
Affiliation(s)
- Nicholas Casetti
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Javier E Alfonso-Ramos
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
|
21
|
Lu H, Kang X, Yu H, Zhang W, Luo Y. Using a single complex to predict the reaction energy profile: a case study of Pd/Ni-catalyzed ethylene polymerization. Dalton Trans 2023; 52:14790-14796. [PMID: 37807861 DOI: 10.1039/d3dt02745g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Mechanism-driven catalyst screening could be greatly accelerated by quantitative prediction models of the reaction energy profile. Here, we propose a novel method for molecular representation, taking palladium- and nickel-catalyzed ethylene polymerization as model reactions. The geometric parameters (GPfra) and electron occupancies (EOfra) from the non-ligand fragment of the η3-complex were extracted as the molecular descriptors, followed by constructing the reaction energy profile prediction models on the basis of various regression algorithms. The models showed great accuracy with respect to both theoretical and experimental data. More importantly, the models are convenient for training and utilization. On one hand, all the features were easily captured from the single η3-complex. On the other hand, further investigation also demonstrated that the models could be constructed with a small training sample size. We believe that our featurization method could possibly be generalized to more organometallic reactions and paves the way to efficient catalyst design.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
| | - Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
| | - Hang Yu
- Liaoning Key Laboratory of Clean Energy, Shenyang Aerospace University, Shenyang 110136, China
| | - Wenzhen Zhang
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
| | - Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China.
- PetroChina Petrochemical Research Institute, Beijing 102206, China
|
22
|
Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
- Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
| | - Alexander J Norquist
- Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
| | - Tonio Buonassisi
- Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Jakoah Brgoch
- Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States
|
23
|
Kerstjens A, De Winter H. A molecule perturbation software library and its application to study the effects of molecular design constraints. J Cheminform 2023; 15:89. [PMID: 37752561 PMCID: PMC10523775 DOI: 10.1186/s13321-023-00761-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/15/2023] [Indexed: 09/28/2023]
Abstract
Computational molecular design can yield chemically unreasonable compounds when performed carelessly. A popular strategy to mitigate this risk is mimicking reference chemistry. This is commonly achieved by restricting the way in which molecules are constructed or modified. While it is well established that such an approach helps in designing chemically appealing molecules, concerns about these restrictions impacting chemical space exploration negatively linger. In this work we present a software library for constrained graph-based molecule manipulation and showcase its functionality by developing a molecule generator. Said generator designs molecules mimicking reference chemical features of differing granularity. We find that restricting molecular construction lightly, beyond the usual positive effects on drug-likeness and synthesizability of designed molecules, provides guidance to optimization algorithms navigating chemical space. Nonetheless, restricting molecular construction excessively can indeed hinder effective chemical space exploration.
Affiliation(s)
- Alan Kerstjens
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium
| | - Hans De Winter
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, University of Antwerp, Universiteitslaan 1, 2610, Wilrijk, Belgium.
|