1
Hoque A, Surve M, Kalyanakrishnan S, Sunoj RB. Reinforcement Learning for Improving Chemical Reaction Performance. J Am Chem Soc 2024. [PMID: 39356950 DOI: 10.1021/jacs.4c08866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Deep learning (DL) methods have gained notable prominence in predictive and generative tasks in molecular space. However, their application in chemical reactions remains grossly underutilized. Chemical reactions are intrinsically complex: typically involving multiple molecules besides bond-breaking/forming events. In reaction discovery, one aims to maximize yield and/or selectivity that depends on a number of factors, mostly centered on reacting partners and reaction conditions. Herein, we introduce RE-EXPLORE, a novel approach that integrates deep reinforcement learning (RL) with an RNN-based deep generative model to identify prospective new reactants/catalysts, whose yield/selectivity is estimated using a pretrained regressor. Three chemical databases (ChEMBL, ZINC, and COCONUT containing half a million to one million unlabeled molecules) are independently used for pretraining the generators to enrich them with valuable information from diverse chemical space. Standard RL methods are found to be insufficient, as learners tend to prioritize exploitation for immediate gains, resulting in repetitive generation of same/similar molecules. Our engineered reward function includes a Tanimoto-based uniqueness factor within the RL loop that improved the exploration of the environment and has helped accrue larger returns. Integration of a user-defined core fragment into the generated molecules facilitated learning of specific reaction types. Together, RE-EXPLORE can navigate the reaction space toward practically meaningful regions and offers notable improvements across the three distinct reaction types considered in this study. It identifies high-yielding substrates and highly enantioselective chiral catalysts. This RL-based approach has the potential to expedite reaction discovery and aid in the synthesis planning of important compounds, including drugs and pharmaceuticals.
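The abstract does not give the form of the Tanimoto-based uniqueness factor, so the following is only a minimal sketch of one way such a factor could enter a reward. It assumes molecules are represented as sets of "on" fingerprint bits and that the predicted yield or selectivity is scaled by one minus the highest Tanimoto similarity to any previously generated molecule. The names `tanimoto`, `uniqueness_factor`, and `shaped_reward` are hypothetical and are not taken from RE-EXPLORE.

```python
from typing import FrozenSet, Iterable

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two bit sets."""
    union = a | b
    if not union:
        # Two empty fingerprints are treated as identical (a convention).
        return 1.0
    return len(a & b) / len(union)


def uniqueness_factor(candidate: Fingerprint,
                      history: Iterable[Fingerprint]) -> float:
    """Return 1 - (max similarity to any earlier molecule); 1.0 if none."""
    similarities = [tanimoto(candidate, seen) for seen in history]
    if not similarities:
        return 1.0
    return 1.0 - max(similarities)


def shaped_reward(predicted_score: float,
                  candidate: Fingerprint,
                  history: Iterable[Fingerprint]) -> float:
    """Scale the regressor's prediction by the uniqueness factor.

    Hypothetical combination rule: an exact repeat earns zero reward,
    while a molecule unlike anything seen so far keeps its full score.
    """
    return predicted_score * uniqueness_factor(candidate, history)
```

Under this assumed rule, regenerating an already sampled molecule yields no reward, which discourages the repetitive generation the abstract describes. A real implementation would more likely compute fingerprints with a cheminformatics toolkit.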
Affiliation(s)
- Ajnabiul Hoque
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Mihir Surve
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Shivaram Kalyanakrishnan
- Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Center for Machine Intelligence and Data Science (CMInDS), Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
2
Karandashev K, Weinreich J, Heinen S, Arismendi Arrieta DJ, von Rudorff GF, Hermansson K, von Lilienfeld OA. Evolutionary Monte Carlo of QM Properties in Chemical Space: Electrolyte Design. J Chem Theory Comput 2023; 19:8861-8870. [PMID: 38009856 PMCID: PMC10720348 DOI: 10.1021/acs.jctc.3c00822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 10/29/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023]
Abstract
Optimizing a target function over the space of organic molecules is an important problem appearing in many fields of applied science but also a very difficult one due to the vast number of possible molecular systems. We propose an evolutionary Monte Carlo algorithm for solving such problems which is capable of straightforwardly tuning both exploration and exploitation characteristics of an optimization procedure while retaining favorable properties of genetic algorithms. The method, dubbed MOSAiCS (Metropolis Optimization by Sampling Adaptively in Chemical Space), is tested on problems related to optimizing components of battery electrolytes, namely, minimizing solvation energy in water or maximizing dipole moment while enforcing a lower bound on the HOMO-LUMO gap; optimization was carried out over sets of molecular graphs inspired by QM9 and Electrolyte Genome Project (EGP) data sets. MOSAiCS reliably generated molecular candidates with good target quantity values, which were in most cases better than the ones found in QM9 or EGP. While the optimization results presented in this work sometimes required up to 10^6 QM calculations and were thus feasible only thanks to computationally efficient ab initio approximations of properties of interest, we discuss possible strategies for accelerating MOSAiCS using machine learning approaches.
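As a rough illustration of the Metropolis criterion named in the MOSAiCS acronym, the sketch below runs a plain single-chain Metropolis minimization over an arbitrary search space. It is not the authors' algorithm: the evolutionary and genetic-style moves mentioned in the abstract are omitted, and the function names, the temperature parameter, and the toy integer search space in the usage note are all assumptions made for the example.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def acceptance_probability(f_current: float, f_proposed: float,
                           temperature: float) -> float:
    """Standard Metropolis rule for minimization.

    Downhill (or equal) moves are always accepted; uphill moves are
    accepted with probability exp(-(f_proposed - f_current) / T).
    """
    if f_proposed <= f_current:
        return 1.0
    return math.exp(-(f_proposed - f_current) / temperature)


def metropolis_minimize(f: Callable[[State], float],
                        propose: Callable[[State, random.Random], State],
                        start: State,
                        temperature: float,
                        n_steps: int,
                        rng: random.Random) -> Tuple[State, float]:
    """Run one Metropolis chain and return the best state visited."""
    current, f_current = start, f(start)
    best, f_best = current, f_current
    for _ in range(n_steps):
        candidate = propose(current, rng)
        f_candidate = f(candidate)
        p_accept = acceptance_probability(f_current, f_candidate, temperature)
        if rng.random() < p_accept:
            current, f_current = candidate, f_candidate
            if f_current < f_best:
                best, f_best = current, f_current
    return best, f_best
```

A higher temperature accepts more uphill moves (exploration), while a lower one behaves more greedily (exploitation). As a toy usage, `metropolis_minimize(lambda x: (x - 3) ** 2, lambda x, r: x + r.choice((-1, 1)), 10, 0.5, 500, random.Random(0))` searches the integers for the minimum of a parabola; in the molecular setting the proposal step would instead mutate a molecular graph.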
Affiliation(s)
- Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Stefan Heinen
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Guido Falk von Rudorff
- Department of Chemistry, University Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Kersti Hermansson
- Department of Chemistry-Ångström Laboratory, Uppsala University, Box 538, SE-75121 Uppsala, Sweden
- O. Anatole von Lilienfeld
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St. George Campus, Toronto, M5S 1A1 Ontario, Canada
- Machine Learning Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
3
Chen L, Shen Q, Lou J. Magicmol: a light-weighted pipeline for drug-like molecule evolution and quick chemical space exploration. BMC Bioinformatics 2023; 24:173. [PMID: 37101113 PMCID: PMC10132416 DOI: 10.1186/s12859-023-05286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 04/13/2023] [Indexed: 04/28/2023]
Abstract
The rise of machine learning and deep learning methods has boosted the development of cheminformatics, especially in drug discovery and new material exploration. Lower time and space costs make it possible for scientists to search the enormous chemical space. Recently, some work has combined reinforcement learning strategies with recurrent neural network (RNN)-based models to optimize the properties of generated small molecules, which notably improved a batch of critical factors for these candidates. However, a common problem among these RNN-based methods is that several generated molecules are difficult to synthesize despite having better desired properties such as binding affinity. At the same time, RNN-based frameworks reproduce the molecule distribution of the training set better than other categories of models during molecule exploration tasks. Thus, to optimize the whole exploration process and make it contribute to the optimization of specified molecules, we devised a lightweight pipeline called Magicmol; this pipeline has a re-mastered RNN and uses the SELFIES representation instead of SMILES. Our backbone model achieved extraordinary performance while reducing the training cost; moreover, we devised reward truncation strategies to eliminate the model collapse problem. Additionally, adopting the SELFIES representation made it possible to use STONED-SELFIES as a post-processing procedure for specified molecule optimization and quick chemical space exploration.
Affiliation(s)
- Lin Chen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
- Qing Shen
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- School of Electronic Information, Huzhou College, Huzhou, China
- Jungang Lou
- Yangtze Delta Region (Huzhou) Institute of Intelligent Transportation, Huzhou University, Huzhou, China
- Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, China
4
Krenn M, Pollice R, Guo SY, Aldeghi M, Cervera-Lierta A, Friederich P, dos Passos Gomes G, Häse F, Jinich A, Nigam A, Yao Z, Aspuru-Guzik A. On scientific understanding with artificial intelligence. Nat Rev Phys 2022; 4:761-769. [PMID: 36247217 PMCID: PMC9552145 DOI: 10.1038/s42254-022-00518-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Accepted: 08/30/2022] [Indexed: 05/27/2023]
Abstract
An oracle that correctly predicts the outcome of every particle physics experiment, the products of every possible chemical reaction or the function of every protein would revolutionize science and technology. However, scientists would not be entirely satisfied because they would want to comprehend how the oracle made these predictions. This is scientific understanding, one of the main aims of science. With the increase in the available computational power and advances in artificial intelligence, a natural question arises: how can advanced computational systems, and specifically artificial intelligence, contribute to new scientific understanding or gain it autonomously? Trying to answer this question, we adopted a definition of 'scientific understanding' from the philosophy of science that enabled us to overview the scattered literature on the topic and, combined with dozens of anecdotes from scientists, map out three dimensions of computer-assisted scientific understanding. For each dimension, we review the existing state of the art and discuss future developments. We hope that this Perspective will inspire and focus research directions in this multidisciplinary emerging field.
Affiliation(s)
- Mario Krenn
- Max Planck Institute for the Science of Light (MPL), Erlangen, Germany
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Robert Pollice
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Si Yue Guo
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Matteo Aldeghi
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Alba Cervera-Lierta
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Gabriel dos Passos Gomes
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Florian Häse
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Adrian Jinich
- Division of Infectious Diseases, Weill Department of Medicine, Weill Cornell Medical College, New York, USA
- AkshatKumar Nigam
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Zhenpeng Yao
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Center of Hydrogen Science, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Innovation Center for Future Materials, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China
- Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Canadian Institute for Advanced Research (CIFAR) Lebovic Fellow, Toronto, Ontario, Canada