1
Hoque A, Surve M, Kalyanakrishnan S, Sunoj RB. Reinforcement Learning for Improving Chemical Reaction Performance. J Am Chem Soc 2024. [PMID: 39356950; DOI: 10.1021/jacs.4c08866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Deep learning (DL) methods have gained notable prominence in predictive and generative tasks in molecular space. However, their application in chemical reactions remains grossly underutilized. Chemical reactions are intrinsically complex: typically involving multiple molecules besides bond-breaking/forming events. In reaction discovery, one aims to maximize yield and/or selectivity that depends on a number of factors, mostly centered on reacting partners and reaction conditions. Herein, we introduce RE-EXPLORE, a novel approach that integrates deep reinforcement learning (RL) with an RNN-based deep generative model to identify prospective new reactants/catalysts, whose yield/selectivity is estimated using a pretrained regressor. Three chemical databases (ChEMBL, ZINC, and COCONUT containing half a million to one million unlabeled molecules) are independently used for pretraining the generators to enrich them with valuable information from diverse chemical space. Standard RL methods are found to be insufficient, as learners tend to prioritize exploitation for immediate gains, resulting in repetitive generation of same/similar molecules. Our engineered reward function includes a Tanimoto-based uniqueness factor within the RL loop that improved the exploration of the environment and has helped accrue larger returns. Integration of a user-defined core fragment into the generated molecules facilitated learning of specific reaction types. Together, RE-EXPLORE can navigate the reaction space toward practically meaningful regions and offers notable improvements across the three distinct reaction types considered in this study. It identifies high-yielding substrates and highly enantioselective chiral catalysts. This RL-based approach has the potential to expedite reaction discovery and aid in the synthesis planning of important compounds, including drugs and pharmaceuticals.
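The abstract above mentions a reward that includes a Tanimoto-based uniqueness factor but does not give its functional form. The following is only a minimal sketch of one plausible shaping, assuming the reward is the regressor's predicted score scaled by one minus the highest Tanimoto similarity to previously generated molecules. The function names and the representation of fingerprints as sets of on-bit indices are hypothetical and are not taken from RE-EXPLORE.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def uniqueness_factor(fp: Set[int], history: List[Set[int]]) -> float:
    """1 minus the highest similarity to any earlier molecule (1.0 if none yet)."""
    if not history:
        return 1.0
    return 1.0 - max(tanimoto(fp, seen) for seen in history)


def shaped_reward(predicted_score: float, fp: Set[int], history: List[Set[int]]) -> float:
    """Assumed shaping: scale the predicted yield/selectivity by the uniqueness factor."""
    return predicted_score * uniqueness_factor(fp, history)
```

Under this assumed form, a molecule identical to an earlier one earns zero reward regardless of its predicted score, which discourages the repetitive generation the abstract describes.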
Affiliation(s)
- Ajnabiul Hoque
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Mihir Surve
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Shivaram Kalyanakrishnan
  - Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Raghavan B Sunoj
  - Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
  - Center for Machine Intelligence and Data Science (CMInDS), Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
2
Tomasini M, Voccia M, Caporaso L, Szostak M, Poater A. Tuning the steric hindrance of alkylamines: a predictive model of steric editing of planar amines. Chem Sci 2024; 15:13405-13414. [PMID: 39183899; PMCID: PMC11339794; DOI: 10.1039/d4sc03873h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 07/10/2024] [Indexed: 08/27/2024] Open
Abstract
Amines are one of the most prevalent functional groups in chemistry. Perhaps even more importantly, amines represent one of the most ubiquitous moieties within the realm of bioactive natural products and life-saving pharmaceuticals. The archetypal geometrical property of amines is their sp3 hybridization with the lone pair of nitrogen occupying the apex of the pyramid. Herein, we present a blueprint for quantifying the properties of extremely sterically hindered alkylamines. These amines reach planarity around the nitrogen atom due to the excessive steric hindrance, which results in a conformational re-modeling of the amine moiety. Crucially, the steric properties of amines are characterized by the %VBur index, which we show is a general predictive parameter for evaluating the properties of sterically hindered amines. Computational studies on the acidic nature and the reactivity of organometallic Au and Pd complexes are outlined. Density functional theory calculations permit for predictive catalysis, ordering the mapping of extremely hindered tertiary amines by employing artificial intelligence via machine learning. Overall, the study outlines the correlation between the unusual geometry and the key thermodynamic and kinetic properties of extremely hindered alkylamines. The steric hindrance, as quantified by %VBur, is the crucial factor influencing the observed trends and the space required to accommodate sterically hindered tertiary amines.
Affiliation(s)
- Michele Tomasini
  - Institut de Química Computacional i Catàlisi, Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain
  - Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084 Fisciano, Italy
- Maria Voccia
  - Institut de Química Computacional i Catàlisi, Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain
  - Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084 Fisciano, Italy
- Lucia Caporaso
  - Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084 Fisciano, Italy
- Michal Szostak
  - Department of Chemistry, Rutgers University, 73 Warren Street, Newark, New Jersey 07102, USA
- Albert Poater
  - Institut de Química Computacional i Catàlisi, Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain
3
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724; PMCID: PMC11323278; DOI: 10.1021/acs.jcim.4c00104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R. Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
4
Migliaro I, Cundari TR. Integrated Study on Methane Activation: Exploring Main Group Frustrated Lewis Pairs through Density Functional Theory, Machine Learning, and Machine-Learned Force Fields. J Chem Theory Comput 2024; 20:6388-6401. [PMID: 38941286; DOI: 10.1021/acs.jctc.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
Frustrated Lewis Pairs (FLP) are an important advance in metal-free catalysis due to their ability to activate a variety of small molecules. Many studies have focused on a very limited sample of Lewis acids and bases. Herein, we disclose an automated exploration algorithm using density functional methods, artificial neural networks (ANNs), and a molecule builder that incentivizes the exploration of favorable FLP space for the activation of methane via two mechanisms: deprotonation and hydride abstraction. The exploration algorithm creates FLPs with different Lewis acids (LA), Lewis bases (LB), and their substituents (LA/LB), which proved successful in quickly converging in the favorable chemical space, suggesting chemically sound structures, and generating thousands of potential candidates for methane activating FLPs. By modeling thousands of reactions, an FLP database of methane activation was created, allowing one to data mine properties, e.g., adduct bond length, highest occupied molecular orbital-lowest-unoccupied molecular orbital (HOMO-LUMO) gap, global electrophilicity index, favored Lewis acids/bases/substituents, and substituent steric volume. These properties not only successfully narrow the FLP chemical space but also provide meaningful insight into the chemical nature of competent methane activators. The machine learning discovery strategy disclosed here is general enough to be applicable to many chemical optimization tasks. This study also investigates the efficacy of a Machine-Learned Force Field (MLFF) in predicting the formation energies of Frustrated Lewis Pairs (FLPs). Our model, exhibiting a test error of ±10 kcal/mol, highlighted impressive computational efficiency by enabling the calculation of all possible FLP permutations within our chemical space. The MLFF demonstrated proficiency in predicting energies, providing a significant acceleration compared to quantum mechanics methods. 
However, challenges emerged in accurately capturing forces, necessitating recourse to classical force fields for reliable structure relaxation. The present study sheds light on the MLFF's potential as a tool for rapid energy predictions, emphasizing the need for further refinement to enhance its accuracy, particularly in force predictions, to expand its utility in chemical simulations.
Affiliation(s)
- Ignacio Migliaro
  - Department of Chemistry, Center of Advanced Scientific Computing and Modeling, University of North Texas, Denton, Texas 76203, United States
- Thomas R Cundari
  - Department of Chemistry, Center of Advanced Scientific Computing and Modeling, University of North Texas, Denton, Texas 76203, United States
5
Kalikadien AV, Mirza A, Hossaini AN, Sreenithya A, Pidko EA. Paving the road towards automated homogeneous catalyst design. ChemPlusChem 2024; 89:e202300702. [PMID: 38279609; DOI: 10.1002/cplu.202300702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/20/2023] [Indexed: 01/28/2024]
Abstract
In the past decade, computational tools have become integral to catalyst design. They continue to offer significant support to experimental organic synthesis and catalysis researchers aiming for optimal reaction outcomes. More recently, data-driven approaches utilizing machine learning have garnered considerable attention for their expansive capabilities. This Perspective provides an overview of diverse initiatives in the realm of computational catalyst design and introduces our automated tools tailored for high-throughput in silico exploration of the chemical space. While valuable insights are gained through methods for high-throughput in silico exploration and analysis of chemical space, their degree of automation and modularity are key. We argue that the integration of data-driven, automated and modular workflows is key to enhancing homogeneous catalyst design on an unprecedented scale, contributing to the advancement of catalysis research.
Affiliation(s)
- Adarsh V Kalikadien
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Adrian Mirza
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Aydin Najl Hossaini
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Avadakkam Sreenithya
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
- Evgeny A Pidko
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ, Delft, The Netherlands
6
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024; 3:932-943. [PMID: 38756222; PMCID: PMC11094696; DOI: 10.1039/d3dd00175j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATMd, B2Rl2, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yannick Calvino Alonso
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Malte Franke
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
7
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281; DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
8
Kneiding H, Nova A, Balcells D. Directional multiobjective optimization of metal complexes at the billion-system scale. Nature Computational Science 2024; 4:263-273. [PMID: 38553635; DOI: 10.1038/s43588-024-00616-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 02/29/2024] [Indexed: 04/14/2024]
Abstract
The discovery of transition metal complexes (TMCs) with optimal properties requires large ligand libraries and efficient multiobjective optimization algorithms. Here we provide the tmQMg-L library, containing 30k diverse and synthesizable ligands with robustly assigned charges and metal coordination modes. tmQMg-L enabled the generation of 1.37 million palladium TMCs, which were used to develop and benchmark the Pareto-Lighthouse multiobjective genetic algorithm (PL-MOGA). With fine control over aim and scope, this algorithm maximized both the polarizability and highest occupied molecular orbital-lowest unoccupied molecular orbital gap of the TMCs within selected regions of the Pareto front, without requiring prior knowledge on the objective limits. Instead of genetic operations on small ligand fragments, the PL-MOGA did whole-ligand mutation and crossover operations, which in chemical spaces containing billions of systems, yielded thousands of highly diverse TMCs in an interpretable manner.
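The abstract above refers to maximizing two objectives (polarizability and HOMO-LUMO gap) along a Pareto front. As a point of reference only, the sketch below shows plain non-dominated filtering for two maximized objectives. It is not the PL-MOGA algorithm, and the candidate values are made-up placeholders rather than data from the tmQMg-L library.

```python
from typing import List, Tuple

# (polarizability, HOMO-LUMO gap); both objectives are maximized
Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is at least as good in every objective and better in one."""
    at_least_as_good = all(a >= b for a, b in zip(p, q))
    strictly_better = any(a > b for a, b in zip(p, q))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2) brute force)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates, for illustration only
candidates = [(120.0, 2.1), (150.0, 1.8), (110.0, 2.0), (150.0, 1.5), (90.0, 3.0)]
front = pareto_front(candidates)
```

The brute-force comparison is adequate for a handful of points; a genetic algorithm operating on millions of complexes would need a more efficient sorting scheme.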
Affiliation(s)
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
- Ainara Nova
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
  - Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo, Norway
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, Oslo, Norway
9
Okada H, Maeda S. On Accelerating Substrate Optimization Using Computational Gibbs Energy Barriers: A Numerical Consideration Utilizing a Computational Data Set. ACS Omega 2024; 9:7123-7131. [PMID: 38371820; PMCID: PMC10870292; DOI: 10.1021/acsomega.3c09066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/05/2024] [Accepted: 01/16/2024] [Indexed: 02/20/2024]
Abstract
Substrate optimization is a time- and resource-consuming step in organic synthesis. Recent advances in chemo- and materials-informatics provide systematic and efficient procedures utilizing tools such as Bayesian optimization (BO). This study explores the possibility of reducing the required experiments further by utilizing computational Gibbs energy barriers. To thoroughly validate the impact of using computational Gibbs energy barriers in BO-assisted substrate optimization, this study employs a computational Gibbs energy barrier data set in the literature and performs an extensive numerical investigation virtually regarding the Gibbs energy barriers as virtual experimental results and those with systematic and random noises as virtual computational results. The present numerical investigation shows that even the computational reactivity affected by noises of as much as 20 kJ/mol helps reduce the number of required experiments.
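To give a sense of scale for the 20 kJ/mol noise level quoted above, the short calculation below converts it to kcal/mol and estimates the corresponding rate-constant ratio. It assumes a simple Eyring (transition-state theory) picture at 298.15 K, which is an illustrative assumption and not part of the study's protocol; the function names are hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol K)
KJ_PER_KCAL = 4.184          # thermochemical calorie


def kj_to_kcal(energy_kj_mol: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj_mol / KJ_PER_KCAL


def rate_ratio(delta_barrier_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Ratio of Eyring rate constants for two barriers differing by the given amount."""
    return math.exp(delta_barrier_kj_mol * 1000.0 / (R_J_PER_MOL_K * temperature_k))


noise_kcal = kj_to_kcal(20.0)   # a little under 5 kcal/mol
noise_ratio = rate_ratio(20.0)  # on the order of a few thousand at room temperature
```

Under these assumptions, an error of 20 kJ/mol in a barrier corresponds to a predicted rate that is off by more than three orders of magnitude, which is why it is notable that such noisy barriers still help guide the optimization.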
Affiliation(s)
- Hiroaki Okada
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo, Hokkaido 060-8628, Japan
- Satoshi Maeda
  - Department of Chemistry, Graduate School of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido 001-0021, Japan
  - ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
  - Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), Tsukuba, Ibaraki 305-0044, Japan
10
Kirkland JK, Kumawat J, Shaban Tameh M, Tolman T, Lambert AC, Lief GR, Yang Q, Ess DH. Machine Learning Models for Predicting Zirconocene Properties and Barriers. J Chem Inf Model 2024; 64:775-784. [PMID: 38259142; DOI: 10.1021/acs.jcim.3c01575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Zr metallocenes have significant potential to be highly tunable polyethylene catalysts through modification of the aromatic ligand framework. Here we report the development of multiple machine learning models using a large library (>700 systems) of DFT-calculated zirconocene properties and barriers for ethylene polymerization. We show that very accurate machine learning models are possible for HOMO-LUMO gaps of precatalysts but the performance significantly depends on the machine learning algorithm and type of featurization, such as fingerprints, Coulomb matrices, smooth overlap of atomic positions, or persistence images. Surprisingly, the description of the bonding hapticity, the number of direct connections between Zr and the ligand aromatic carbons, only has a moderate influence on the performance of most models. Despite robust models for HOMO-LUMO gaps, these types of machine learning models based on structure connectivity type features perform poorly in predicting ethylene migratory insertion barrier heights. Therefore, we developed several relatively robust and accurate machine learning models for barrier heights that are based on quantum-chemical descriptors (QCDs). The quantitative accuracy of these models depends on which potential energy surface structure QCDs were harvested from. This revealed a Hammett-type principle to naturally emerge showing that QCDs from the π-coordination complexes provide much better descriptions of the transition states than other potential-energy structures. Feature importance analysis of the QCDs provides several fundamental principles that influence zirconocene catalyst reactivity.
Affiliation(s)
- Justin K Kirkland
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Jugal Kumawat
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Maliheh Shaban Tameh
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Tyson Tolman
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Allison C Lambert
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Graham R Lief
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Qing Yang
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
11
Escayola S, Bahri-Laleh N, Poater A. %VBur index and steric maps: from predictive catalysis to machine learning. Chem Soc Rev 2024; 53:853-882. [PMID: 38113051; DOI: 10.1039/d3cs00725a] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 12/21/2023]
Abstract
Steric indices are parameters used in chemistry to describe the spatial arrangement of atoms or groups of atoms in molecules. They are important in determining the reactivity, stability, and physical properties of chemical compounds. One commonly used steric index is the steric hindrance, which refers to the obstruction or hindrance of movement in a molecule caused by bulky substituents or functional groups. Steric hindrance can affect the reactivity of a molecule by altering the accessibility of its reactive sites and influencing the geometry of its transition states. Notably, the Tolman cone angle and %VBur are prominent among these indices. Actually, steric effects can also be described using the concept of steric bulk, which refers to the space occupied by a molecule or functional group. Steric bulk can affect the solubility, melting point, boiling point, and viscosity of a substance. Even though electronic indices are more widely used, they have certain drawbacks that might shift preferences towards others. They present a higher computational cost, and often, the weight of electronics in correlation with chemical properties, e.g. binding energies, falls short in comparison to %VBur. However, it is worth noting that this may be because the steric index inherently captures part of the electronic content. Overall, steric indices play an important role in understanding the behaviour of chemical compounds and can be used to predict their reactivity, stability, and physical properties. Predictive chemistry is an approach to chemical research that uses computational methods to anticipate the properties and behaviour of these compounds and reactions, facilitating the design of new compounds and reactivities. Within this domain, predictive catalysis specifically targets the prediction of the performance and behaviour of catalysts. 
Ultimately, the goal is to identify new catalysts with optimal properties, leading to chemical processes that are both more efficient and sustainable. In this framework, %VBur can be a key metric for deepening our understanding of catalysis, emphasizing predictive catalysis and sustainability. Those latter concepts are needed to direct our efforts toward identifying the optimal catalyst for any reaction, minimizing waste, and reducing experimental efforts while maximizing the efficacy of the computational methods.
Affiliation(s)
- Sílvia Escayola
  - Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain
  - Donostia International Physics Center (DIPC), 20018 Donostia, Euskadi, Spain
- Naeimeh Bahri-Laleh
  - Iran Polymer and Petrochemical Institute (IPPI), P.O. Box 14965/115, Tehran, Iran
  - Institute for Sustainability with Knotted Chiral Meta Matter (WPI-SKCM), Hiroshima University, Hiroshima, 739-8526, Japan
- Albert Poater
  - Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain
12
Chen SS, Meyer Z, Jensen B, Kraus A, Lambert A, Ess DH. ReaLigands: A Ligand Library Cultivated from Experiment and Intended for Molecular Computational Catalyst Design. J Chem Inf Model 2023; 63:7412-7422. [PMID: 37987743; DOI: 10.1021/acs.jcim.3c01310] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/22/2023]
Abstract
Computational catalyst design requires identification of a metal and ligand that together result in the desired reaction reactivity and/or selectivity. A major impediment to translating computational designs to experiments is evaluating ligands that are likely to be synthesized. Here, we provide a solution to this impediment with our ReaLigands library that contains >30,000 monodentate, bidentate (didentate), tridentate, and larger ligands cultivated by dismantling experimentally reported crystal structures. Individual ligands from mononuclear crystal structures were identified using a modified depth-first search algorithm and charge was assigned using a machine learning model based on quantum-chemical calculated features. In the library, ligands are sorted based on direct ligand-to-metal atomic connections and on denticity. Representative principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) analyses were used to analyze several tridentate ligand categories, which revealed both the diversity of ligands and connections between ligand categories. We also demonstrated the utility of this library by implementing it with our building and optimization tools, which resulted in the very rapid generation of barriers for 750 bidentate ligands for Rh-hydride ethylene migratory insertion.
Affiliation(s)
- Shu-Sen Chen
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States
- Zack Meyer
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States
- Brendan Jensen
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States
- Alex Kraus
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States
- Allison Lambert
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604 United States

13
Kevlishvili I, Duan C, Kulik HJ. Classification of Hemilabile Ligands Using Machine Learning. J Phys Chem Lett 2023:11100-11109. [PMID: 38051982 DOI: 10.1021/acs.jpclett.3c02828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Hemilabile ligands have the capacity to partially disengage from a metal center, providing a strategy to balance stability and reactivity in catalysis, but they are not straightforward to identify. We identify ligands in the Cambridge Structural Database that have been crystallized with distinct denticities and are thus identifiable as hemilabile ligands. We implement a semi-supervised learning approach using a label-spreading algorithm to augment a small negative set that is supported by heuristic rules of ligand and metal co-occurrence. We show that a heuristic based on coordinating atom identity alone is not sufficient to identify whether a ligand is hemilabile, and our trained machine-learning classification models are instead needed to predict whether a bi-, tri-, or tetradentate ligand is hemilabile with high accuracy and precision. Feature importance analysis of our models shows that the second, third, and fourth coordination spheres all play important roles in ligand hemilability.
Affiliation(s)
- Ilia Kevlishvili
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

14
Cheng J, Li T, Wang Y, Ati AH, Sun Q. The relationship between activated H2 bond length and adsorption distance on MXenes identified with graph neural network and resonating valence bond theory. J Chem Phys 2023; 159:191101. [PMID: 37965996 DOI: 10.1063/5.0169430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/14/2023] [Indexed: 11/16/2023] Open
Abstract
Motivated by the recent experimental study on hydrogen storage in MXene multilayers [Liu et al., Nat. Nanotechnol. 16, 331 (2021)], for the first time we propose a workflow to computationally screen 23 857 compounds of MXene to explore the general relation between the activated H2 bond length and adsorption distance. By using density functional theory we generate a dataset to investigate the adsorption geometries of hydrogen on MXenes, based on which we train physics-informed atomistic line graph neural networks (ALIGNNs) to predict adsorption parameters. To fit the results, we further derived a formula that quantitatively reproduces the dependence of H2 bond length on the adsorption distance from MXenes within the framework of Pauling's resonating valence bond theory, revealing the impact of transition metal's ligancy and valence on activating dihydrogen in H2 storage.
Affiliation(s)
- Jiewei Cheng
  - School of Materials Science and Engineering, Peking University, Beijing 100871, China
- Tingwei Li
  - School of Materials Science and Engineering, Peking University, Beijing 100871, China
- Yongyi Wang
  - College of Engineering, Peking University, Beijing 100871, China
- Ahmed H Ati
  - School of Materials Science and Engineering, Peking University, Beijing 100871, China
- Qiang Sun
  - School of Materials Science and Engineering, Peking University, Beijing 100871, China
  - Center for Applied Physics and Technology, Peking University, Beijing 100871, China

15
Casetti N, Alfonso-Ramos JE, Coley CW, Stuyver T. Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery. Chemistry 2023; 29:e202301957. [PMID: 37526059 DOI: 10.1002/chem.202301957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/02/2023]
Abstract
Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to high-throughput screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules; in silico. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.
Affiliation(s)
- Nicholas Casetti
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Javier E Alfonso-Ramos
  - Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
- Connor W Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Thijs Stuyver
  - Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France

16
Lewis-Atwell T, Beechey D, Şimşek Ö, Grayson MN. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal 2023; 13:13506-13515. [PMID: 37881791 PMCID: PMC10594582 DOI: 10.1021/acscatal.3c02513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 08/24/2023] [Indexed: 10/27/2023]
Abstract
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56% respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
Affiliation(s)
- Toby Lewis-Atwell
  - Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Daniel Beechey
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Özgür Şimşek
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Matthew N. Grayson
  - Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.

17
Hashemi A, Bougueroua S, Gaigeot MP, Pidko EA. HiREX: High-Throughput Reactivity Exploration for Extended Databases of Transition-Metal Catalysts. J Chem Inf Model 2023; 63:6081-6094. [PMID: 37738303 PMCID: PMC10565810 DOI: 10.1021/acs.jcim.3c00660] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/02/2023] [Indexed: 09/24/2023]
Abstract
A method is introduced for the automated analysis of reactivity exploration for extended in silico databases of transition-metal catalysts. The proposed workflow is designed to tackle two key challenges for bias-free mechanistic explorations on large databases of catalysts: (1) automated exploration of the chemical space around each catalyst with unique structural and chemical features and (2) automated analysis of the resulting large chemical data sets. To address these challenges, we have extended the application of our previously developed ReNeGate method for bias-free reactivity exploration and implemented an automated analysis procedure to identify the classes of reactivity patterns within specific catalyst groups. Our procedure applied to an extended series of representative Mn(I) pincer complexes revealed correlations between structural and reactive features, pointing to new channels for catalyst transformation under the reaction conditions. Such an automated high-throughput virtual screening of systematically generated hypothetical catalyst data sets opens new opportunities for the design of high-performance catalysts as well as an accelerated method for expert bias-free high-throughput in silico reactivity exploration.
Affiliation(s)
- Ali Hashemi
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands
- Sana Bougueroua
  - Laboratoire Analyse et Modélisation pour la Biologie et l’Environnement (LAMBE) UMR8587, Paris-Saclay, Univ Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, Evry-Courcouronnes 91025, France
- Marie-Pierre Gaigeot
  - Laboratoire Analyse et Modélisation pour la Biologie et l’Environnement (LAMBE) UMR8587, Paris-Saclay, Univ Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, Evry-Courcouronnes 91025, France
- Evgeny A. Pidko
  - Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands

18
Karl TM, Bouayad-Gervais S, Hueffel JA, Sperger T, Wellig S, Kaldas SJ, Dabranskaya U, Ward JS, Rissanen K, Tizzard GJ, Schoenebeck F. Machine Learning-Guided Development of Trialkylphosphine Ni (I) Dimers and Applications in Site-Selective Catalysis. J Am Chem Soc 2023. [PMID: 37411044 DOI: 10.1021/jacs.3c03403] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/08/2023]
Abstract
Owing to the unknown correlation of a metal's ligand and its resulting preferred speciation in terms of oxidation state, geometry, and nuclearity, a rational design of multinuclear catalysts remains challenging. With the goal to accelerate the identification of suitable ligands that form trialkylphosphine-derived dihalogen-bridged Ni(I) dimers, we herein employed an assumption-based machine learning approach. The workflow offers guidance in ligand space for a desired speciation without (or only minimal) prior experimental data points. We experimentally verified the predictions and synthesized numerous novel Ni(I) dimers as well as explored their potential in catalysis. We demonstrate C-I selective arylations of polyhalogenated arenes bearing competing C-Br and C-Cl sites in under 5 min at room temperature using 0.2 mol % of the newly developed dimer, [Ni(I)(μ-Br)PAd2(n-Bu)]2, which is so far unmet with alternative dinuclear or mononuclear Ni or Pd catalysts.
Affiliation(s)
- Teresa M Karl
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Samir Bouayad-Gervais
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Julian A Hueffel
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Theresa Sperger
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Sebastian Wellig
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Sherif J Kaldas
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany
- Jas S Ward
  - Department of Chemistry, University of Jyvaskyla, FIN40014 Jyväskylä, Finland
- Kari Rissanen
  - Department of Chemistry, University of Jyvaskyla, FIN40014 Jyväskylä, Finland
- Graham J Tizzard
  - UK National Crystallography Service, School of Chemistry, University of Southampton, SO17 1BJ Southampton, U.K.
- Franziska Schoenebeck
  - Institute of Organic Chemistry, RWTH Aachen University, Landoltweg 1, 52074 Aachen, Germany

19
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023] Open
Abstract
Accurate prediction of reactivity and selectivity provides the desired guideline for synthetic development. Due to the high-dimensional relationship between molecular structure and synthetic function, it is challenging to achieve the predictive modelling of synthetic transformation with the required extrapolative ability and chemical interpretability. To meet the gap between the rich domain knowledge of chemistry and the advanced molecular graph model, herein we report a knowledge-based graph model that embeds the digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable the learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, whose extrapolative ability is corroborated by additional scaffold-based data splittings and experimental verifications with new catalysts. Because of the embedding of local environment, the model allows the atomic level of interpretation of the steric and electronic influence on the overall synthetic performance, which serves as a useful guide for the molecular engineering towards the target synthetic function. This model offers an extrapolative and interpretable approach for reaction performance prediction, pointing out the importance of chemical knowledge-constrained reaction modelling for synthetic purpose.
Affiliation(s)
- Shu-Wen Li
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Li-Cheng Xu
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
- Cheng Zhang
  - Department of Chemistry, University of Science and Technology of China, Hefei, China
- Shuo-Qing Zhang
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
- Xin Hong
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
  - Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China.
  - Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China.

20
Schilter O, Vaucher A, Schwaller P, Laino T. Designing catalysts with deep generative models and computational data. A case study for Suzuki cross coupling reactions. DIGITAL DISCOVERY 2023; 2:728-735. [PMID: 37312682 PMCID: PMC10259369 DOI: 10.1039/d2dd00125j] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/11/2022] [Accepted: 02/22/2023] [Indexed: 06/15/2023]
Abstract
The need for more efficient catalytic processes is ever-growing, and so are the costs associated with experimentally searching chemical space to find new promising catalysts. Despite the consolidated use of density functional theory (DFT) and other atomistic models for virtually screening molecules based on their simulated performance, data-driven approaches are rising as indispensable tools for designing and improving catalytic processes. Here, we present a deep learning model capable of generating new catalyst-ligand candidates by self-learning meaningful structural features solely from their language representation and computed binding energies. We train a recurrent neural network-based Variational Autoencoder (VAE) to compress the molecular representation of the catalyst into a lower dimensional latent space, in which a feed-forward neural network predicts the corresponding binding energy to be used as the optimization function. The outcome of the optimization in the latent space is then reconstructed back into the original molecular representation. These trained models achieve state-of-the-art predictive performances in catalysts' binding energy prediction and catalysts' design, with a mean absolute error of 2.42 kcal mol⁻¹ and an ability to generate 84% valid and novel catalysts.
Affiliation(s)
- Oliver Schilter
  - IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis) Switzerland
- Alain Vaucher
  - IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis) Switzerland
- Philippe Schwaller
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis) Switzerland
- Teodoro Laino
  - IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis) Switzerland

21
Yang X, Bhowmik A, Vegge T, Hansen HA. Neural network potentials for accelerated metadynamics of oxygen reduction kinetics at Au-water interfaces. Chem Sci 2023; 14:3913-3922. [PMID: 37035698 PMCID: PMC10074416 DOI: 10.1039/d2sc06696c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 03/09/2023] [Indexed: 03/16/2023] Open
Abstract
The application of ab initio molecular dynamics (AIMD) for the explicit modeling of reactions at solid-liquid interfaces in electrochemical energy conversion systems like batteries and fuel cells can provide new understandings towards reaction mechanisms. However, its prohibitive computational cost severely restricts the time- and length-scales of AIMD. Equivariant graph neural network (GNN) based accurate surrogate potentials can accelerate the speed of performing molecular dynamics after learning on representative structures in a data efficient manner. In this study, we combined uncertainty-aware GNN potentials and enhanced sampling to investigate the reactive process of the oxygen reduction reaction (ORR) at an Au(100)-water interface. By using a well-established active learning framework based on CUR matrix decomposition, we can evenly sample equilibrium structures from MD simulations and non-equilibrium reaction intermediates that are rarely visited during the reaction. The trained GNNs have shown exceptional performance in terms of force prediction accuracy, the ability to reproduce structural properties, and low uncertainties when performing MD and metadynamics simulations. Furthermore, the collective variables employed in this work enabled the automatic search of reaction pathways and provide a detailed understanding towards the ORR reaction mechanism on Au(100). Our simulations identified the associative reaction mechanism without the presence of *O and a low reaction barrier of 0.3 eV, which is in agreement with experimental findings. The methodology employed in this study can pave the way for modeling complex chemical reactions at electrochemical interfaces with an explicit solvent under ambient conditions.
Affiliation(s)
- Xin Yang
  - Department of Energy Conversion and Storage, Technical University of Denmark Anker Engelunds Vej, 2800 Kgs Lyngby Denmark
- Arghya Bhowmik
  - Department of Energy Conversion and Storage, Technical University of Denmark Anker Engelunds Vej, 2800 Kgs Lyngby Denmark
- Tejs Vegge
  - Department of Energy Conversion and Storage, Technical University of Denmark Anker Engelunds Vej, 2800 Kgs Lyngby Denmark
- Heine Anton Hansen
  - Department of Energy Conversion and Storage, Technical University of Denmark Anker Engelunds Vej, 2800 Kgs Lyngby Denmark

22
García-Andrade X, García Tahoces P, Pérez-Ríos J, Martínez Núñez E. Barrier Height Prediction by Machine Learning Correction of Semiempirical Calculations. J Phys Chem A 2023; 127:2274-2283. [PMID: 36877614 PMCID: PMC10845151 DOI: 10.1021/acs.jpca.2c08340] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/28/2022] [Revised: 02/19/2023] [Indexed: 03/07/2023]
Abstract
Different machine learning (ML) models are proposed in the present work to predict density functional theory-quality barrier heights (BHs) from semiempirical quantum mechanical (SQM) calculations. The ML models include a multitask deep neural network, gradient-boosted trees by means of the XGBoost interface, and Gaussian process regression. The obtained mean absolute errors are similar to those of previous models considering the same number of data points. The ML corrections proposed in this paper could be useful for rapid screening of the large reaction networks that appear in combustion chemistry or in astrochemistry. Finally, our results show that 70% of the features with the highest impact on model output are bespoke predictors. This custom-made set of predictors could be employed by future Δ-ML models to improve the quantitative prediction of other reaction properties.
Affiliation(s)
- Pablo García Tahoces
  - Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
- Jesús Pérez-Ríos
  - Department of Physics, Stony Brook University, Stony Brook, New York 11794, United States
  - Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York 11794-3800, United States
- Emilio Martínez Núñez
  - Department of Physical Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain

23
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China

24
Terrones GG, Duan C, Nandy A, Kulik HJ. Low-cost machine learning prediction of excited state properties of iridium-centered phosphors. Chem Sci 2023; 14:1419-1433. [PMID: 36794185 PMCID: PMC9906783 DOI: 10.1039/d2sc06150c] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/07/2022] [Accepted: 01/05/2023] [Indexed: 01/07/2023] Open
Abstract
Prediction of the excited state properties of photoactive iridium complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from the perspective of accuracy and of computational cost, complicating high-throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models and experimental data for 1380 iridium complexes to perform these prediction tasks. We find the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional tight binding calculations. Using artificial neural network (ANN) models, we predict the mean emission energy of phosphorescence, the excited state lifetime, and the emission spectral integral for iridium complexes with accuracy competitive with or superseding that of TDDFT. We conduct feature importance analysis to determine that high cyclometalating ligand ionization potential correlates to high mean emission energy, while high ancillary ligand ionization potential correlates to low lifetime and low spectral integral. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and use uncertainty-controlled predictions to identify promising ligands for the design of new phosphors while retaining confidence in the quality of the ANN predictions.
Collapse
Affiliation(s)
- Gianmarco G Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
|
25
|
Stuyver T, Jorner K, Coley CW. Reaction profiles for quantum chemistry-computed [3 + 2] cycloaddition reactions. Sci Data 2023; 10:66. [PMID: 36725850 PMCID: PMC9892576 DOI: 10.1038/s41597-023-01977-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2022] [Accepted: 01/18/2023] [Indexed: 02/03/2023] Open
Abstract
Bio-orthogonal click chemistry based on [3 + 2] dipolar cycloadditions has had a profound impact on the field of biochemistry and significant effort has been devoted to identify promising new candidate reactions for this purpose. To gauge whether a prospective reaction could be a suitable bio-orthogonal click reaction, information about both on- and off-target activation and reaction energies is highly valuable. Here, we use an automated workflow, based on the autodE program, to compute over 5000 reaction profiles for [3 + 2] cycloadditions involving both synthetic dipolarophiles and a set of biologically-inspired structural motifs. Based on a succinct benchmarking study, the B3LYP-D3(BJ)/def2-TZVP//B3LYP-D3(BJ)/def2-SVP level of theory was selected for the DFT calculations, and standard conditions and an (aqueous) SMD model were imposed to mimic physiological conditions. We believe that this data, as well as the presented workflow for high-throughput reaction profile computation, will be useful to screen for new bio-orthogonal reactions, as well as for the development of novel machine learning models for the prediction of chemical reactivity more broadly.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139 USA
| | - Kjell Jorner
- Department of Computer Science, University of Toronto, 40 St George St, Toronto, Ontario M5S 2E4 Canada
- Department of Chemistry, Chemical Physics Theory Group, 80 St. George St., University of Toronto, Ontario, M5S 3H6 Canada
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, Kemigården 4, SE-41258 Gothenburg, Sweden
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139 USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139 USA
|
26
|
Gugler S, Reiher M. Quantum Chemical Roots of Machine-Learning Molecular Similarity Descriptors. J Chem Theory Comput 2022; 18:6670-6689. [PMID: 36218328 DOI: 10.1021/acs.jctc.2c00718] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
In this work, we explore the quantum chemical foundations of descriptors for molecular similarity. Such descriptors are key for traversing chemical compound space with machine learning. Our focus is on the Coulomb matrix and on the smooth overlap of atomic positions (SOAP). We adopt a basic framework that allows us to connect both descriptors to electronic structure theory. This framework enables us to then define two new descriptors that are more closely related to electronic structure theory, which we call Coulomb lists and smooth overlap of electron densities (SOED). By investigating their usefulness as molecular similarity descriptors, we gain new insights into how and why Coulomb matrix and SOAP work. Moreover, Coulomb lists avoid the somewhat mysterious diagonalization step of the Coulomb matrix and might provide a direct means to extract subsystem information that can be compared across Born-Oppenheimer surfaces of varying dimension. For the electron density, we derive the necessary formalism to create the SOED measure in close analogy to SOAP. Because this formalism is more involved than that of SOAP, we review the essential theory as well as introduce a set of approximations that eventually allow us to work with SOED in terms of the same implementation available for the evaluation of SOAP. We focus our analysis on elementary reaction steps, where transition state structures are more similar to either reactant or product structures than the latter two are with respect to one another. The prediction of electronic energies of transition state structures can, however, be more difficult than that of stable intermediates due to multi-configurational effects. The question arises to what extent molecular similarity descriptors rooted in electronic structure theory can resolve these intricate effects.
Affiliation(s)
- Stefan Gugler
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Markus Reiher
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
|
27
|
Miller E, Mai BK, Read JA, Bell WC, Derrick JS, Liu P, Toste FD. A Combined DFT, Energy Decomposition, and Data Analysis Approach to Investigate the Relationship Between Noncovalent Interactions and Selectivity in a Flexible DABCOnium/Chiral Anion Catalyst System. ACS Catal 2022; 12:12369-12385. [PMID: 37215160 PMCID: PMC10195112 DOI: 10.1021/acscatal.2c03077] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Developing strategies to study reactivity and selectivity in flexible catalyst systems has become an important topic of research. Herein, we report a combined experimental and computational study aimed at understanding the mechanistic role of an achiral DABCOnium cofactor in a regio- and enantiodivergent bromocyclization reaction. It was found that electron-deficient aryl substituents enable rigidified transition states via an anion-π interaction with the catalyst, which drives the selectivity of the reaction. In contrast, electron-rich aryl groups on the DABCOnium result in significantly more flexible transition states, where interactions between the catalyst and substrate are more important. An analysis of not only the lowest-energy transition state structures but also an ensemble of low-energy transition state conformers via energy decomposition analysis and machine learning was crucial to revealing the dominant noncovalent interactions responsible for observed changes in selectivity in this flexible system.
Affiliation(s)
- Edward Miller
- Department of Chemistry, University of California, Berkeley, California 94720, United States
| | - Binh Khanh Mai
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jacquelyne A Read
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
| | - William C Bell
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Jeffrey S Derrick
- Department of Chemistry, University of California, Berkeley, California 94720, United States
| | - Peng Liu
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - F Dean Toste
- Department of Chemistry, University of California, Berkeley, California 94720, United States
|
28
|
Darù A, Martín-Fernández C, Harvey JN. Iron-Catalyzed Kumada Cross-Coupling Reaction Involving Fe₈Me₁₂⁻ and Related Clusters: A Computational Study. ACS Catal 2022. [DOI: 10.1021/acscatal.2c03436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Andrea Darù
- Department of Chemistry, Scripps Research, La Jolla, California 92037, United States
| | - Jeremy N. Harvey
- Department of Chemistry, KU Leuven, Celestijnenlaan 200F, Leuven B-3001, Belgium
|
29
|
Tomasini M, Zhang J, Zhao H, Besalú E, Falivene L, Caporaso L, Szostak M, Poater A. A predictive journey towards trans-thioamides/amides. Chem Commun (Camb) 2022; 58:9950-9953. [PMID: 35983851 DOI: 10.1039/d2cc04228b] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
The cis-trans isomerization of (thio)amides was studied by DFT calculations to get the model for the higher preference for the cis conformation by guided predictive chemistry, suggesting how to select the alkyl/aryl substituents on the C/N atoms that lead to the trans isomer. Multilinear analysis, together with cross-validation analysis, helped to select the best fitting parameters to achieve the energy barriers of the cis to trans interconversion, as well as the relative stability between both isomers. Double experimental check led to the synthesis of the best trans candidate with sterically demanding t-butyl substituents, confirming the utility of predictive chemistry, bridging organic and computational chemistry.
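The pairing of multilinear analysis with cross-validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are synthetic, the two descriptors are hypothetical, and leave-one-out is only one possible cross-validation scheme. The helper name `loo_cv_rmse` is invented for this example.

```python
import numpy as np


def loo_cv_rmse(X, y):
    """Leave-one-out RMSE of an ordinary least-squares multilinear fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    errors = []
    for i in range(n):
        keep = np.arange(n) != i  # hold out sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors.append(y[i] - A[i] @ coef)  # error on the held-out sample
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic stand-in data (assumption): a "barrier" depending on two descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 20.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.2, size=30)

rmse_two = loo_cv_rmse(X, y)          # model using both descriptors
rmse_one = loo_cv_rmse(X[:, :1], y)   # model using only the first descriptor
print(f"LOO RMSE, two descriptors: {rmse_two:.2f}")
print(f"LOO RMSE, one descriptor:  {rmse_one:.2f}")
```

Comparing held-out errors across candidate descriptor sets is one way to choose fitting parameters without rewarding overfitting.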
Affiliation(s)
- Michele Tomasini
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, C/Maria Aurèlia Capmany 69, 17003, Girona, Catalonia, Spain
- Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084, Fisciano, Italy
| | - Jin Zhang
- College of Chemistry and Chemical Engineering, Key Laboratory of Chemical Additives for China National Light Industry, Shaanxi University of Science and Technology, 6 Xuefu Road, Xi'an, 710021, China
| | - Hui Zhao
- College of Chemistry and Chemical Engineering, Key Laboratory of Chemical Additives for China National Light Industry, Shaanxi University of Science and Technology, 6 Xuefu Road, Xi'an, 710021, China
| | - Emili Besalú
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, C/Maria Aurèlia Capmany 69, 17003, Girona, Catalonia, Spain.
| | - Laura Falivene
- Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084, Fisciano, Italy
| | - Lucia Caporaso
- Dipartimento di Chimica e Biologia, Università di Salerno, Via Ponte don Melillo, 84084, Fisciano, Italy
| | - Michal Szostak
- Department of Chemistry, Rutgers University, 73 Warren Street, Newark, NJ, 07102, USA
| | - Albert Poater
- Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, C/Maria Aurèlia Capmany 69, 17003, Girona, Catalonia, Spain.
|
30
|
Wang Z, Sun Z, Yin H, Liu X, Wang J, Zhao H, Pang CH, Wu T, Li S, Yin Z, Yu XF. Data-Driven Materials Innovation and Applications. Adv Mater 2022; 34:e2104113. [PMID: 35451528 DOI: 10.1002/adma.202104113] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 05/30/2021] [Revised: 03/19/2022] [Indexed: 05/07/2023]
Abstract
Owing to the rapid developments to improve the accuracy and efficiency of both experimental and computational investigative methodologies, the massive amounts of data generated have led the field of materials science into the fourth paradigm of data-driven scientific research. This transition requires the development of authoritative and up-to-date frameworks for data-driven approaches for material innovation. A critical discussion on the current advances in the data-driven discovery of materials with a focus on frameworks, machine-learning algorithms, material-specific databases, descriptors, and targeted applications in the field of inorganic materials is presented. Frameworks for rationalizing data-driven material innovation are described, and a critical review of essential subdisciplines is presented, including: i) advanced data-intensive strategies and machine-learning algorithms; ii) material databases and related tools and platforms for data generation and management; iii) commonly used molecular descriptors used in data-driven processes. Furthermore, an in-depth discussion on the broad applications of material innovation, such as energy conversion and storage, environmental decontamination, flexible electronics, optoelectronics, superconductors, metallic glasses, and magnetic materials, is provided. Finally, how these subdisciplines (with insights into the synergy of materials science, computational tools, and mathematics) support data-driven paradigms is outlined, and the opportunities and challenges in data-driven material innovation are highlighted.
Affiliation(s)
- Zhuo Wang
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
- Department of Chemical and Environmental Engineering, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
| | - Zhehao Sun
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
| | - Hang Yin
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
| | - Xinghui Liu
- Department of Chemistry, Sungkyunkwan University (SKKU), 2066 Seoburo, Jangan-Gu, Suwon, 16419, Republic of Korea
| | - Jinlan Wang
- School of Physics, Southeast University, Nanjing, 211189, P. R. China
| | - Haitao Zhao
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
| | - Cheng Heng Pang
- Department of Chemical and Environmental Engineering, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- Municipal Key Laboratory of Clean Energy Conversion Technologies, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
| | - Tao Wu
- Key Laboratory for Carbonaceous Wastes Processing and Process Intensification Research of Zhejiang Province, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
- New Materials Institute, University of Nottingham, Ningbo, China, Ningbo, 315100, P. R. China
| | - Shuzhou Li
- School of Materials Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
| | - Zongyou Yin
- Research School of Chemistry, The Australian National University, ACT, 2601, Australia
| | - Xue-Feng Yu
- Materials Interfaces Center, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, P. R. China
|
31
|
Zhang Z, Cheng M, Xiao X, Bi K, Song T, Hu KQ, Dai Y, Zhou L, Liu C, Ji X, Shi WQ. Machine-Learning-Guided Identification of Coordination Polymer Ligands for Crystallizing Separation of Cs/Sr. ACS Appl Mater Interfaces 2022; 14:33076-33084. [PMID: 35801670 DOI: 10.1021/acsami.2c05272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Separation of Cs/Sr is one of many coordination-chemistry-centered processes in the grand scheme of spent nuclear fuel reprocessing, a critical link for a sustainable nuclear energy industry. To deploy a crystallizing Cs/Sr separation technology, we planned to systematically screen and identify candidate ligands that can efficiently and selectively bind to Sr²⁺ and form coordination polymers. Therefore, we mined the Cambridge Structural Database for characteristic structural information and developed a machine-learning-guided methodology for ligand evaluation. The optimized machine-learning model, correlating the molecular structures of the ligands with the predicted coordinative properties, generated a ranking list of potential compounds for Cs/Sr selective crystallization. The Sr²⁺ sequestration capability and selectivity over Cs⁺ of the promising ligands identified (squaric acid and chloranilic acid) were subsequently confirmed experimentally, with commendable performances, corroborating the artificial-intelligence-guided strategy.
Affiliation(s)
- Zhiyuan Zhang
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Min Cheng
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Xinyi Xiao
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Kexin Bi
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Ting Song
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Kong-Qiu Hu
- Laboratory of Nuclear Energy Chemistry, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, People's Republic of China
| | - Yiyang Dai
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Li Zhou
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Chong Liu
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Xu Ji
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
| | - Wei-Qun Shi
- Laboratory of Nuclear Energy Chemistry, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, People's Republic of China
|
32
|
Fey N, Lynam JM. Computational mechanistic study in organometallic catalysis: Why prediction is still a challenge. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1590] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Affiliation(s)
- Natalie Fey
- School of Chemistry, University of Bristol, Cantock's Close, Bristol, UK
|
33
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
|
34
|
Farrar EHE, Grayson MN. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem Sci 2022; 13:7594-7603. [PMID: 35872815 PMCID: PMC9242013 DOI: 10.1039/d2sc02925a] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/25/2022] [Accepted: 06/08/2022] [Indexed: 11/21/2022] Open
Abstract
Modern QM modelling methods, such as DFT, have provided detailed mechanistic insights into countless reactions. However, their computational cost inhibits their ability to rapidly screen large numbers of substrates and catalysts in reaction discovery. For a C-C bond forming nitro-Michael addition, we introduce a synergistic semi-empirical quantum mechanical (SQM) and machine learning (ML) approach that allows the prediction of DFT-quality reaction barriers in minutes, even on a standard laptop using widely available modelling software. Mean absolute errors (MAEs) are obtained that are below the accepted chemical accuracy threshold of 1 kcal mol⁻¹ and substantially better than SQM methods without ML correction (5.71 kcal mol⁻¹). Predictive power is shown to hold when the ML models are applied to an unseen set of compounds from the toxicology literature. Mechanistic insight is also achieved via the generation of full SQM transition state (TS) structures which are found to be very good approximations for the DFT-level geometries, revealing important steric interactions in some TSs. This combination of speed, accuracy, and mechanistic insight is unprecedented; current ML barrier models compromise on at least one of these important criteria.
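The idea of correcting a cheap SQM barrier with a learned term, then comparing MAEs with and without the correction, can be illustrated with a short sketch. Everything below is an assumption made for illustration: the data are synthetic, the single descriptor is hypothetical, and a linear least-squares correction stands in for whatever ML models the authors used. It does not reproduce their features, models, or reported numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): reference "DFT" barriers and cheaper
# "SQM" barriers that carry a systematic, descriptor-dependent error.
n = 200
descriptor = rng.uniform(-1.0, 1.0, size=n)
dft_barrier = 15.0 + 4.0 * descriptor + rng.normal(0.0, 0.3, size=n)
sqm_barrier = dft_barrier + 5.0 + 2.0 * descriptor + rng.normal(0.0, 0.3, size=n)

train, test = np.arange(0, 150), np.arange(150, n)


def design(sqm, desc):
    """Feature matrix: intercept, SQM barrier, and one descriptor."""
    return np.column_stack([np.ones_like(sqm), sqm, desc])


# Learn the correction (DFT minus SQM) on the training set only.
target = dft_barrier[train] - sqm_barrier[train]
coef, *_ = np.linalg.lstsq(
    design(sqm_barrier[train], descriptor[train]), target, rcond=None
)

# Apply the learned correction to held-out SQM barriers.
corrected = sqm_barrier[test] + design(sqm_barrier[test], descriptor[test]) @ coef

mae_raw = float(np.mean(np.abs(sqm_barrier[test] - dft_barrier[test])))
mae_corrected = float(np.mean(np.abs(corrected - dft_barrier[test])))
print(f"MAE without correction: {mae_raw:.2f}")
print(f"MAE with correction:    {mae_corrected:.2f}")
```

Learning the difference between the two levels of theory, rather than the barrier itself, lets the cheap calculation carry most of the physics while the model only has to capture the residual error.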
Affiliation(s)
- Elliot H E Farrar
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
| | - Matthew N Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
|
35
|
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
| | - Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
|
36
|
Gensch T, Smith SR, Colacot TJ, Timsina YN, Xu G, Glasspoole BW, Sigman MS. Design and Application of a Screening Set for Monophosphine Ligands in Cross-Coupling. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Tobias Gensch
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
| | - Sleight R. Smith
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Thomas J. Colacot
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
| | - Yam N. Timsina
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
| | - Guolin Xu
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
| | - Ben W. Glasspoole
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
| | - Matthew S. Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
|
37
|
Nandy A, Duan C, Goffinet C, Kulik HJ. New Strategies for Direct Methane-to-Methanol Conversion from Active Learning Exploration of 16 Million Catalysts. JACS Au 2022; 2:1200-1213. [PMID: 35647589 PMCID: PMC9135396 DOI: 10.1021/jacsau.2c00176] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/16/2022] [Revised: 04/12/2022] [Accepted: 04/15/2022] [Indexed: 05/03/2023]
Abstract
Despite decades of effort, no earth-abundant homogeneous catalysts have been discovered that can selectively oxidize methane to methanol. We exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles. By constructing macrocycles from fragments inspired by synthesized compounds, we ensure synthetic realism in our computational search. Our large-scale search reveals that low-spin Fe(II) compounds paired with strong-field (e.g., P or S-coordinating) ligands have among the best energetic tradeoffs between hydrogen atom transfer (HAT) and methanol release. This observation contrasts with prior efforts that have focused on high-spin Fe(II) with weak-field ligands. By decoupling equatorial and axial ligand effects, we determine that negatively charged axial ligands are critical for more rapid release of methanol and that higher-valency metals [i.e., M(III) vs M(II)] are likely to be rate-limited by slow methanol release. With full characterization of barrier heights, we confirm that optimizing for HAT does not lead to large oxo formation barriers. Energetic span analysis reveals designs for an intermediate-spin Mn(II) catalyst and a low-spin Fe(II) catalyst that are predicted to have good turnover frequencies. Our active learning approach to optimize two distinct reaction energies with efficient global optimization is expected to be beneficial for the search of large catalyst spaces where no prior designs have been identified and where linear scaling relationships between reaction energies or barriers may be limited or unknown.
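Efficient global optimization typically ranks candidates by expected improvement (EI), which balances a surrogate model's predicted value against its uncertainty. The sketch below shows the standard single-objective EI for minimization. It is an illustration under assumptions, not the authors' workflow: the abstract describes optimizing two reaction energies at once, which this one-dimensional version does not reproduce, and the candidate names, predicted means, and standard deviations are invented.

```python
import math


def expected_improvement(mean, std, best):
    """Expected improvement for minimization under a Gaussian prediction."""
    if std <= 0.0:
        # No uncertainty: improvement is deterministic.
        return max(best - mean, 0.0)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf


# Hypothetical surrogate predictions: name -> (predicted mean, predicted std).
candidates = {
    "A": (1.0, 0.1),  # predicted worse than the incumbent, low uncertainty
    "B": (0.4, 0.1),  # predicted better than the incumbent, low uncertainty
    "C": (0.9, 1.0),  # predicted slightly worse, high uncertainty
}
best_so_far = 0.8  # lowest value observed so far (assumption)

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Candidate C outranks A even though both are predicted to be worse than the incumbent, because its larger uncertainty leaves more room for a real improvement. This is how an EI-driven search keeps exploring instead of only exploiting confident predictions.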
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Conrad Goffinet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
38
|
Das M, Sharma P, Sunoj RB. Machine learning studies on asymmetric relay Heck reaction—Potential avenues for reaction development. J Chem Phys 2022; 156:114303. [DOI: 10.1063/5.0084432] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023] Open
Abstract
The integration of machine learning (ML) methods into chemical catalysis is evolving as a new paradigm for cost and time economic reaction development in recent times. Although there have been several successful applications of ML in catalysis, the prediction of enantioselectivity (ee) remains challenging. Herein, we describe a ML workflow to predict ee of an important class of catalytic asymmetric transformation, namely, the relay Heck (RH) reaction. A random forest ML model, built using quantum chemically derived mechanistically relevant physical organic descriptors as features, is found to predict the ee remarkably well with a low root mean square error of 8.0 ± 1.3. Importantly, the model is effective in predicting the unseen variants of an asymmetric RH reaction. Furthermore, we predicted the ee for thousands of unexplored complementary reactions, including those leading to a good number of bioactive frameworks, by engaging different combinations of catalysts and substrates drawn from the original dataset. Our ML model developed on the available examples would be able to assist in exploiting the fuller potential of asymmetric RH reactions through a priori predictions before the actual experimentation, which would thus help surpass the trial and error loop to a larger degree.
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Pooja Sharma
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
| | - Raghavan B. Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
|
39
|
Harada Y, Hatakeyama M, Maeda S, Gao Q, Koizumi K, Sakamoto Y, Ono Y, Nakamura S. Molecular Design Learned from the Natural Product Porphyra-334: Molecular Generation via Chemical Variational Autoencoder versus Database Mining via Similarity Search, A Comparative Study. ACS Omega 2022; 7:8581-8590. [PMID: 35309498 PMCID: PMC8928499 DOI: 10.1021/acsomega.1c06453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 02/18/2022] [Indexed: 06/14/2023]
Abstract
A comparative study is presented. The method via chemical variational autoencoder (VAE) and the method via similarity search are compared, focusing on their generation ability for new functional molecular design. Focusing on the natural porphyra-334 as a model molecule, we generated three groups: molecules of mycosporine-like amino acids (MAAs) as seeds (G_SEEDS), molecules generated via chemical VAE (G_VAE) and molecules gathered via similarity search (G_SIM). The number of molecules that satisfy the condition for the light absorption ability of porphyra-334 in G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. The method via chemical VAE shows a promising potential for future molecular design. By using quantum chemistry wave function properties for chemical VAE, we find new molecules that are comparable to porphyra-334, including some with unexpected geometries. At the end, we show a group of molecules found with this method.
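Database mining by similarity search usually means ranking stored molecules by a fingerprint similarity to a query. The sketch below uses the Tanimoto coefficient on sets of "on" bits. It is a minimal illustration under assumptions: the abstract does not say which similarity measure or fingerprints were used, and the bit sets, molecule names, and threshold here are invented. A real workflow would derive fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query, database, threshold):
    """Return (name, similarity) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints (assumption): each set holds the indices of on-bits.
query = {1, 4, 7, 9}
database = {
    "mol_1": {1, 4, 7, 9, 12},
    "mol_2": {2, 3, 5},
    "mol_3": {1, 4, 8},
}

hits = similarity_search(query, database, threshold=0.3)
for name, score in hits:
    print(name, round(score, 2))
```

A search like this can only return molecules already in the database, whereas a generative model can propose structures outside it. That difference is one plausible reading of the gap between the G_SIM and G_VAE counts reported above.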
Affiliation(s)
- Yuki Harada
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
| | - Makoto Hatakeyama
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Sanyo-Onoda City University, 1-1-1 Daigakudori, Sanyo-Onoda, Yamaguchi 756-0884, Japan
| | - Shuichi Maeda
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
| | - Qi Gao
- Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
| | - Kenichi Koizumi
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
| | - Yuki Sakamoto
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
| | - Yuuki Ono
- Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
| | - Shinichiro Nakamura
- Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
|
40
|
Matsuoka W, Harabuchi Y, Maeda S. Virtual Ligand-Assisted Screening Strategy to Discover Enabling Ligands for Transition Metal Catalysis. ACS Catal 2022. [DOI: 10.1021/acscatal.2c00267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Wataru Matsuoka
- Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
- ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
| | - Yu Harabuchi
- Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido 001-0021, Japan
- ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
| | - Satoshi Maeda
- Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido 001-0021, Japan
- ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
- Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), Tsukuba, Ibaraki 305-0044, Japan
| |
|
41
|
Stuyver T, Coley CW. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J Chem Phys 2022; 156:084104. [DOI: 10.1063/5.0079574] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023] Open
Abstract
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs in not only overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| | - Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
| |
|
42
|
Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
43
|
Kalikadien AV, Pidko EA, Sinha V. ChemSpaX: exploration of chemical space by automated functionalization of molecular scaffold. Digital Discovery 2022; 1:8-25. [PMID: 35340336 PMCID: PMC8887922 DOI: 10.1039/d1dd00017a] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/02/2021] [Accepted: 12/23/2021] [Indexed: 12/19/2022]
Abstract
Exploration of the local chemical space of molecular scaffolds by post-functionalization (PF) is a promising route to discover novel molecules with desired structure and function. PF with rationally chosen substituents based on known electronic and steric properties is a commonly used experimental and computational strategy in screening, design and optimization of catalytic scaffolds. Automated generation of reasonably accurate geometric representations of post-functionalized molecular scaffolds is highly desirable for data-driven applications. However, automated PF of transition metal (TM) complexes remains challenging. In this work a Python-based workflow, ChemSpaX, that is aimed at automating the PF of a given molecular scaffold with special emphasis on TM complexes, is introduced. In three representative applications of ChemSpaX by comparing with DFT and DFT-B calculations, we show that the generated structures have a reasonable quality for use in computational screening applications. Furthermore, we show that ChemSpaX generated geometries can be used in machine learning applications to accurately predict DFT computed HOMO-LUMO gaps for transition metal complexes. ChemSpaX is open-source and aims to bolster and democratize the efforts of the scientific community towards data-driven chemical discovery.
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
| | - Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
| | - Vivek Sinha
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
| |
|
44
|
Eisenstein O. From the Felkin‐Anh Rule to the Grignard Reaction: an Almost Circular 50 Year Adventure in the World of Molecular Structures and Reaction Mechanisms with Computational Chemistry. Isr J Chem 2022. [DOI: 10.1002/ijch.202100138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Odile Eisenstein
- ICGM, Univ. Montpellier, CNRS, ENSCM, Montpellier, 34095 France
- Department of Chemistry and Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo 0315, Norway
| |
|
45
|
Wen M, Blau SM, Xie X, Dwaraknath S, Persson KA. Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining. Chem Sci 2022; 13:1446-1458. [PMID: 35222929 PMCID: PMC8809395 DOI: 10.1039/d1sc06515g] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/22/2021] [Accepted: 01/09/2022] [Indexed: 11/21/2022] Open
Abstract
Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
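The contrastive objective described in this abstract (two augmented views of the same reaction pulled together, other reactions pushed apart) can be illustrated with a minimal sketch. This is a simplified, one-directional InfoNCE-style loss on toy 2D vectors, written under the assumption that cosine similarity and a softmax temperature are used; the function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the paper, which uses GNN-derived reaction representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Mean InfoNCE-style loss over a batch.

    views_a[i] and views_b[i] are embeddings of two augmentations of
    reaction i (the positive pair); views_b[j] with j != i act as negatives.
    The temperature of 0.5 is an illustrative assumption.
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        # Negative log-probability of picking the true partner i.
        total += -(logits[i] - log_denominator)
    return total / n


# Toy embeddings (hypothetical): each row in aligned_b is a slightly
# perturbed copy of the matching row in aligned_a.
aligned_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Shuffling the second view breaks every positive pair.
shuffled_b = [aligned_b[1], aligned_b[2], aligned_b[0]]

loss_aligned = contrastive_loss(aligned_a, aligned_b)
loss_shuffled = contrastive_loss(aligned_a, shuffled_b)
print(loss_aligned < loss_shuffled)
```

The loss is lower when the two views of each reaction are embedded close together than when the pairing is broken, which is the signal a pretraining step of this kind would minimize.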
Affiliation(s)
- Mingjian Wen
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Xiaowei Xie
- College of Chemistry, University of California, Berkeley, CA 94720, USA
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | | | - Kristin A Persson
- Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| |
|
46
|
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
| | - Aditya Nandy
- Massachusetts Institute of Technology, United States of America
| | | | - Chenru Duan
- Massachusetts Institute of Technology, United States of America
| | | | - Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
| |
|
47
|
Gensch T, Dos Passos Gomes G, Friederich P, Peters E, Gaudin T, Pollice R, Jorner K, Nigam A, Lindner-D'Addario M, Sigman MS, Aspuru-Guzik A. A Comprehensive Discovery Platform for Organophosphorus Ligands for Catalysis. J Am Chem Soc 2022; 144:1205-1217. [PMID: 35020383 DOI: 10.1021/jacs.1c09718] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Indexed: 12/16/2022]
Abstract
The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure-property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein we introduce kraken, a discovery platform covering monodentate organophosphorus(III) ligands providing comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and how existing data sets in catalysis can be used to accelerate ligand selection during reaction optimization.
Affiliation(s)
- Tobias Gensch
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
| | - Gabriel Dos Passos Gomes
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada
| | - Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Ellyn Peters
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Théophile Gaudin
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- IBM Research Zurich, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
| | - Robert Pollice
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Kjell Jorner
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield K10 2NA, United Kingdom
| | - AkshatKumar Nigam
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Michael Lindner-D'Addario
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
| | - Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
| | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave., Toronto, Ontario M5G, Canada
| |
|
48
|
Morán-González L, Besora M, Maseras F. Seeking the Optimal Descriptor for SN2 Reactions through Statistical Analysis of Density Functional Theory Results. J Org Chem 2021; 87:363-372. [PMID: 34935370 DOI: 10.1021/acs.joc.1c02387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Bimolecular nucleophilic substitution is one of the fundamental reactions in organic chemistry, yet there is still knowledge to be gained on the role of the nucleophile and the substrate. A statistical treatment of over 600 density functional theory (DFT)-computed barriers for bimolecular nucleophilic substitution at methyl derivatives (SN2@C) leads to the identification of numerical descriptors that best represent the entering and leaving ability of 26 different nucleophiles. The treatment is based on singular value decomposition (SVD) of a matrix of computed energy barriers. The current work represents the extension to a problem of reactivity of the hidden descriptor methodology that we had previously developed for the thermodynamic problem of bond dissociation energies in transition-metal complexes. The analysis of the results shows that a single descriptor is sufficient. This hidden descriptor has different values for nucleophilic and leaving abilities and, contrary to expectation, does not correlate especially well with either frontier molecular orbital descriptors or solvation descriptors. In contrast, it correlates with other thermodynamic and geometric parameters. This statistical procedure can be in principle extended to additional chemical fragments and other reactions.
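The statement that a single hidden descriptor is sufficient can be read as the barrier matrix being close to rank one. The sketch below illustrates that idea only: it estimates the leading singular value of a small synthetic matrix by power iteration and reports the fraction of the squared Frobenius norm it captures. The matrix values, the multiplicative form used to build them, and all function names are assumptions made for illustration; they are not the paper's data or its exact procedure, which may include centering or other preprocessing.

```python
import math


def leading_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u = [0.0] * n_rows
    sigma = 0.0
    for _ in range(iterations):
        # u <- M v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- M^T u, normalized; the norm before scaling estimates sigma
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / sigma for x in w]
    return sigma, u, v


def rank_one_fraction(matrix):
    """Share of the squared Frobenius norm carried by the leading singular value."""
    sigma, _, _ = leading_singular_triplet(matrix)
    frobenius_sq = sum(x * x for row in matrix for x in row)
    return sigma * sigma / frobenius_sq


# Synthetic example (hypothetical numbers): one value per entering group and
# one per leaving group, combined multiplicatively, plus a small perturbation.
entering = [1.0, 2.0, 3.0]
leaving = [0.5, 1.5, 2.5, 4.0]
barriers = [
    [e * l + 0.01 * ((i + j) % 2) for j, l in enumerate(leaving)]
    for i, e in enumerate(entering)
]

fraction = rank_one_fraction(barriers)
print(fraction > 0.99)
```

When this fraction is close to one, the left and right singular vectors play the role of one descriptor value per row species and one per column species, which matches the abstract's remark that the descriptor takes different values for nucleophilic and leaving abilities.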
Affiliation(s)
- Lucía Morán-González
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology, Avgda. Països Catalans, 16, 43007 Tarragona, Catalonia, Spain
| | - Maria Besora
- Departament de Química Física i Inorgànica, Universitat Rovira i Virgili, c/Marcel·lí Domingo s/n, 43007 Tarragona, Catalonia, Spain
| | - Feliu Maseras
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology, Avgda. Països Catalans, 16, 43007 Tarragona, Catalonia, Spain
| |
|
49
|
Morán‐González L, Pedregal JR, Besora M, Maseras F. Understanding the Binding Properties of N‐heterocyclic Carbenes through BDE Matrix App. Eur J Inorg Chem 2021. [DOI: 10.1002/ejic.202100932] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Affiliation(s)
- Lucía Morán‐González
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology, Avgda. Països Catalans, 16, 43007 Tarragona, Catalonia, Spain
| | - Jaime Rodríguez‐Guerra Pedregal
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology, Avgda. Països Catalans, 16, 43007 Tarragona, Catalonia, Spain
| | - Maria Besora
- Departament de Química Física i Inorgànica, Universitat Rovira i Virgili, c/Marcel·lí Domingo s/n, 43007 Tarragona, Catalonia, Spain
| | - Feliu Maseras
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology, Avgda. Països Catalans, 16, 43007 Tarragona, Catalonia, Spain
| |
|
50
|
Lu H, Kang X, Luo Y. Structure-Based Relative Energy Prediction Model: A Case Study of Pd(II)-Catalyzed Ethylene Polymerization and the Electronic Effect of Ancillary Ligands. J Phys Chem B 2021; 125:12047-12053. [PMID: 34694809 DOI: 10.1021/acs.jpcb.1c05143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Rapidly mapping a reaction energy profile to understand the reaction mechanism is of great importance and highly desired for the discovery of new chemical reactions. Herein, a combination of density functional theory (DFT) calculations and regression analysis has been applied to construct quantitative structure-based energy prediction models, considering Pd(II)-catalyzed ethylene polymerization as an example, for rapid construction of the reaction energy profile. Encouragingly, the geometrical parameters of the reaction center of a single species suffice to predict the whole energy profile with high accuracy. The reaction energies of ethylene insertion and β-H elimination, which directly correlate with polymerization activity and the possibility of branch formation, were studied to elucidate the electronic effects of ancillary ligands. Further analyses of these models from the statistical and chemical points of view afforded useful information on the design of the catalyst ligand. The current work is expected to shed new methodological light on rapidly mapping the energy profile of chemical reactions and to provide useful information for the development of such reactions.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
| | - Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
| | - Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- PetroChina Petrochemical Research Institute, Beijing 102206, China
| |
|