1
Stuyver T. TS-tools: Rapid and automated localization of transition states based on a textual reaction SMILES input. J Comput Chem 2024. [PMID: 38850166 DOI: 10.1002/jcc.27374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 06/10/2024]
Abstract
Here, TS-tools is presented, a Python package facilitating the automated localization of transition states (TS) based on a textual reaction SMILES input. TS searches can either be performed at xTB or DFT level of theory, with the former yielding guesses at marginal computational cost, and the latter directly yielding accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools reaches an excellent success rate of 95% already at xTB level of theory. For tri- and multimolecular reaction pathways (which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity), TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, a particular emphasis is placed on solvation-induced mechanistic changes, another issue that has received limited attention in the automated TS search literature so far.
Affiliation(s)
- Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, Paris, France
2
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.
Affiliation(s)
- Yinuo Yang
- Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida
3
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
4
Nippa DF, Atz K, Hohler R, Müller AT, Marx A, Bartelmus C, Wuitschik G, Marzuoli I, Jost V, Wolfard J, Binder M, Stepan AF, Konrad DB, Grether U, Martin RE, Schneider G. Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning. Nat Chem 2024; 16:239-248. [PMID: 37996732 PMCID: PMC10849962 DOI: 10.1038/s41557-023-01360-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/21/2022] [Accepted: 10/03/2023] [Indexed: 11/25/2023]
Abstract
Late-stage functionalization is an economical approach to optimize the properties of drug candidates. However, the chemical complexity of drug molecules often makes late-stage diversification challenging. To address this problem, a late-stage functionalization platform based on geometric deep learning and high-throughput reaction screening was developed. Considering borylation as a critical step in late-stage functionalization, the computational model predicted reaction yields for diverse reaction conditions with a mean absolute error margin of 4-5%, while the reactivity of novel reactions with known and unknown substrates was classified with a balanced accuracy of 92% and 67%, respectively. The regioselectivity of the major products was accurately captured with a classifier F-score of 67%. When applied to 23 diverse commercial drug molecules, the platform successfully identified numerous opportunities for structural diversification. The influence of steric and electronic information on model performance was quantified, and a comprehensive simple user-friendly reaction format was introduced that proved to be a key enabler for seamlessly integrating deep learning and high-throughput experimentation for late-stage functionalization.
Affiliation(s)
- David F Nippa
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
- Kenneth Atz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- Remo Hohler
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Alex T Müller
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Andreas Marx
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Christian Bartelmus
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Georg Wuitschik
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Irene Marzuoli
- Process Chemistry and Catalysis (PCC), F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Vera Jost
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Jens Wolfard
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Martin Binder
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Antonia F Stepan
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- David B Konrad
- Department of Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
- Uwe Grether
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Rainer E Martin
- Roche Pharma Research and Early Development (pRED), Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Zurich, Switzerland
- ETH Singapore SEC Ltd, Singapore, Singapore
5
Levine DS, Jacobson LD, Bochevarov AD. Large Computational Survey of Intrinsic Reactivity of Aromatic Carbon Atoms with Respect to a Model Aldehyde Oxidase. J Chem Theory Comput 2023; 19:9302-9317. [PMID: 38085599 DOI: 10.1021/acs.jctc.3c00913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2023]
Abstract
Aldehyde oxidase (AOX) and other related molybdenum-containing enzymes are known to oxidize the C-H bonds of aromatic rings. This process contributes to the metabolism of pharmaceutical compounds and, therefore, is of vital importance to drug pharmacokinetics. The present work describes an automated computational workflow and its use for the prediction of intrinsic reactivity of small aromatic molecules toward a minimal model of the active site of AOX. The workflow is based on quantum chemical transition state searches for the underlying single-step oxidation reaction, where the automated protocol includes identification of unique aromatic C-H bonds, creation of three-dimensional reactant and product complex geometries via a templating approach, search for a transition state, and validation of reaction end points. Conformational search on the reactants, products, and the transition states is performed. The automated procedure has been validated on previously reported transition state barriers and was used to evaluate the intrinsic reactivity of nearly three hundred heterocycles commonly found in approved drug molecules. The intrinsic reactivity of more than 1000 individual aromatic carbon sites is reported. Stereochemical and conformational aspects of the oxidation reaction, which have not been discussed in previous studies, are shown to play important roles in accurate modeling of the oxidation reaction. Observations on structural trends that determine the reactivity are provided and rationalized.
Affiliation(s)
- Daniel S Levine
- Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, United States
- Leif D Jacobson
- Schrödinger, Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
- Art D Bochevarov
- Schrödinger, Inc., 1540 Broadway, Floor 24, New York, New York 10036, United States
6
Domenichini G, Dellago C. Molecular Hessian matrices from a machine learning random forest regression algorithm. J Chem Phys 2023; 159:194111. [PMID: 37982481 DOI: 10.1063/5.0169384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/27/2023] [Indexed: 11/21/2023]
Abstract
In this article, we present a machine learning model to obtain fast and accurate estimates of the molecular Hessian matrix. In this model, based on a random forest, the second derivatives of the energy with respect to redundant internal coordinates are learned individually. The internal coordinates together with their specific representation guarantee rotational and translational invariance. The model is trained on a subset of the QM7 dataset but is shown to be applicable to larger molecules picked from the QM9 dataset. From the predicted Hessian, it is also possible to obtain reasonable estimates of the vibrational frequencies, normal modes, and zero point energies of the molecules.
Affiliation(s)
- Giorgio Domenichini
- Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
- Christoph Dellago
- Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
7
Lewis-Atwell T, Beechey D, Şimşek Ö, Grayson MN. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal 2023; 13:13506-13515. [PMID: 37881791 PMCID: PMC10594582 DOI: 10.1021/acscatal.3c02513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 08/24/2023] [Indexed: 10/27/2023]
Abstract
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring fewer than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56% respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
Affiliation(s)
- Toby Lewis-Atwell
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Daniel Beechey
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Özgür Şimşek
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Matthew N. Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
8
Huang B, von Rudorff GF, von Lilienfeld OA. The central role of density functional theory in the AI age. Science 2023; 381:170-175. [PMID: 37440654 DOI: 10.1126/science.abn3445] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/27/2023] [Accepted: 05/30/2023] [Indexed: 07/15/2023]
Abstract
Density functional theory (DFT) plays a pivotal role in chemical and materials science because of its relatively high predictive power, applicability, versatility, and computational efficiency. We review recent progress in machine learning (ML) model developments, which have relied heavily on DFT for synthetic data generation and for the design of model architectures. The general relevance of these developments is placed in a broader context for chemical and materials sciences. DFT-based ML models have reached high efficiency, accuracy, scalability, and transferability and pave the way to the routine use of successful experimental planning software within self-driving laboratories.
Affiliation(s)
- Bing Huang
- University of Vienna, Faculty of Physics, AT1090 Wien, Austria
- Guido Falk von Rudorff
- University Kassel, Department of Chemistry, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), 34132 Kassel, Germany
- O Anatole von Lilienfeld
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
- Department of Chemistry, University of Toronto, St. George Campus, Toronto, Ontario M5S 3H6, Canada
- Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, Ontario M5S 3E4, Canada
- Department of Physics, University of Toronto, St. George Campus, Toronto, Ontario M5S 1A7, Canada
- Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
9
Transition1x - a dataset for building generalizable reactive machine learning potentials. Sci Data 2022; 9:779. [PMID: 36566281 PMCID: PMC9789978 DOI: 10.1038/s41597-022-01870-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/23/2022] [Accepted: 11/16/2022] [Indexed: 12/25/2022]
Abstract
Machine Learning (ML) models have, in contrast to their usefulness in molecular dynamics studies, had limited success as surrogate potentials for reaction barrier search. This is primarily because available datasets for training ML models on small molecular systems almost exclusively contain configurations at or near equilibrium. In this work, we present the dataset Transition1x containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the ωB97x/6-31G(d) level of theory. The data was generated by running Nudged Elastic Band (NEB) with DFT on 10k organic reactions of various types while saving intermediate calculations. We train equivariant graph message-passing neural network models on Transition1x and cross-validate on the popular ANI1x and QM9 datasets. We show that ML models cannot learn features in transition state regions solely by training on hitherto popular benchmark datasets. Transition1x is a new challenging benchmark that will provide an important step towards developing next-generation ML force fields that also work far away from equilibrium configurations and reactive systems.
10
Heinen S, von Rudorff GF, von Lilienfeld OA. Transition state search and geometry relaxation throughout chemical compound space with quantum machine learning. J Chem Phys 2022; 157:221102. [PMID: 36546806 DOI: 10.1063/5.0112856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
We use energies and forces predicted within response operator based quantum machine learning (OQML) to perform geometry optimization and transition state search calculations with legacy optimizers but without the need for subsequent re-optimization with quantum chemistry methods. For randomly sampled initial coordinates of small organic query molecules, we report systematic improvement of equilibrium and transition state geometry output as training set sizes increase. Out-of-sample SN2 reactant complexes and transition state geometries have been predicted using the LBFGS and the QST2 algorithms with a root-mean-square deviation (RMSD) of 0.16 and 0.4 Å after training on up to 200 reactant complex relaxations and transition state search trajectories from the QMrxn20 dataset, respectively. For geometry optimizations, we have also considered relaxation paths up to 5,595 constitutional isomers with sum formula C7H10O2 from the QM9-database. Using the resulting OQML models with an LBFGS optimizer reproduces the minimum geometry with an RMSD of 0.14 Å, only using ∼6000 training points obtained from normal mode sampling along the optimization paths of the training compounds without the need for active learning. For converged equilibrium and transition state geometries, subsequent vibrational normal mode frequency analysis indicates deviation from MP2 reference results by on average 14 and 26 cm⁻¹, respectively. While the numerical cost for OQML predictions is negligible in comparison to density functional theory or MP2, the number of steps until convergence is typically larger in either case. The success rate for reaching convergence, however, improves systematically with training set size, underscoring OQML's potential for universal applicability.
Affiliation(s)
- Stefan Heinen
- University of Vienna, Faculty of Physics, Kolingasse 14-16, AT-1090 Wien, Austria
11
Santra G, Calinsky R, Martin JML. Benefits of Range-Separated Hybrid and Double-Hybrid Functionals for a Large and Diverse Data Set of Reaction Energies and Barrier Heights. J Phys Chem A 2022; 126:5492-5505. [PMID: 35930677 PMCID: PMC9393870 DOI: 10.1021/acs.jpca.2c03922] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
To better understand the thermochemical kinetics and mechanism of a specific chemical reaction, an accurate estimation of barrier heights (forward and reverse) and reaction energies is vital. Because of the large size of reactants and transition state structures involved in real-life mechanistic studies (e.g., enzymatically catalyzed reactions), density functional theory remains the workhorse for such calculations. In this paper, we have assessed the performance of 91 density functionals for modeling the reaction energies and barrier heights on a large and chemically diverse data set (BH9) composed of 449 organic chemistry reactions. We have shown that range-separated hybrid functionals perform better than the global hybrids for BH9 barrier heights and reaction energies. Except for the PBE-based range-separated nonempirical double hybrids, range separation of the exchange term helps improve the performance for barrier heights and reaction energies. The 16-parameter Berkeley double hybrid, ωB97M(2), performs remarkably well for both properties. However, our minimally empirical range-separated double hybrid functionals offer marginally better accuracy than ωB97M(2) for BH9 barrier heights and reaction energies.
Affiliation(s)
- Golokesh Santra
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
- Rivka Calinsky
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
- Jan M L Martin
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, 7610001 Reḥovot, Israel
12
Weinreich J, Lemm D, von Rudorff GF, von Lilienfeld OA. Ab initio machine learning of phase space averages. J Chem Phys 2022; 157:024303. [PMID: 35840379 DOI: 10.1063/5.0095674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼ 0.8 kcal/mol) for out-of-sample molecules, within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
Affiliation(s)
- Jan Weinreich
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Dominik Lemm
- Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
13
Komp E, Valleau S. Low-cost prediction of molecular and transition state partition functions via machine learning. Chem Sci 2022; 13:7900-7906. [PMID: 35865893 PMCID: PMC9258343 DOI: 10.1039/d2sc01334g] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/05/2022] [Accepted: 06/10/2022] [Indexed: 11/21/2022]
Abstract
We have generated an open-source dataset of over 30 000 organic chemistry gas phase partition functions. With this data, a machine learning deep neural network estimator was trained to predict partition functions of unknown organic chemistry gas phase transition states. This estimator only relies on reactant and product geometries and partition functions. A second machine learning deep neural network was trained to predict partition functions of chemical species from their geometry. Our models accurately predict the logarithm of test set partition functions with a maximum mean absolute error of 2.7%. Thus, this approach provides a means to reduce the cost of computing reaction rate constants ab initio. The models were also used to compute transition state theory reaction rate constant prefactors and the results were in quantitative agreement with the corresponding ab initio calculations with an accuracy of 98.3% on the log scale.
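For orientation, the link between partition functions and a rate constant prefactor can be sketched with the conventional transition state theory expression k = (k_B T / h) (Q_TS / Q_R) exp(-E_0 / RT). The snippet below is a minimal illustration of that textbook formula for a unimolecular step, not the authors' code: the function names, the use of log partition functions as inputs, and the neglect of tunneling, symmetry numbers, and standard-state factors are all assumptions made here.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K (exact SI value)
H = 6.62607015e-34     # Planck constant in J*s (exact SI value)
R = 8.31446261815324   # molar gas constant in J/(mol*K)


def tst_log_prefactor(ln_q_ts: float, ln_q_reactant: float, temperature: float) -> float:
    """Return ln[(k_B*T/h) * Q_TS / Q_R] for a unimolecular step (prefactor in 1/s).

    Hypothetical helper: takes natural logs of the partition functions,
    since the abstract reports predictions of log partition functions.
    """
    return math.log(K_B * temperature / H) + ln_q_ts - ln_q_reactant


def tst_rate_constant(ln_q_ts: float, ln_q_reactant: float,
                      barrier_j_per_mol: float, temperature: float) -> float:
    """Return the conventional TST rate constant in 1/s (no tunneling correction)."""
    ln_k = (tst_log_prefactor(ln_q_ts, ln_q_reactant, temperature)
            - barrier_j_per_mol / (R * temperature))
    return math.exp(ln_k)


if __name__ == "__main__":
    # Illustrative inputs only, not values from the paper.
    print(tst_rate_constant(ln_q_ts=10.0, ln_q_reactant=11.0,
                            barrier_j_per_mol=80_000.0, temperature=298.15))
```

Working in log space keeps the arithmetic stable when the partition functions span many orders of magnitude, and an error in a predicted ln Q carries over additively into ln k.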
Affiliation(s)
- Evan Komp
- Chemical Engineering, University of Washington 3781 Okanogan Ln Seattle WA 98195 USA
- Stéphanie Valleau
- Chemical Engineering, University of Washington 3781 Okanogan Ln Seattle WA 98195 USA
14
Spiekermann KA, Pattanaik L, Green WH. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J Phys Chem A 2022; 126:3976-3986. [PMID: 35727075 DOI: 10.1021/acs.jpca.2c02614] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 12/16/2022]
Abstract
Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future modeling efforts to estimate reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small molecule chemistry.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
15
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Small-Basis Set Density-Functional Theory Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2913-2930. [PMID: 35412817 DOI: 10.1021/acs.jctc.2c00036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Density functional theory (DFT) is currently the most popular method for modeling noncovalent interactions and thermochemistry. The accurate calculation of noncovalent interaction energies, reaction energies, and barrier heights requires choosing an appropriate functional and, typically, a relatively large basis set. Deficiencies of the density-functional approximation and the use of a limited basis set are the leading sources of error in the calculation of noncovalent and thermochemical properties in molecular systems. In this article, we present three new DFT methods based on the BLYP, M06-2X, and CAM-B3LYP functionals in combination with the 6-31G* basis set and corrected with atom-centered potentials (ACPs). ACPs are one-electron potentials that have the same form as effective-core potentials, except they do not replace any electrons. The ACPs developed in this work are used to generate energy corrections to the underlying DFT/basis-set method such that the errors in predicted chemical properties are minimized while maintaining the low computational cost of the parent methods. ACPs were developed for the elements H, B, C, N, O, F, Si, P, S, and Cl. The ACP parameters were determined using an extensive training set of 118655 data points, mostly of complete basis set coupled-cluster level quality. The target molecular properties for the ACP-corrected methods include noncovalent interaction energies, molecular conformational energies, reaction energies, barrier heights, and bond separation energies. The ACPs were tested first on the training set and then on a validation set of 42567 additional data points. We show that the ACP-corrected methods can predict the target molecular properties with accuracy close to complete basis set wavefunction theory methods, but at a computational cost of double-ζ DFT methods. 
This makes the new BLYP/6-31G*-ACP, M06-2X/6-31G*-ACP, and CAM-B3LYP/6-31G*-ACP methods uniquely suited to the calculation of noncovalent, thermochemical, and kinetic properties in large molecular systems.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, MALTA Consolider Team, Oviedo E-33006, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
|
16
|
Stuyver T, Coley CW. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J Chem Phys 2022; 156:084104. [DOI: 10.1063/5.0079574] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023] Open
Abstract
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs in not only overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Affiliation(s)
- Thijs Stuyver
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Connor W. Coley
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
|
17
|
Komp E, Janulaitis N, Valleau S. Progress towards machine learning reaction rate constants. Phys Chem Chem Phys 2021; 24:2692-2705. [PMID: 34935798 DOI: 10.1039/d1cp04422b] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/12/2023]
Abstract
Quantum and classical reaction rate constant calculations come at the cost of exploring potential energy surfaces. Due to the "curse of dimensionality", their evaluation quickly becomes unfeasible as the system size grows. Machine learning algorithms can accelerate the calculation of reaction rate constants by predicting them using low cost input features. In this perspective, we briefly introduce supervised machine learning algorithms in the context of reaction rate constant prediction. We discuss existing and recently created kinetic datasets and input feature representations as well as the use and design of machine learning algorithms to predict reaction rate constants or quantities required for their computation. Amongst these, we first describe the use of machine learning to predict activation, reaction, solvation and dissociation energies. We then look at the use of machine learning to predict reactive force field parameters, reaction rate constants as well as to help accelerate the search for minimum energy paths. Lastly, we provide an outlook on areas which have yet to be explored so as to improve and evaluate the use of machine learning algorithms for chemical reaction rate constants.
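To make the link between a predicted activation energy and a rate constant concrete, the following is a minimal sketch of the standard Eyring (transition state theory) expression, k = (k_B T / h) exp(-ΔG‡ / RT). It is not taken from the perspective itself: the function name `eyring_rate_constant` is hypothetical, and the sketch assumes a unimolecular step, a transmission coefficient of 1, and a free energy barrier given in kcal/mol.

```python
import math

# Physical constants (exact SI values; R converted to kcal units)
K_B = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s
R_KCAL = 1.987204e-3     # gas constant, kcal/(mol K)


def eyring_rate_constant(dg_kcal_mol: float, temperature_k: float = 298.15) -> float:
    """Return a unimolecular rate constant in 1/s from a free energy barrier.

    Assumes the Eyring equation with a transmission coefficient of 1;
    `dg_kcal_mol` is the activation free energy in kcal/mol.
    """
    prefactor = K_B * temperature_k / H  # k_B T / h, in 1/s
    return prefactor * math.exp(-dg_kcal_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Compare two illustrative barriers (values chosen only for the example)
    for barrier in (15.0, 20.0):
        print(f"dG = {barrier:5.1f} kcal/mol -> k = {eyring_rate_constant(barrier):.3e} 1/s")
```

Because the barrier enters through an exponential, an error of RT ln 10 (about 1.4 kcal/mol at room temperature) in a predicted activation energy changes the rate constant by a factor of ten, which is why barrier accuracy matters so much for rate prediction.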
Affiliation(s)
- Evan Komp
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Nida Janulaitis
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Stéphanie Valleau
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
|
18
|
Prasad VK, Pei Z, Edelmann S, Otero-de-la-Roza A, DiLabio GA. BH9, a New Comprehensive Benchmark Data Set for Barrier Heights and Reaction Energies: Assessment of Density Functional Approximations and Basis Set Incompleteness Potentials. J Chem Theory Comput 2021; 18:151-166. [PMID: 34911294 DOI: 10.1021/acs.jctc.1c00694] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 01/13/2023]
Abstract
The calculation of accurate reaction energies and barrier heights is essential in computational studies of reaction mechanisms and thermochemistry. To assess methods regarding their ability to predict these two properties, high-quality benchmark sets are required that comprise a reasonably large and diverse set of organic reactions. Due to the time-consuming nature of both locating transition states and computing accurate reference energies for reactions involving large molecules, previous benchmark sets have been limited in scope, the number of reactions considered, and the size of the reactant and product molecules. Recent advances in coupled-cluster theory, in particular local correlation methods like DLPNO-CCSD(T), now allow the calculation of reaction energies and barrier heights for relatively large systems. In this work, we present a comprehensive and diverse benchmark set of barrier heights and reaction energies based on DLPNO-CCSD(T)/CBS called BH9. BH9 comprises 449 chemical reactions belonging to nine types common in organic chemistry and biochemistry. We examine the accuracy of DLPNO-CCSD(T) vis-a-vis canonical CCSD(T) for a subset of BH9 and conclude that, although there is a penalty in using the DLPNO approximation, the reference data are accurate enough to serve as a benchmark for density functional theory (DFT) methods. We then present two applications of the BH9 set. First, we examine the performance of several density functional approximations commonly used in thermochemical and mechanistic studies. Second, we assess our basis set incompleteness potentials regarding their ability to mitigate basis set incompleteness errors. The number of data points, the diversity of the reactions considered, and the relatively large size of the reactant molecules make BH9 the most comprehensive thermochemical benchmark set to date and a useful tool for the development and assessment of computational methods.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Zhipeng Pei
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Simon Edelmann
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica and MALTA Consolider Team, Facultad de Química, Universidad de Oviedo, 33006 Oviedo, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
|
19
|
Heid E, Green WH. Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction. J Chem Inf Model 2021; 62:2101-2110. [PMID: 34734699 PMCID: PMC9092344 DOI: 10.1021/acs.jcim.1c00975] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 11/28/2022]
Abstract
The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
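The idea of a CGR as a superposition of reactant and product graphs can be illustrated with a small sketch. This is not the authors' implementation: the helper names `bond`, `condensed_graph`, and `changed_edges` are hypothetical, and the sketch assumes that atoms are identified by atom-map numbers and that each molecule is given as a table of bond orders, with 0.0 standing for "no bond".

```python
from typing import Dict, FrozenSet, Tuple

Bond = FrozenSet[int]                    # unordered pair of atom-map numbers
BondTable = Dict[Bond, float]            # bond -> bond order
CGR = Dict[Bond, Tuple[float, float]]    # bond -> (order before, order after)


def bond(i: int, j: int) -> Bond:
    """Build an order-independent key for the bond between atoms i and j."""
    return frozenset((i, j))


def condensed_graph(reactants: BondTable, products: BondTable) -> CGR:
    """Superimpose two atom-mapped bond tables into one edge-labelled graph."""
    return {
        b: (reactants.get(b, 0.0), products.get(b, 0.0))
        for b in set(reactants) | set(products)
    }


def changed_edges(cgr: CGR) -> CGR:
    """Keep only the edges whose bond order differs between the two sides."""
    return {b: orders for b, orders in cgr.items() if orders[0] != orders[1]}


if __name__ == "__main__":
    # Illustrative SN2 step, heavy atoms only: C1-C2-Br3 + Cl4 -> C1-C2-Cl4 + Br3
    reactants = {bond(1, 2): 1.0, bond(2, 3): 1.0}
    products = {bond(1, 2): 1.0, bond(2, 4): 1.0}
    cgr = condensed_graph(reactants, products)
    for b, (before, after) in sorted(cgr.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(b), before, "->", after)
```

In this toy example the C1-C2 edge carries the label (1.0, 1.0), while the breaking C-Br bond and the forming C-Cl bond carry (1.0, 0.0) and (0.0, 1.0). A graph network operating on such edge labels sees the whole reaction as a single graph rather than as two separate molecules.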
Affiliation(s)
- Esther Heid
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
20
|
Okada H, Maeda S. A Dataset of Computational Reaction Barriers for the Claisen Rearrangement: Chemical and Numerical Analysis. Mol Inform 2021; 41:e2100216. [PMID: 34661976 DOI: 10.1002/minf.202100216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2021] [Accepted: 09/29/2021] [Indexed: 01/24/2023]
Abstract
Theoretical reaction screening based on Gibbs energy barriers would be promising to accelerate chemical reactions mining. The number of quantum chemical calculations can be reduced by using an optimization algorithm such as genetic algorithm (GA) and Bayesian optimization (BO). The focus of this study is to generate a dataset of reaction barriers of size ∼100000. Such a dataset would be useful to quickly evaluate various implementations of an optimization algorithm such as GA and BO. The dataset includes Gibbs energy barriers of the Claisen rearrangement for ∼100000 molecules computed on the basis of a semiempirical theory PM7. After evaluating its chemical and numerical features, it is found that the dataset well reflects chemical trends of various substitutions and is useful in testing various implementations of an optimization algorithm. The dataset is available in the supplementary material of this paper.
Affiliation(s)
- Hiroaki Okada
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo, Hokkaido 060-8628, Japan
- Satoshi Maeda
  - Department of Chemistry, Graduate School of Science, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, Hokkaido 001-0021, Japan
  - ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Hokkaido University, Sapporo, Hokkaido 060-0810, Japan
  - Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), Tsukuba, Ibaraki 305-0044, Japan
|
21
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
|
22
|
Heinen S, von Rudorff GF, von Lilienfeld OA. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J Chem Phys 2021; 155:064105. [PMID: 34391351 DOI: 10.1063/5.0059742] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 01/13/2023] Open
Abstract
The interplay of kinetics and thermodynamics governs reactive processes, and their control is key in synthesis efforts. While sophisticated numerical methods for studying equilibrium states have well advanced, quantitative predictions of kinetic behavior remain challenging. We introduce a reactant-to-barrier (R2B) machine learning model that rapidly and accurately infers activation energies and transition state geometries throughout the chemical compound space. R2B exhibits improving accuracy as training set sizes grow and requires as input solely the molecular graph of the reactant and the information of the reaction type. We provide numerical evidence for the applicability of R2B for two competing text-book reactions relevant to organic synthesis, E2 and SN2, trained and tested on chemically diverse quantum data from the literature. After training on 1-1.8k examples, R2B predicts activation energies on average within less than 2.5 kcal/mol with respect to the coupled-cluster singles doubles reference within milliseconds. Principal component analysis of kernel matrices reveals the hierarchy of the multiple scales underpinning reactivity in chemical space: Nucleophiles and leaving groups, substituents, and pairwise substituent combinations correspond to systematic lowering of eigenvalues. Analysis of R2B based predictions of ∼11.5k E2 and SN2 barriers in the gas-phase for previously undocumented reactants indicates that on average, E2 is favored in 75% of all cases and that SN2 becomes likely for chlorine as nucleophile/leaving group and for substituents consisting of hydrogen or electron-withdrawing groups. Experimental reaction design from first principles is enabled due to R2B, which is demonstrated by the construction of decision trees. Numerical R2B based results for interatomic distances and angles of reactant and transition state geometries suggest that Hammond's postulate is applicable to SN2, but not to E2.
Affiliation(s)
- Stefan Heinen
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
|
23
|
Lemm D, von Rudorff GF, von Lilienfeld OA. Machine learning based energy-free structure predictions of molecules, transition states, and solids. Nat Commun 2021; 12:4468. [PMID: 34294693 PMCID: PMC8298673 DOI: 10.1038/s41467-021-24525-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/24/2021] [Accepted: 06/22/2021] [Indexed: 02/06/2023] Open
Abstract
The computational prediction of atomistic structure is a long-standing problem in physics, chemistry, materials, and biology. Conventionally, force-fields or ab initio methods determine structure through energy minimization, which is either approximate or computationally demanding. This accuracy/cost trade-off prohibits the generation of synthetic big data sets accounting for chemical space with atomistic detail. Exploiting implicit correlations among relaxed structures in training data sets, our machine learning model Graph-To-Structure (G2S) generalizes across compound space in order to infer interatomic distances for out-of-sample compounds, effectively enabling the direct reconstruction of coordinates, and thereby bypassing the conventional energy optimization task. The numerical evidence collected includes 3D coordinate predictions for organic molecules, transition states, and crystalline solids. G2S improves systematically with training set size, reaching mean absolute interatomic distance prediction errors of less than 0.2 Å for less than eight thousand training structures - on par or better than conventional structure generators. Applicability tests of G2S include successful predictions for systems which typically require manual intervention, improved initial guesses for subsequent conventional ab initio based relaxation, and input generation for subsequent use of structure based quantum machine learning models.
Affiliation(s)
- Dominik Lemm
  - Faculty of Physics, University of Vienna, Vienna, Austria
- O Anatole von Lilienfeld
  - Faculty of Physics, University of Vienna, Vienna, Austria.
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, Basel, Switzerland.
|
24
|
Stuyver T, Shaik S. Resolving Entangled Reactivity Modes through External Electric Fields and Substitution: Application to E2/SN2 Reactions. J Org Chem 2021; 86:9030-9039. [PMID: 34152765 DOI: 10.1021/acs.joc.1c01010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
In this study, we explore strategies to resolve entangled reactivity modes. More specifically, we consider the competition between SN2 and E2 reaction pathways for alkyl halides and nucleophiles/bases. We first demonstrate that the emergence of an E2-preference is associated with an enhancement of the magnitude of the resonance stabilization in the transition-state (TS) region, resulting from the improved mixing of electrostatically stabilized valence bond structures into the TS wavefunction. Subsequently, we show that the TS resonance energy can be tuned selectively and rationally either through the application of an oriented external electric field directed along the C-C axis of the alkyl halide or through a regular substitution approach of the C-C moiety. We end our study by demonstrating that the insights gained from our analysis enable one to rationalize the main reactivity trends emerging from a recently published large database of competing SN2 and E2 reaction pathways.
Affiliation(s)
- Thijs Stuyver
  - Department of Organic Chemistry, The Hebrew University, Jerusalem 91904, Israel
- Sason Shaik
  - Department of Organic Chemistry, The Hebrew University, Jerusalem 91904, Israel
|
25
|
Schwaller P, Vaucher AC, Laino T, Reymond JL. Prediction of chemical reaction yields using deep learning. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/abc81d] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 11/11/2022]
|
26
|
Jorner K, Brinck T, Norrby PO, Buttar D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem Sci 2021; 12:1163-1175. [PMID: 36299676 PMCID: PMC9528810 DOI: 10.1039/d0sc04896h] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 09/04/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022] Open
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100–150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints. Hybrid reactivity models, combining mechanistic calculations and machine learning with descriptors, are used to predict barriers for nucleophilic aromatic substitution.
Affiliation(s)
- Kjell Jorner
  - Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
- Tore Brinck
  - Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology, Stockholm
- Per-Ola Norrby
  - Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg
- David Buttar
  - Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield
|