1. van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 PMCID: PMC11323278 DOI: 10.1021/acs.jcim.4c00104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R. Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
2. van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATM_d, B²R²_l, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yannick Calvino Alonso
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Malte Franke
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
3. Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Affiliation(s)
- Yuheng Ding
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Bo Qiang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Qixuan Chen
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Yiqiao Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Liangren Zhang
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
- Zhenming Liu
  - Department of Pharmaceutical Science, Peking University, Beijing 100191, China
4. Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
5. Kirkland JK, Kumawat J, Shaban Tameh M, Tolman T, Lambert AC, Lief GR, Yang Q, Ess DH. Machine Learning Models for Predicting Zirconocene Properties and Barriers. J Chem Inf Model 2024; 64:775-784. [PMID: 38259142 DOI: 10.1021/acs.jcim.3c01575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Zr metallocenes have significant potential to be highly tunable polyethylene catalysts through modification of the aromatic ligand framework. Here we report the development of multiple machine learning models using a large library (>700 systems) of DFT-calculated zirconocene properties and barriers for ethylene polymerization. We show that very accurate machine learning models are possible for HOMO-LUMO gaps of precatalysts but the performance significantly depends on the machine learning algorithm and type of featurization, such as fingerprints, Coulomb matrices, smooth overlap of atomic positions, or persistence images. Surprisingly, the description of the bonding hapticity, the number of direct connections between Zr and the ligand aromatic carbons, only has a moderate influence on the performance of most models. Despite robust models for HOMO-LUMO gaps, these types of machine learning models based on structure connectivity type features perform poorly in predicting ethylene migratory insertion barrier heights. Therefore, we developed several relatively robust and accurate machine learning models for barrier heights that are based on quantum-chemical descriptors (QCDs). The quantitative accuracy of these models depends on which potential energy surface structure QCDs were harvested from. This revealed a Hammett-type principle to naturally emerge showing that QCDs from the π-coordination complexes provide much better descriptions of the transition states than other potential-energy structures. Feature importance analysis of the QCDs provides several fundamental principles that influence zirconocene catalyst reactivity.
Affiliation(s)
- Justin K Kirkland
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Jugal Kumawat
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Maliheh Shaban Tameh
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Tyson Tolman
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Allison C Lambert
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Graham R Lief
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Qing Yang
  - Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Daniel H Ess
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
6. Kim S, Woo J, Kim WY. Diffusion-based generative AI for exploring transition states from 2D molecular graphs. Nat Commun 2024; 15:341. [PMID: 38184661 PMCID: PMC10771475 DOI: 10.1038/s41467-023-44629-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/21/2023] [Indexed: 01/08/2024]
Abstract
The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance in predicting TS geometries. However, they require 3D conformations of reactants and products, often with appropriate orientations, as input, which demands substantial effort and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries from 2D molecular graphs alone. TSDiff outperforms the existing ML models that use 3D geometries in terms of both accuracy and efficiency. Moreover, it enables sampling of various TS conformations, because it learns the distribution of TS geometries for diverse reactions in training. Thus, TSDiff finds more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff has promising potential for efficient and reliable TS exploration.
Affiliation(s)
- Seonghwan Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
- Jeheon Woo
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
- Woo Youn Kim
  - Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
  - AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
7. Lewis-Atwell T, Beechey D, Şimşek Ö, Grayson MN. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal 2023; 13:13506-13515. [PMID: 37881791 PMCID: PMC10594582 DOI: 10.1021/acscatal.3c02513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 08/24/2023] [Indexed: 10/27/2023]
Abstract
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56% respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
Affiliation(s)
- Toby Lewis-Atwell
  - Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Daniel Beechey
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Özgür Şimşek
  - Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Matthew N. Grayson
  - Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
8. García-Andrade X, García Tahoces P, Pérez-Ríos J, Martínez Núñez E. Barrier Height Prediction by Machine Learning Correction of Semiempirical Calculations. J Phys Chem A 2023; 127:2274-2283. [PMID: 36877614 PMCID: PMC10845151 DOI: 10.1021/acs.jpca.2c08340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/28/2022] [Revised: 02/19/2023] [Indexed: 03/07/2023]
Abstract
Different machine learning (ML) models are proposed in the present work to predict density functional theory-quality barrier heights (BHs) from semiempirical quantum mechanical (SQM) calculations. The ML models include a multitask deep neural network, gradient-boosted trees by means of the XGBoost interface, and Gaussian process regression. The obtained mean absolute errors are similar to those of previous models considering the same number of data points. The ML corrections proposed in this paper could be useful for rapid screening of the large reaction networks that appear in combustion chemistry or in astrochemistry. Finally, our results show that 70% of the features with the highest impact on model output are bespoke predictors. This custom-made set of predictors could be employed by future Δ-ML models to improve the quantitative prediction of other reaction properties.
Affiliation(s)
- Pablo García Tahoces
  - Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
- Jesús Pérez-Ríos
  - Department of Physics, Stony Brook University, Stony Brook, New York 11794, United States
  - Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York 11794-3800, United States
- Emilio Martínez Núñez
  - Department of Physical Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
9. Marques E, de Gendt S, Pourtois G, van Setten MJ. Improving Accuracy and Transferability of Machine Learning Chemical Activation Energies by Adding Electronic Structure Information. J Chem Inf Model 2023; 63:1454-1461. [PMID: 36864757 DOI: 10.1021/acs.jcim.2c01502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
Predicting chemical activation energies is one of the longstanding and important challenges in computational chemistry. Recent advances have shown that machine learning can be used to create tools to predict them. Such tools can significantly decrease the computational cost for these predictions compared to traditional methods, which require an optimal path search along a high-dimensional potential energy surface. To enable this new route, we need both large and accurate datasets and a compact yet complete description of the reactions. Although data for chemical reactions is becoming increasingly available, the key step of encoding the reaction as an efficient descriptor remains a big challenge. In this paper, we demonstrate that including electronic energy levels in the description of the reaction significantly improves the prediction accuracy and transferability. Feature importance analysis further demonstrates that electronic energy levels have a higher importance than some structural information and typically require less space in the reaction encoding vector. In general, we observe that the results of the feature importance analysis relate well to the domain knowledge of fundamental chemical principles. This work can help to build better chemical reaction encodings for machine learning and thus improve the predictions of machine learning models for reaction activation energies. These models could ultimately be used to recognize reaction-limiting steps in large reaction systems, allowing bottlenecks to be accounted for at the design stage.
Affiliation(s)
- Esteban Marques
  - Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium
  - IMEC, Kapeldreef 75, Leuven 3001, Belgium
- Stefan de Gendt
  - Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium
  - IMEC, Kapeldreef 75, Leuven 3001, Belgium
- Geoffrey Pourtois
  - IMEC, Kapeldreef 75, Leuven 3001, Belgium
  - Department of Chemistry, University of Antwerp, Campus Drie Eiken, Universiteitsplein 1, Wilrijk 2610, Belgium
- Michiel J van Setten
  - IMEC, Kapeldreef 75, Leuven 3001, Belgium
  - ETSF European Theoretical Spectroscopy Facility, Institut de Physique, Université de Liège, Allée du 6 août 17, Liège 4000, Belgium
10. Choi S. Prediction of transition state structures of gas-phase chemical reactions via machine learning. Nat Commun 2023; 14:1168. [PMID: 36859495 PMCID: PMC9977841 DOI: 10.1038/s41467-023-36823-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/19/2022] [Accepted: 02/15/2023] [Indexed: 03/03/2023]
Abstract
The elucidation of transition state (TS) structures is essential for understanding the mechanisms of chemical reactions and exploring reaction networks. Despite significant advances in computational approaches, TS searching remains a challenging problem owing to the difficulty of constructing an initial structure and heavy computational costs. In this paper, a machine learning (ML) model for predicting the TS structures of general organic reactions is proposed. The proposed model derives the interatomic distances of a TS structure from atomic pair features reflecting reactant, product, and linearly interpolated structures. The model exhibits excellent accuracy, particularly for atomic pairs in which bond formation or breakage occurs. The predicted TS structures yield a high success ratio (93.8%) for quantum chemical saddle point optimizations, and 88.8% of the optimization results have energy errors of less than 0.1 kcal mol⁻¹. Additionally, as a proof of concept, the exploration of multiple reaction paths of an organic reaction is demonstrated based on ML inferences. I envision that the proposed approach will aid in the construction of initial geometries for TS optimization and reaction path exploration.
Affiliation(s)
- Sunghwan Choi
  - Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
11. Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/01/2023]
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Yaohuang Huang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
12. Ismail I, Chantreau Majerus R, Habershon S. Graph-Driven Reaction Discovery: Progress, Challenges, and Future Opportunities. J Phys Chem A 2022; 126:7051-7069. [PMID: 36190262 PMCID: PMC9574932 DOI: 10.1021/acs.jpca.2c06408] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/07/2022] [Revised: 09/22/2022] [Indexed: 11/29/2022]
Abstract
Graph-based descriptors, such as bond-order matrices and adjacency matrices, offer a simple and compact way of categorizing molecular structures; furthermore, such descriptors can be readily used to catalog chemical reactions (i.e., bond-making and -breaking). As such, a number of graph-based methodologies have been developed with the goal of automating the process of generating chemical reaction network models describing the possible mechanistic chemistry in a given set of reactant species. Here, we outline the evolution of these graph-based reaction discovery schemes, with particular emphasis on more recent methods incorporating graph-based methods with semiempirical and ab initio electronic structure calculations, minimum-energy path refinements, and transition state searches. Using representative examples from homogeneous catalysis and interstellar chemistry, we highlight how these schemes increasingly act as "virtual reaction vessels" for interrogating mechanistic questions. Finally, we highlight where challenges remain, including issues of chemical accuracy and calculation speeds, as well as the inherent challenge of dealing with the vast size of accessible chemical reaction space.
Affiliation(s)
- Idil Ismail
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
- Scott Habershon
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
13. Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
  - Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
- Piers A. Townsend
  - Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
14. Farrar EHE, Grayson MN. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem Sci 2022; 13:7594-7603. [PMID: 35872815 PMCID: PMC9242013 DOI: 10.1039/d2sc02925a] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Accepted: 06/08/2022] [Indexed: 11/21/2022]
Abstract
Modern QM modelling methods, such as DFT, have provided detailed mechanistic insights into countless reactions. However, their computational cost inhibits their ability to rapidly screen large numbers of substrates and catalysts in reaction discovery. For a C-C bond forming nitro-Michael addition, we introduce a synergistic semi-empirical quantum mechanical (SQM) and machine learning (ML) approach that allows the prediction of DFT-quality reaction barriers in minutes, even on a standard laptop using widely available modelling software. Mean absolute errors (MAEs) are obtained that are below the accepted chemical accuracy threshold of 1 kcal mol⁻¹ and substantially better than SQM methods without ML correction (5.71 kcal mol⁻¹). Predictive power is shown to hold when the ML models are applied to an unseen set of compounds from the toxicology literature. Mechanistic insight is also achieved via the generation of full SQM transition state (TS) structures which are found to be very good approximations for the DFT-level geometries, revealing important steric interactions in some TSs. This combination of speed, accuracy, and mechanistic insight is unprecedented; current ML barrier models compromise on at least one of these important criteria.
Affiliation(s)
- Elliot H E Farrar
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
- Matthew N Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
15
Lustosa DM, Milo A. Mechanistic Inference from Statistical Models at Different Data-Size Regimes. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Danilo M. Lustosa
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
- Anat Milo
- Department of Chemistry, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
16
Ismail I, Robertson C, Habershon S. Successes and challenges in using machine-learned activation energies in kinetic simulations. J Chem Phys 2022; 157:014109. [DOI: 10.1063/5.0096027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
The prediction of the thermodynamic and kinetic properties of chemical reactions is increasingly being addressed by machine-learning (ML) methods such as artificial neural networks (ANNs). While a number of recent studies have reported success in predicting chemical reaction activation energies, less attention has focused on how the accuracy of ML predictions filters through to predictions of macroscopic observables. Here, we consider the impact of the uncertainty associated with ML prediction of activation energies on observable properties of chemical reaction networks, as given by microkinetics simulations based on ML-predicted reaction rates. After training an ANN to predict activation energies given standard molecular descriptors for reactants and products alone, we performed microkinetics simulations of three different prototypical reaction networks: formamide decomposition, aldol reactions and decomposition of 3-hydroperoxypropanal. We find that the kinetic modelling predictions can be in excellent agreement with corresponding simulations performed with ab initio calculations, but this is dependent on the inherent energetic landscape of the networks. We use these simulations to suggest some guidelines for when ML-based activation energies can be reliable, and when one should take more care in applications to kinetics modelling.
Affiliation(s)
- Scott Habershon
- Department of Chemistry, University of Warwick, United Kingdom
17
Kim JH, Kim H, Kim WY. Effect of molecular representation on deep learning performance for prediction of molecular electronic properties. Bull Korean Chem Soc 2022. [DOI: 10.1002/bkcs.12516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Jun Hyeong Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Hyeonsu Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Woo Youn Kim
- Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- KI for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
18
Chen L, Zhang X, Chen A, Yao S, Hu X, Zhou Z. Targeted design of advanced electrocatalysts by machine learning. Chin J Catal 2022. [DOI: 10.1016/s1872-2067(21)63852-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 10/19/2022]
19
Artificial Neural Network and Support Vector Regression Modeling for Prediction of Mixing Time in Wet Granulation. J Pharm Innov 2021. [DOI: 10.1007/s12247-021-09597-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
20
Gong S, Wang Y, Tian Y, Wang L, Liu G. Rapid enthalpy prediction of transition states using molecular graph convolutional network. AIChE J 2021. [DOI: 10.1002/aic.17269] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Affiliation(s)
- Siyuan Gong
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Yutong Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Yajie Tian
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Henan Engineering Research Center of Resource and Energy Recovery from Waste, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, China
- Li Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, China
- Guozhu Liu
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, China
21
Grambow C, Pattanaik L, Green WH. Deep Learning of Activation Energies. J Phys Chem Lett 2020; 11:2992-2997. [PMID: 32216310 PMCID: PMC7311089 DOI: 10.1021/acs.jpclett.0c00500] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 02/14/2020] [Accepted: 03/27/2020] [Indexed: 05/27/2023]
Abstract
Quantitative predictions of reaction properties, such as activation energy, have been limited due to a lack of available training data. Such predictions would be useful for computer-assisted reaction mechanism generation and organic synthesis planning. We develop a template-free deep learning model to predict the activation energy given reactant and product graphs and train the model on a new, diverse data set of gas-phase quantum chemistry reactions. We demonstrate that our model achieves accurate predictions and agrees with an intuitive understanding of chemical reactivity. With the continued generation of quantitative chemical reaction data and the development of methods that leverage such data, we expect many more methods for reactivity prediction to become available in the near future.
Affiliation(s)
- Colin A. Grambow
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
22
Townsend J, Vogiatzis KD. Data-Driven Acceleration of the Coupled-Cluster Singles and Doubles Iterative Solver. J Phys Chem Lett 2019; 10:4129-4135. [PMID: 31290671 DOI: 10.1021/acs.jpclett.9b01442] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 05/26/2023]
Abstract
Solving the coupled-cluster (CC) equations is a cost-prohibitive process that exhibits poor scaling with system size. These equations are solved by determining the set of amplitudes (t) that minimize the system energy with respect to the coupled-cluster equations at the selected level of truncation. Here, a novel approach to predict the converged coupled-cluster singles and doubles (CCSD) amplitudes, thus the coupled-cluster wave function, is explored by using machine learning and electronic structure properties inherent to the MP2 level. Features are collected from quantum chemical data, such as orbital energies, one-electron Hamiltonian, Coulomb, and exchange terms. The data-driven CCSD (DDCCSD) is not an alchemical method because the actual iterative coupled-cluster equations are solved. However, accurate energetics can also be obtained by bypassing solving the CC equations entirely. Our preliminary data show that it is possible to achieve remarkable speedups in solving the CCSD equations, especially when the correct physics are encoded and used for training of machine learning models.
Affiliation(s)
- Jacob Townsend
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
23
Kim H, Park JY, Choi S. Energy refinement and analysis of structures in the QM9 database via a highly accurate quantum chemical method. Sci Data 2019; 6:109. [PMID: 31270326 PMCID: PMC6610095 DOI: 10.1038/s41597-019-0121-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/10/2019] [Accepted: 06/13/2019] [Indexed: 12/12/2022] Open
Abstract
A wide variety of data-driven approaches have been introduced in the field of quantum chemistry. To extend the applicable range and improve the prediction power of those approaches, highly accurate quantum chemical benchmarks that cover extremely large chemical spaces are required. Here, we report ~134 k quantum chemical calculations performed with G4MP2, the fourth generation of the G-n series in which second-order perturbation theory is employed. A single composite method calculation executes several low-level calculations to reproduce the results of high-level ab initio calculations with the aim of saving computational costs. Therefore, our database reports the results of the various methods (e.g., density functional theory, Hartree-Fock, Møller-Plesset perturbation theory, and coupled-cluster theory). Additionally, we examined the structure information of both the QM9 and the revised databases via chemical graph analysis. Our database can be applied to refine and improve the quality of data-driven quantum chemical prediction. Furthermore, we reported the raw outputs of all calculations performed in this work for other potential applications.
Affiliation(s)
- Hyungjun Kim
- Department of Chemistry, Incheon National University, 119 Academy-ro, Yeonsu-gu, Incheon, 22012, Republic of Korea
- Ji Young Park
- Department of Chemistry, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Sunghwan Choi
- National Institute of Supercomputing and Network, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
24
Wang H, Ji Y, Li Y. Simulation and design of energy materials accelerated by machine learning. WIREs Comput Mol Sci 2019. [DOI: 10.1002/wcms.1421] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/15/2022]
Affiliation(s)
- Hongshuai Wang
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, PR China
- Yujin Ji
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, PR China
- Youyong Li
- Jiangsu Key Laboratory for Carbon‐Based Functional Materials and Devices, Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, PR China