1
Ulanov E, Qadir GA, Riedmiller K, Friederich P, Gräter F. Predicting hydrogen atom transfer energy barriers using Gaussian process regression. Digital Discovery 2025:d4dd00174e. [PMID: 39850148 PMCID: PMC11747964 DOI: 10.1039/d4dd00174e] [Received: 06/24/2024] [Accepted: 01/06/2025] [Indexed: 01/25/2025]
Abstract
Predicting reaction barriers for arbitrary configurations based on only a limited set of density functional theory (DFT) calculations would render the design of catalysts or the simulation of reactions within complex materials highly efficient. We here propose Gaussian process regression (GPR) as a method of choice if DFT calculations are limited to hundreds or thousands of barrier calculations. For the case of hydrogen atom transfer in proteins, an important reaction in chemistry and biology, we obtain a mean absolute error of 3.23 kcal mol⁻¹ for the range of barriers in the data set using SOAP descriptors and similar values using the marginalized graph kernel. Thus, the two GPR models can robustly estimate reaction barriers within the large chemical and conformational space of proteins. Their predictive power is comparable to a graph neural network-based model, and GPR even outcompetes the latter in the low data regime. We propose GPR as a valuable tool for an approximate but data-efficient model of chemical reactivity in a complex and highly variable environment.
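As a rough illustration of the kind of model this abstract describes, the sketch below implements Gaussian process regression with a squared-exponential kernel in NumPy. It is not the authors' code: the kernel, the `length_scale` and `noise` defaults, and the function names `rbf_kernel` and `gpr_predict` are assumptions for illustration, and plain numeric vectors stand in for the SOAP descriptors or marginalized graph kernel used in the paper.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_predict(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Return the GP posterior mean and standard deviation at x_test.

    The prior mean is taken to be the training-set mean (an assumption),
    so predictions far from the data fall back to that value.
    """
    y_mean = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale)

    # Cholesky-based solve of (K + noise * I) alpha = y - mean.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_st @ alpha + y_mean

    # Posterior variance: prior variance minus the part explained by the data.
    v = np.linalg.solve(chol, k_st.T)
    var = rbf_kernel(x_test, x_test, length_scale).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, the metric quoted in the abstract."""
    return float(np.mean(np.abs(y_true - y_pred)))
```

The posterior standard deviation grows for query points far from the training data, which is one reason a GP can be attractive when only hundreds of reference barriers are available.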
Affiliation(s)
- Evgeni Ulanov
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Max Planck Institute for Polymer Research, Mainz, Germany
- Ghulam A Qadir
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Kai Riedmiller
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany
- Frauke Gräter
  - Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
  - Max Planck Institute for Polymer Research, Mainz, Germany

2
Li Y, Ma F, Wang Z, Chen X. Transferable and Interpretable Prediction of Site-Specific Dehydrogenation Reaction Rate Constants with NMR Spectra. J Phys Chem Lett 2024; 15:11282-11290. [PMID: 39495481 DOI: 10.1021/acs.jpclett.4c02647] [Indexed: 11/05/2024]
Abstract
Accurate and efficient determination of site-specific reaction rate constants over a wide temperature range remains challenging, both experimentally and theoretically. Taking the dehydrogenation reaction as an example, our study addresses this issue by an innovative combination of machine learning techniques and cost-effective NMR spectra. Through descriptor screening, we identified a minimal set of NMR chemical shifts that can effectively determine reaction rate constants. The constructed model performs exceptionally well on theoretical data sets and demonstrates impressive generalization capabilities, extending from small molecules to larger ones. Furthermore, this model shows outstanding performance when applied to limited experimental data sets, highlighting its robust applicability and transferability. Utilizing the Sure Independence Screening and Sparsifying Operator (SISSO) algorithm, we also present an interpretable rate constant-temperature-NMR (k-T-NMR) relationship with a mathematical formula. This study reveals the great potential of combining machine learning with easily accessible spectroscopic descriptors in the study of reaction kinetics, enabling the rapid determination of reaction rate constants and promoting our understanding of reactivity.
Affiliation(s)
- Yanbo Li
  - School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - GuSu Laboratory of Materials, Suzhou 215123, China
- Fenfen Ma
  - GuSu Laboratory of Materials, Suzhou 215123, China
- Zhandong Wang
  - National Synchrotron Radiation Laboratory, University of Science and Technology of China, Hefei, Anhui 230029, China
- Xin Chen
  - Suzhou Laboratory, Suzhou 215123, China

3
Liu Y, Mo Y, Cheng Y. Uncertainty Qualification for Deep Learning-Based Elementary Reaction Property Prediction. J Chem Inf Model 2024; 64:8131-8141. [PMID: 39441973 DOI: 10.1021/acs.jcim.4c01358] [Indexed: 10/25/2024]
Abstract
The prediction of the thermodynamic and kinetic properties of elementary reactions has shown rapid improvement due to the implementation of deep learning (DL) methods. While various studies have reported the success in predicting reaction properties, the quantification of prediction uncertainty has seldom been investigated, thus compromising the confidence in using these predicted properties in practical applications. Here, we integrated graph convolutional neural networks (GCNN) with three uncertainty prediction techniques, including deep ensemble, Monte Carlo (MC)-dropout, and evidential learning, to provide insights into the uncertainty quantification and utility. The deep ensemble model outperforms others in accuracy and shows the highest reliability in estimating prediction uncertainty across all elementary reaction property data sets. We also verified that the deep ensemble model showed a satisfactory capability in recognizing epistemic and aleatoric uncertainties. Additionally, we adopted a Monte Carlo Tree Search method for extracting the explainable reaction substructures, providing a chemical explanation for DL predicted properties and corresponding uncertainties. Finally, to demonstrate the utility of uncertainty qualification in practical applications, we performed an uncertainty-guided calibration of the DL-constructed kinetic model, which achieved a 25% higher hit ratio in identifying dominant reaction pathways compared to that of the calibration without uncertainty guidance.
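The epistemic and aleatoric contributions mentioned in this abstract are often separated in a deep ensemble along the following lines. This is a generic sketch, not the authors' implementation: it assumes each ensemble member returns a predictive mean and a predictive variance for the same reaction, and the function name `ensemble_uncertainty` and the numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def ensemble_uncertainty(member_means, member_variances):
    """Combine per-member predictions of a deep ensemble for one sample.

    Returns (mean, epistemic_variance, aleatoric_variance, total_std).
    - Epistemic variance: disagreement between members (spread of their means).
    - Aleatoric variance: average noise level the members themselves predict.
    """
    mean = fmean(member_means)
    epistemic = pvariance(member_means)
    aleatoric = fmean(member_variances)
    total_std = sqrt(epistemic + aleatoric)
    return mean, epistemic, aleatoric, total_std
```

For example, `ensemble_uncertainty([10.0, 12.0, 14.0], [1.0, 1.0, 4.0])` treats three members that disagree by a few units; a large epistemic term relative to the aleatoric one suggests that more training data, rather than less noisy labels, would help most.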
Affiliation(s)
- Yan Liu
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Yiming Mo
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311215, China
- Youwei Cheng
  - College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Zhejiang Hengyi Petrochemical Research Institute Co., Ltd., Hangzhou 311215, China

4
Stuyver T. TS-tools: Rapid and automated localization of transition states based on a textual reaction SMILES input. J Comput Chem 2024; 45:2308-2317. [PMID: 38850166 DOI: 10.1002/jcc.27374] [Received: 01/23/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 06/10/2024]
Abstract
Here, TS-tools is presented, a Python package facilitating the automated localization of transition states (TS) based on a textual reaction SMILES input. TS searches can either be performed at xTB or DFT level of theory, with the former yielding guesses at marginal computational cost, and the latter directly yielding accurate structures at greater expense. On a benchmarking dataset of mono- and bimolecular reactions, TS-tools reaches an excellent success rate of 95% already at xTB level of theory. For tri- and multimolecular reaction pathways - which are typically not benchmarked when developing new automated TS search approaches, yet are relevant for various types of reactivity, cf. solvent- and autocatalysis and enzymatic reactivity - TS-tools retains its ability to identify TS geometries, though a DFT treatment becomes essential in many cases. Throughout the presented applications, a particular emphasis is placed on solvation-induced mechanistic changes, another issue that received limited attention in the automated TS search literature so far.
Affiliation(s)
- Thijs Stuyver
  - Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, Paris, France

5
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024] [Open Access]
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan

6
Li SC, Wu H, Menon A, Spiekermann KA, Li YP, Green WH. When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties? J Am Chem Soc 2024; 146:23103-23120. [PMID: 39106041 DOI: 10.1021/jacs.4c04670] [Indexed: 08/07/2024]
Abstract
Deep graph neural networks are extensively utilized to predict chemical reactivity and molecular properties. However, because of the complexity of chemical space, such models often have difficulty extrapolating beyond the chemistry contained in the training set. Augmenting the model with quantum mechanical (QM) descriptors is anticipated to improve its generalizability. However, obtaining QM descriptors often requires CPU-intensive computational chemistry calculations. To identify when QM descriptors help graph neural networks predict chemical properties, we conduct a systematic investigation of the impact of atom, bond, and molecular QM descriptors on the performance of directed message passing neural networks (D-MPNNs) for predicting 16 molecular properties. The analysis surveys computational and experimental targets, as well as classification and regression tasks, and varied data set sizes from several hundred to hundreds of thousands of data points. Our results indicate that QM descriptors are mostly beneficial for D-MPNN performance on small data sets, provided that the descriptors correlate well with the targets and can be readily computed with high accuracy. Otherwise, using QM descriptors can add cost without benefit or even introduce unwanted noise that can degrade model performance. Strategic integration of QM descriptors with D-MPNN unlocks potential for physics-informed, data-efficient modeling with some interpretability that can streamline de novo drug and material designs. To facilitate the use of QM descriptors in machine learning workflows for chemistry, we provide a set of guidelines regarding when and how to best leverage QM descriptors, a high-throughput workflow to compute them, and an enhancement to Chemprop, a widely adopted open-source D-MPNN implementation for chemical property prediction.
Affiliation(s)
- Shih-Cheng Li
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- Haoyang Wu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Kevin A Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

7
van Gerwen P, Briling KR, Bunne C, Somnath VR, Laplaza R, Krause A, Corminboeuf C. 3DReact: Geometric Deep Learning for Chemical Reactions. J Chem Inf Model 2024; 64:5771-5785. [PMID: 39007724 PMCID: PMC11323278 DOI: 10.1021/acs.jcim.4c00104] [Received: 01/21/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction data sets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS, and Proparg-21-TS data sets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different data sets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R. Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Charlotte Bunne
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Vignesh Ram Somnath
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Andreas Krause
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Learning & Adaptive Systems Group, Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research − Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

8
Lalith N, Singh AR, Gauthier JA. The Importance of Reaction Energy in Predicting Chemical Reaction Barriers with Machine Learning Models. ChemPhysChem 2024; 25:e202300933. [PMID: 38517585 DOI: 10.1002/cphc.202300933] [Received: 12/07/2023] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 03/24/2024]
Abstract
Improving our fundamental understanding of complex heterocatalytic processes increasingly relies on electronic structure simulations and microkinetic models based on calculated energy differences. In particular, calculation of activation barriers, usually achieved through compute-intensive saddle point search routines, remains a serious bottleneck in understanding trends in catalytic activity for highly branched reaction networks. Although the well-known Brønsted-Evans-Polanyi (BEP) scaling - a one-feature linear regression model - has been widely applied in such microkinetic models, it still relies on calculated reaction energies and may not generalize beyond a single facet on a single class of materials, e.g., terrace sites on transition metals. For highly branched and energetically shallow reaction networks, such as electrochemical CO₂ reduction or wastewater remediation, calculating even reaction energies on many surfaces can become computationally intractable due to the combinatorial explosion of states that must be considered. Here, we investigate the feasibility of activation barrier prediction without knowledge of the reaction energy using linear and nonlinear machine learning (ML) models trained on a new database of over 500 dehydrogenation activation barriers. We also find that inclusion of the reaction energy significantly improves both classes of ML models, but complex nonlinear models can achieve performance similar to the simplest BEP scaling when predicting activation barriers on new systems. Additionally, inclusion of the reaction energy significantly improves generalizability to new systems beyond the training set. Our results suggest that the reaction energy is a critical feature to consider when building models to predict activation barriers, indicating that efforts to reliably predict reaction energies through, e.g., the Open Catalyst Project and others, will be an important route to effective model development for more complex systems.
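The BEP scaling that this abstract calls a one-feature linear regression can be written as E_a ≈ slope × ΔE + intercept and fitted by ordinary least squares. The sketch below shows that baseline only; the function names `fit_bep` and `predict_barrier` are hypothetical, and any numbers passed to them are illustrative rather than values from the paper's database.

```python
def fit_bep(reaction_energies, barriers):
    """Least-squares fit of barrier = slope * reaction_energy + intercept."""
    n = len(reaction_energies)
    if n < 2 or n != len(barriers):
        raise ValueError("need at least two paired (reaction energy, barrier) points")
    mean_x = sum(reaction_energies) / n
    mean_y = sum(barriers) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in reaction_energies)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(reaction_energies, barriers)
    )
    if sxx == 0.0:
        raise ValueError("reaction energies must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_barrier(slope, intercept, reaction_energy):
    """Estimate an activation barrier from a reaction energy via the fitted line."""
    return slope * reaction_energy + intercept
```

Because the only input is the reaction energy, this baseline still requires a reaction-energy calculation for every new system, which is the dependence the abstract sets out to examine.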
Affiliation(s)
- Nithin Lalith
  - Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA
- Joseph A Gauthier
  - Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA

9
van Gerwen P, Briling KR, Calvino Alonso Y, Franke M, Corminboeuf C. Benchmarking machine-readable vectors of chemical reactions on computed activation barriers. Digital Discovery 2024; 3:932-943. [PMID: 38756222 PMCID: PMC11094696 DOI: 10.1039/d3dd00175j] [Received: 09/06/2023] [Accepted: 02/28/2024] [Indexed: 05/18/2024]
Abstract
In recent years, there has been a surge of interest in predicting computed activation barriers, to enable the acceleration of the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATM_d, B²R²_l, EquiReact and language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets.
Affiliation(s)
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Ksenia R Briling
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Yannick Calvino Alonso
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Malte Franke
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

10
Kishimoto A, Wu D, O'Shea DF. Forecasting vaping health risks through neural network model prediction of flavour pyrolysis reactions. Sci Rep 2024; 14:9591. [PMID: 38719814 PMCID: PMC11079048 DOI: 10.1038/s41598-024-59619-x] [Received: 08/23/2023] [Accepted: 04/11/2024] [Indexed: 05/12/2024] [Open Access]
Abstract
Vaping involves the heating of chemical solutions (e-liquids) to high temperatures prior to lung inhalation. A risk exists that these chemicals undergo thermal decomposition to new chemical entities, the composition and health implications of which are largely unknown. To address this concern, a graph-convolutional neural network (NN) model was used to predict pyrolysis reactivity of 180 e-liquid chemical flavours. The output of this supervised machine learning approach was a dataset of probability ranked pyrolysis transformations and their associated 7307 products. To refine this dataset, the molecular weight of each NN predicted product was automatically correlated with experimental mass spectrometry (MS) fragmentation data for each flavour chemical. This blending of deep learning methods with experimental MS data identified 1169 molecular weight matches that prioritized these compounds for further analysis. The average number of discrete matches per flavour between NN predictions and MS fragmentation was 6.4 with 92.8% of flavours having at least one match. Globally harmonized system classifications for NN/MS matches were extracted from PubChem, revealing that 127 acute toxic, 153 health hazard and 225 irritant classifications were predicted. This approach may reveal the longer-term health risks of vaping in advance of clinical diseases emerging in the general population.
Affiliation(s)
- Dan Wu
  - Department of Chemistry, Royal College of Surgeons in Ireland (RCSI), Dublin 2, Ireland.
- Donal F O'Shea
  - Department of Chemistry, Royal College of Surgeons in Ireland (RCSI), Dublin 2, Ireland.

11
Zhao XG, Yang Q, Xu Y, Liu QY, Li ZY, Liu XX, Zhao YX, He SG. Machine Learning for Experimental Reactivity of a Set of Metal Clusters toward C-H Activation. J Am Chem Soc 2024; 146:12485-12495. [PMID: 38651836 DOI: 10.1021/jacs.4c00501] [Indexed: 04/25/2024]
Abstract
Understanding the mechanisms of C-H activation of alkanes is a very important research topic. The reactions of metal clusters with alkanes have been extensively studied to reveal the electronic features governing C-H activation, while the experimental cluster reactivity was qualitatively interpreted case by case in the literature. Herein, we prepared and mass-selected over 100 rhodium-based clusters (RhxVyOz⁻ and RhxCoyOz⁻) to react with light alkanes, enabling the determination of reaction rate constants spanning six orders of magnitude. A satisfactory model that quantitatively describes the rate data in terms of multiple cluster electronic features (average electron occupancy of valence s orbitals, the minimum natural charge on the metal atom, cluster polarizability, and energy gap involved in the agostic interaction) has been constructed through a machine learning approach. This study demonstrates that the general mechanisms governing the very important process of C-H activation by diverse metal centers can be discovered by interpreting experimental data with artificial intelligence.
Affiliation(s)
- Xi-Guan Zhao
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Qi Yang
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Ying Xu
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Qing-Yu Liu
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Zi-Yu Li
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Xiao-Xiao Liu
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Yan-Xia Zhao
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China
- Sheng-Gui He
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Centre of Excellence in Molecular Sciences, Beijing 100190, People's Republic of China

12
Ding Y, Qiang B, Chen Q, Liu Y, Zhang L, Liu Z. Exploring Chemical Reaction Space with Machine Learning Models: Representation and Feature Perspective. J Chem Inf Model 2024; 64:2955-2970. [PMID: 38489239 DOI: 10.1021/acs.jcim.4c00004] [Indexed: 03/17/2024]
Abstract
Chemical reactions serve as foundational building blocks for organic chemistry and drug design. In the era of large AI models, data-driven approaches have emerged to innovate the design of novel reactions, optimize existing ones for higher yields, and discover new pathways for synthesizing chemical structures comprehensively. To effectively address these challenges with machine learning models, it is imperative to derive robust and informative representations or engage in feature engineering using extensive data sets of reactions. This work aims to provide a comprehensive review of established reaction featurization approaches, offering insights into the selection of representations and the design of features for a wide array of tasks. The advantages and limitations of employing SMILES, molecular fingerprints, molecular graphs, and physics-based properties are meticulously elaborated. Solutions to bridge the gap between different representations will also be critically evaluated. Additionally, we introduce a new frontier in chemical reaction pretraining, holding promise as an innovative yet unexplored avenue.
Collapse
Affiliation(s)
- Yuheng Ding
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Bo Qiang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Qixuan Chen
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Yiqiao Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Liangren Zhang
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| | - Zhenming Liu
- Department of Pharmaceutical Science, Peking University, Beijing 100191, China
| |
|
13
|
Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024; 7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024] Open
Abstract
Effective transfer learning for molecular property prediction has shown considerable strength in addressing insufficient labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability concerning task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient for approximating the direction of model optimization on a molecular property prediction dataset. We have analyzed the close connection between the principal gradient and model optimization through mathematical proof. PGM measures the transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we perform PGM on various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance to improve the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by offering a task-relatedness quantification prior to transfer learning and understanding the relationship between chemical properties.
Affiliation(s)
- Shaolun Yao
- Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, Zhejiang University, 310027, Hangzhou, China
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
| | - Jie Song
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
| | - Lingxiang Jia
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
| | - Lechao Cheng
- School of Computer Science and Information Engineering, Hefei University of Technology, 230009, Hefei, China
| | - Zipeng Zhong
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
| | - Mingli Song
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
| | - Zunlei Feng
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China.
- School of Software Technology, Zhejiang University, 315048, Ningbo, China.
| |
|
14
|
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| | - Qiyuan Zhao
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| |
|
15
|
Allen AEA, Csányi G. Toward transferable empirical valence bonds: Making classical force fields reactive. J Chem Phys 2024; 160:124108. [PMID: 38526105 DOI: 10.1063/5.0196952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 03/07/2024] [Indexed: 03/26/2024] Open
Abstract
The empirical valence bond technique allows classical force fields to model reactive processes. However, parametrization from experimental data or quantum mechanical calculations is required for each reaction present in the simulation. We show that the parameters present in the empirical valence bond method can be predicted using a neural network model and the SMILES strings describing a reaction. This removes the need for quantum calculations in the parametrization of the empirical valence bond technique. In doing so, we have taken the first steps toward defining a new procedure for enabling reactive atomistic simulations. This procedure would allow researchers to use existing classical force fields for reactive simulations, without performing additional quantum mechanical calculations.
Affiliation(s)
- Alice E A Allen
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
| |
|
16
|
Vijay S, Venetos MC, Spotte-Smith EWC, Kaplan AD, Wen M, Persson KA. CoeffNet: predicting activation barriers through a chemically-interpretable, equivariant and physically constrained graph neural network. Chem Sci 2024; 15:2923-2936. [PMID: 38404391 PMCID: PMC10882514 DOI: 10.1039/d3sc04411d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 01/05/2024] [Indexed: 02/27/2024] Open
Abstract
Activation barriers of elementary reactions are essential to predict molecular reaction mechanisms and kinetics. However, computing these energy barriers by identifying transition states with electronic structure methods (e.g., density functional theory) can be time-consuming and computationally expensive. In this work, we introduce CoeffNet, an equivariant graph neural network that predicts activation barriers using coefficients of any frontier molecular orbital (such as the highest occupied molecular orbital) of reactant and product complexes as graph node features. We show that using coefficients as features offers several advantages, such as chemical interpretability and physical constraints on the network's behaviour and numerical range. Model outputs are either activation barriers or coefficients of the chosen molecular orbital of the transition state; the latter quantity allows us to interpret the results of the neural network through chemical intuition. We test CoeffNet on a dataset of SN2 reactions as a proof-of-concept and show that the activation barriers are predicted with a mean absolute error of less than 0.025 eV. The highest occupied molecular orbital of the transition state is visualized and the distribution of the orbital densities of the transition states is described for a few prototype SN2 reactions.
Affiliation(s)
- Sudarshan Vijay
- Department of Materials Science and Engineering, University of California, Berkeley 210 Hearst Memorial Mining Building Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road Berkeley CA 94720 USA
| | - Maxwell C Venetos
- Department of Materials Science and Engineering, University of California, Berkeley 210 Hearst Memorial Mining Building Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road Berkeley CA 94720 USA
| | - Evan Walter Clark Spotte-Smith
- Department of Materials Science and Engineering, University of California, Berkeley 210 Hearst Memorial Mining Building Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road Berkeley CA 94720 USA
| | - Aaron D Kaplan
- Materials Science Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road Berkeley CA 94720 USA
| | - Mingjian Wen
- Department of Chemical and Biomolecular Engineering, University of Houston Houston Texas 77204 USA
| | - Kristin A Persson
- Department of Materials Science and Engineering, University of California, Berkeley 210 Hearst Memorial Mining Building Berkeley CA 94720 USA
- The Molecular Foundry, Lawrence Berkeley National Laboratory 1 Cyclotron Road Berkeley CA 94720 USA
| |
|
17
|
Chung Y, Green WH. Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates. Chem Sci 2024; 15:2410-2424. [PMID: 38362410 PMCID: PMC10866337 DOI: 10.1039/d3sc05353a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 01/04/2024] [Indexed: 02/17/2024] Open
Abstract
Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol⁻¹ for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents only based on atom-mapped reaction SMILES and solvent SMILES strings.
Affiliation(s)
- Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
18
|
Kirkland JK, Kumawat J, Shaban Tameh M, Tolman T, Lambert AC, Lief GR, Yang Q, Ess DH. Machine Learning Models for Predicting Zirconocene Properties and Barriers. J Chem Inf Model 2024; 64:775-784. [PMID: 38259142 DOI: 10.1021/acs.jcim.3c01575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Zr metallocenes have significant potential to be highly tunable polyethylene catalysts through modification of the aromatic ligand framework. Here we report the development of multiple machine learning models using a large library (>700 systems) of DFT-calculated zirconocene properties and barriers for ethylene polymerization. We show that very accurate machine learning models are possible for HOMO-LUMO gaps of precatalysts but the performance significantly depends on the machine learning algorithm and type of featurization, such as fingerprints, Coulomb matrices, smooth overlap of atomic positions, or persistence images. Surprisingly, the description of the bonding hapticity, the number of direct connections between Zr and the ligand aromatic carbons, only has a moderate influence on the performance of most models. Despite robust models for HOMO-LUMO gaps, these types of machine learning models based on structure connectivity type features perform poorly in predicting ethylene migratory insertion barrier heights. Therefore, we developed several relatively robust and accurate machine learning models for barrier heights that are based on quantum-chemical descriptors (QCDs). The quantitative accuracy of these models depends on which potential energy surface structure QCDs were harvested from. This revealed a Hammett-type principle to naturally emerge showing that QCDs from the π-coordination complexes provide much better descriptions of the transition states than other potential-energy structures. Feature importance analysis of the QCDs provides several fundamental principles that influence zirconocene catalyst reactivity.
Affiliation(s)
- Justin K Kirkland
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| | - Jugal Kumawat
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| | - Maliheh Shaban Tameh
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| | - Tyson Tolman
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| | - Allison C Lambert
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| | - Graham R Lief
- Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
| | - Qing Yang
- Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
| | - Daniel H Ess
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
| |
|
19
|
Adebar N, Keupp J, Emenike VN, Kühlborn J, Vom Dahl L, Möckel R, Smiatek J. Scientific Deep Machine Learning Concepts for the Prediction of Concentration Profiles and Chemical Reaction Kinetics: Consideration of Reaction Conditions. J Phys Chem A 2024; 128:929-944. [PMID: 38271617 DOI: 10.1021/acs.jpca.3c06265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Emerging concepts from scientific deep machine learning such as physics-informed neural networks (PINNs) enable a data-driven approach for the study of complex kinetic problems. We present an extended framework that combines the advantages of PINNs with the detailed consideration of experimental parameter variations for the simulation and prediction of chemical reaction kinetics. The approach is based on truncated Taylor series expansions for the underlying fundamental equations, whereby the external variations can be interpreted as perturbations of the kinetic parameters. Accordingly, our method allows for an efficient consideration of experimental parameter settings and their influence on the concentration profiles and reaction kinetics. A particular advantage of our approach, in addition to the consideration of univariate and multivariate parameter variations, is the robust model-based exploration of the parameter space to determine optimal reaction conditions in combination with advanced reaction insights. The benefits of this concept are demonstrated for higher-order chemical reactions including catalytic and oscillatory systems in combination with small amounts of training data. All predicted values show a high level of accuracy, demonstrating the broad applicability and flexibility of our approach.
Affiliation(s)
- Niklas Adebar
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Julian Keupp
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Victor N Emenike
- HP BioP Launch and Innovation, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Jonas Kühlborn
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Lisa Vom Dahl
- Development NCE, Analytical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Robert Möckel
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
| | - Jens Smiatek
- Institute for Computational Physics, University of Stuttgart, D-70569 Stuttgart, Germany
- Development NCE, Strategy NCEs, Boehringer Ingelheim Pharma GmbH & Co. KG, D-88397 Biberach (Riss), Germany
| |
|
20
|
Kim S, Woo J, Kim WY. Diffusion-based generative AI for exploring transition states from 2D molecular graphs. Nat Commun 2024; 15:341. [PMID: 38184661 PMCID: PMC10771475 DOI: 10.1038/s41467-023-44629-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/21/2023] [Indexed: 01/08/2024] Open
Abstract
The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance for prediction of TS geometries. However, they require 3D conformations of reactants and products, often with their appropriate orientations, as input, which demands substantial effort and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries just from 2D molecular graphs. TSDiff outperforms the existing ML models with 3D geometries in terms of both accuracy and efficiency. Moreover, it enables sampling of various TS conformations, because it learns the distribution of TS geometries for diverse reactions in training. Thus, TSDiff finds more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff shows promising potential for efficient and reliable TS exploration.
Affiliation(s)
- Seonghwan Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
| | - Jeheon Woo
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea
| | - Woo Youn Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea.
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, 34141, Daejeon, Republic of Korea.
| |
|
21
|
Duan C, Du Y, Jia H, Kulik HJ. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. Nature Computational Science 2023; 3:1045-1055. [PMID: 38177724 DOI: 10.1038/s43588-023-00563-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/12/2023] [Accepted: 11/03/2023] [Indexed: 01/06/2024]
Abstract
Transition state search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D transition state structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures (reactant, transition state and product) in an elementary reaction. Provided reactant and product, this model generates a transition state structure in seconds instead of hours, which is typically required when performing quantum-chemistry-based optimizations. The generated transition state structures achieve a median of 0.08 Å root mean square deviation compared to the true transition state. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal mol⁻¹) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision usefulness for our approach in constructing large reaction networks with unknown mechanisms.
Affiliation(s)
- Chenru Duan
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US.
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US.
| | - Yuanqi Du
- Department of Computer Science, Cornell University, Ithaca, NY, US
| | - Haojun Jia
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
| | - Heather J Kulik
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
| |
|
22
|
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/16/2023]
Abstract
Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings with predictions usually substantially within 1 kcal/mol of the free energy of the solution calculated by using computational chemical methods.
Affiliation(s)
- Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Angiras Menon
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Volker Settels
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
| | - Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Zipei Tan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
| | - Frederik Sandfort
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
| | - Philipp Eiden
- BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
23
|
Zhao Q, Anstine DM, Isayev O, Savoie BM. Δ² machine learning for reaction property prediction. Chem Sci 2023; 14:13392-13401. [PMID: 38033903 PMCID: PMC10686042 DOI: 10.1039/d3sc02408c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 07/11/2023] [Indexed: 12/02/2023] Open
Abstract
The emergence of Δ-learning models, whereby machine learning (ML) is used to predict a correction to a low-level energy calculation, provides a versatile route to accelerate high-level energy evaluations at a given geometry. However, Δ-learning models are inapplicable to reaction properties like heats of reaction and activation energies that require both a high-level geometry and energy evaluation. Here, a Δ²-learning model is introduced that can predict high-level activation energies based on low-level critical-point geometries. The Δ² model uses an atom-wise featurization typical of contemporary ML interatomic potentials (MLIPs) and is trained on a dataset of ∼167 000 reactions, using the GFN2-xTB energy and critical-point geometry as a low-level input and the B3LYP-D3/TZVP energy calculated at the B3LYP-D3/TZVP critical point as a high-level target. The excellent performance of the Δ² model on unseen reactions demonstrates the surprising ease with which the model implicitly learns the geometric deviations between the low-level and high-level geometries that condition the activation energy prediction. The transferability of the Δ² model is validated on several external testing sets where it shows near chemical accuracy, illustrating the benefits of combining ML models with readily available physical-based information from semi-empirical quantum chemistry calculations. Fine-tuning of the Δ² model on a small number of Gaussian-4 calculations produced a 35% accuracy improvement over DFT activation energy predictions while retaining xTB-level cost. The Δ² model approach proves to be an efficient strategy for accelerating chemical reaction characterization with minimal sacrifice in prediction accuracy.
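The basic Δ-learning idea mentioned in this abstract (fit a model to the difference between a high-level and a low-level energy, then add the predicted correction back to the low-level value) can be illustrated with a minimal sketch. This is not the model from the paper: the descriptor matrix, the linear least-squares fit, the function names, and the synthetic energies below are all assumptions chosen only to keep the example small.

```python
import numpy as np


def fit_delta_model(features, e_low, e_high):
    """Fit a linear model (with bias) to the correction e_high - e_low."""
    X = np.column_stack([features, np.ones(len(features))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, e_high - e_low, rcond=None)
    return coef


def predict_high(coef, features, e_low):
    """Predict the high-level energy as low-level energy plus learned correction."""
    X = np.column_stack([features, np.ones(len(features))])
    return e_low + X @ coef


# Synthetic toy data (hypothetical, not taken from any data set in the text).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))               # stand-in descriptors
e_low = rng.normal(loc=20.0, scale=5.0, size=50)  # stand-in low-level energies
true_w = np.array([1.5, -0.7, 0.3])
e_high = e_low + features @ true_w + 2.0          # correction is exactly linear here

coef = fit_delta_model(features, e_low, e_high)
pred = predict_high(coef, features, e_low)
```

Because the synthetic correction is exactly linear in the descriptors, the fit recovers it to numerical precision; real corrections would need a more expressive model and a held-out test set.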
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University West Lafayette IN 47906 USA
| | - Dylan M Anstine
- Department of Chemistry, Carnegie Mellon University Pittsburgh PA 15213 USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University Pittsburgh PA 15213 USA
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University West Lafayette IN 47906 USA
| |
|
24
|
Casetti N, Alfonso-Ramos JE, Coley CW, Stuyver T. Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery. Chemistry 2023; 29:e202301957. [PMID: 37526059 DOI: 10.1002/chem.202301957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/02/2023]
Abstract
Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to high-throughput in silico screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.
Affiliation(s)
- Nicholas Casetti
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Javier E Alfonso-Ramos
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| |
|
25
|
Lewis-Atwell T, Beechey D, Şimşek Ö, Grayson MN. Reformulating Reactivity Design for Data-Efficient Machine Learning. ACS Catal 2023; 13:13506-13515. [PMID: 37881791 PMCID: PMC10594582 DOI: 10.1021/acscatal.3c02513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 08/24/2023] [Indexed: 10/27/2023]
Abstract
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets of typically thousands or tens of thousands of barriers that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring less than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56% respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach was able to identify a reaction within 1 kcal mol⁻¹ of the target barrier by only having to run 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
Affiliation(s)
- Toby Lewis-Atwell
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
| | - Daniel Beechey
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
| | - Özgür Şimşek
- Department of Computer Science, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
| | - Matthew N. Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, U.K.
| |
|
26
|
Zankov D, Madzhidov T, Baskin I, Varnek A. Conjugated quantitative structure-property relationship models: Prediction of kinetic characteristics linked by the Arrhenius equation. Mol Inform 2023; 42:e2200275. [PMID: 37488968 DOI: 10.1002/minf.202200275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 07/08/2023] [Accepted: 07/24/2023] [Indexed: 07/26/2023]
Abstract
Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant $\log k$, pre-exponential factor $\log A$, and activation energy $E_{\rm a}$. They were benchmarked against single-task (individual and equation-based models) and multi-task models. In individual models, all characteristics were modeled separately, while in multi-task models $\log k$, $\log A$ and $E_{\rm a}$ were treated cooperatively. An equation-based model assessed $\log k$ using the Arrhenius equation and $\log A$ and $E_{\rm a}$ values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants hardly can be measured experimentally. Also, in the case of small training sets conjugated models are more robust than related single-task approaches.
Affiliation(s)
- Dmitry Zankov
- Laboratory of Chemoinformatics, University of Strasbourg, France
| | - Timur Madzhidov
- Chemistry Solutions, Elsevier Ltd, Oxford, OX5 1GB, United Kingdom
| | - Igor Baskin
- Department of Materials Science and Engineering, Technion - Israel Institute of Technology, Israel
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, France
| |
|
27
|
Zhang P, Yang W. Toward a general neural network force field for protein simulations: Refining the intramolecular interaction in protein. J Chem Phys 2023; 159:024118. [PMID: 37431910 PMCID: PMC10481389 DOI: 10.1063/5.0142280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/12/2023] [Accepted: 06/22/2023] [Indexed: 07/12/2023] Open
Abstract
Molecular dynamics (MD) is an extremely powerful, highly effective, and widely used approach to understanding the nature of chemical processes in atomic details for proteins. The accuracy of results from MD simulations is highly dependent on force fields. Currently, molecular mechanical (MM) force fields are mainly utilized in MD simulations because of their low computational cost. Quantum mechanical (QM) calculation has high accuracy, but it is exceedingly time consuming for protein simulations. Machine learning (ML) provides the capability for generating accurate potential at the QM level without increasing much computational effort for specific systems that can be studied at the QM level. However, the construction of general machine learned force fields, needed for broad applications and large and complex systems, is still challenging. Here, general and transferable neural network (NN) force fields based on CHARMM force fields, named CHARMM-NN, are constructed for proteins by training NN models on 27 fragments partitioned from the residue-based systematic molecular fragmentation (rSMF) method. The NN for each fragment is based on atom types and uses new input features that are similar to MM inputs, including bonds, angles, dihedrals, and non-bonded terms, which enhance the compatibility of CHARMM-NN to MM MD and enable the implementation of CHARMM-NN force fields in different MD programs. While the main part of the energy of the protein is based on rSMF and NN, the nonbonded interactions between the fragments and with water are taken from the CHARMM force field through mechanical embedding. The validations of the method for dipeptides on geometric data, relative potential energies, and structural reorganization energies demonstrate that the CHARMM-NN local minima on the potential energy surface are very accurate approximations to QM, showing the success of CHARMM-NN for bonded interactions. 
However, the MD simulations on peptides and proteins indicate that more accurate methods to represent protein-water interactions in fragments and non-bonded interactions between fragments should be considered in the future improvement of CHARMM-NN, which can increase the accuracy of approximation beyond the current mechanical embedding QM/MM level.
Affiliation(s)
- Pan Zhang
- Department of Chemistry, Duke University, Durham, North Carolina 27708, USA
| | - Weitao Yang
- Department of Chemistry, Duke University, Durham, North Carolina 27708, USA
| |
|
28
|
Xu R, Meisner J, Chang AM, Thompson KC, Martínez TJ. First principles reaction discovery: from the Schrodinger equation to experimental prediction for methane pyrolysis. Chem Sci 2023; 14:7447-7464. [PMID: 37449065 PMCID: PMC10337770 DOI: 10.1039/d3sc01202f] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/06/2023] [Accepted: 06/02/2023] [Indexed: 07/18/2023] Open
Abstract
Our recent success in exploiting graphical processing units (GPUs) to accelerate quantum chemistry computations led to the development of the ab initio nanoreactor, a computational framework for automatic reaction discovery and kinetic model construction. In this work, we apply the ab initio nanoreactor to methane pyrolysis, from automatic reaction discovery to path refinement and kinetic modeling. Elementary reactions occurring during methane pyrolysis are revealed using GPU-accelerated ab initio molecular dynamics simulations. Subsequently, these reaction paths are refined at a higher level of theory with optimized reactant, product, and transition state geometries. Reaction rate coefficients are calculated by transition state theory based on the optimized reaction paths. The discovered reactions lead to a kinetic model with 53 species and 134 reactions, which is validated against experimental data and simulations using literature kinetic models. We highlight the advantage of leveraging local brute force and Monte Carlo sensitivity analysis approaches for efficient identification of important reactions. Both sensitivity approaches can further improve the accuracy of the methane pyrolysis kinetic model. The results in this work demonstrate the power of the ab initio nanoreactor framework for computationally affordable systematic reaction discovery and accurate kinetic modeling.
Affiliation(s)
- Rui Xu
- Department of Chemistry, The PULSE Institute, Stanford University Stanford CA 94305 USA
- SLAC National Accelerator Laboratory 2575 Sand Hill Road Menlo Park CA 94025 USA
| | - Jan Meisner
- Department of Chemistry, The PULSE Institute, Stanford University Stanford CA 94305 USA
- SLAC National Accelerator Laboratory 2575 Sand Hill Road Menlo Park CA 94025 USA
| | - Alexander M Chang
- Department of Chemistry, The PULSE Institute, Stanford University Stanford CA 94305 USA
- SLAC National Accelerator Laboratory 2575 Sand Hill Road Menlo Park CA 94025 USA
| | - Keiran C Thompson
- Department of Chemistry, The PULSE Institute, Stanford University Stanford CA 94305 USA
- SLAC National Accelerator Laboratory 2575 Sand Hill Road Menlo Park CA 94025 USA
| | - Todd J Martínez
- Department of Chemistry, The PULSE Institute, Stanford University Stanford CA 94305 USA
- SLAC National Accelerator Laboratory 2575 Sand Hill Road Menlo Park CA 94025 USA
| |
|
29
|
Heid E, McGill CJ, Vermeire FH, Green WH. Characterizing Uncertainty in Machine Learning for Chemistry. J Chem Inf Model 2023; 63:4012-4029. [PMID: 37338239 PMCID: PMC10336963 DOI: 10.1021/acs.jcim.3c00373] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/08/2023] [Indexed: 06/21/2023]
Abstract
Characterizing uncertainty in machine learning models has recently gained interest in the context of machine learning reliability, robustness, safety, and active learning. Here, we separate the total uncertainty into contributions from noise in the data (aleatoric) and shortcomings of the model (epistemic), further dividing epistemic uncertainty into model bias and variance contributions. We systematically address the influence of noise, model bias, and model variance in the context of chemical property predictions, where the diverse nature of target properties and the vast chemical space give rise to many different distinct sources of prediction error. We demonstrate that different sources of error can each be significant in different contexts and must be individually addressed during model development. Through controlled experiments on data sets of molecular properties, we show important trends in model performance associated with the level of noise in the data set, size of the data set, model architecture, molecule representation, ensemble size, and data set splitting. In particular, we show that 1) noise in the test set can limit a model's observed performance when the actual performance is much better, 2) using size-extensive model aggregation structures is crucial for extensive property prediction, and 3) ensembling is a reliable tool for uncertainty quantification and improvement specifically for the contribution of model variance. We develop general guidelines on how to improve an underperforming model when falling into different uncertainty contexts.
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
| | - Charles J. McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States
| | - Florence H. Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
30
|
Liu Q, Tang K, Zhang L, Du J, Meng Q. Computer‐assisted synthetic planning considering reaction kinetics based on transition state automated generation method. AIChE J 2023. [DOI: 10.1002/aic.18092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Affiliation(s)
- Qilei Liu
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
| | - Kun Tang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
| | - Lei Zhang
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
| | - Jian Du
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
| | - Qingwei Meng
- State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering Dalian University of Technology Dalian 116024 China
- Ningbo Research Institute Dalian University of Technology Ningbo 315016 China
| |
|
31
|
Zhao Q, Vaddadi SM, Woulfe M, Ogunfowora LA, Garimella SS, Isayev O, Savoie BM. Comprehensive exploration of graphically defined reaction spaces. Sci Data 2023; 10:145. [PMID: 36935430 PMCID: PMC10025260 DOI: 10.1038/s41597-023-02043-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/17/2022] [Accepted: 02/27/2023] [Indexed: 03/21/2023] Open
Abstract
Existing reaction transition state (TS) databases are comparatively small and lack chemical diversity. Here, this data gap has been addressed using the concept of a graphically-defined model reaction to comprehensively characterize a reaction space associated with C, H, O, and N containing molecules with up to 10 heavy (non-hydrogen) atoms. The resulting dataset is composed of 176,992 organic reactions possessing at least one validated TS, activation energy, heat of reaction, reactant and product geometries, frequencies, and atom-mapping. For 33,032 reactions, more than one TS was discovered by conformational sampling, allowing conformational errors in TS prediction to be assessed. Data is supplied at the GFN2-xTB and B3LYP-D3/TZVP levels of theory. A subset of reactions were recalculated at the CCSD(T)-F12/cc-pVDZ-F12 and ωB97X-D2/def2-TZVP levels to establish relative errors. The resulting collection of reactions and properties are called the Reaction Graph Depth 1 (RGD1) dataset. RGD1 represents the largest and most chemically diverse TS dataset published to date and should find immediate use in developing novel machine learning models for predicting reaction properties.
Affiliation(s)
- Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
| | - Sai Mahit Vaddadi
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
| | - Michael Woulfe
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
| | - Lawal A Ogunfowora
- Department of Chemistry, Purdue University, West Lafayette, IN, 47906, USA
| | - Sanjay S Garimella
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN, 47906, USA.
| |
|
32
|
Kjeldal FØ, Eriksen JJ. Decomposing Chemical Space: Applications to the Machine Learning of Atomic Energies. J Chem Theory Comput 2023; 19:2029-2038. [PMID: 36926874 DOI: 10.1021/acs.jctc.2c01290] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/18/2023]
Abstract
We apply a number of atomic decomposition schemes across the standard QM7 data set (a small model set of organic molecules at equilibrium geometry) to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies on account of which nuclei individual atomic orbitals are centered on. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are next assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
Affiliation(s)
- Frederik Ø Kjeldal
- DTU Chemistry, Technical University of Denmark Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
| | - Janus J Eriksen
- DTU Chemistry, Technical University of Denmark Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
| |
|
33
|
García-Andrade X, García Tahoces P, Pérez-Ríos J, Martínez Núñez E. Barrier Height Prediction by Machine Learning Correction of Semiempirical Calculations. J Phys Chem A 2023; 127:2274-2283. [PMID: 36877614 PMCID: PMC10845151 DOI: 10.1021/acs.jpca.2c08340] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/28/2022] [Revised: 02/19/2023] [Indexed: 03/07/2023]
Abstract
Different machine learning (ML) models are proposed in the present work to predict density functional theory-quality barrier heights (BHs) from semiempirical quantum mechanical (SQM) calculations. The ML models include a multitask deep neural network, gradient-boosted trees by means of the XGBoost interface, and Gaussian process regression. The obtained mean absolute errors are similar to those of previous models considering the same number of data points. The ML corrections proposed in this paper could be useful for rapid screening of the large reaction networks that appear in combustion chemistry or in astrochemistry. Finally, our results show that 70% of the features with the highest impact on model output are bespoke predictors. This custom-made set of predictors could be employed by future Δ-ML models to improve the quantitative prediction of other reaction properties.
Affiliation(s)
| | - Pablo García Tahoces
- Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
| | - Jesús Pérez-Ríos
- Department of Physics, Stony Brook University, Stony Brook, New York 11794, United States
- Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York 11794-3800, United States
| | - Emilio Martínez Núñez
- Department of Physical Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
| |
|
34
|
Marques E, de Gendt S, Pourtois G, van Setten MJ. Improving Accuracy and Transferability of Machine Learning Chemical Activation Energies by Adding Electronic Structure Information. J Chem Inf Model 2023; 63:1454-1461. [PMID: 36864757 DOI: 10.1021/acs.jcim.2c01502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
Predicting chemical activation energies is one of the longstanding and important challenges in computational chemistry. Recent advances have shown that machine learning can be used to create tools to predict them. Such tools can significantly decrease the computational cost for these predictions compared to traditional methods, which require an optimal path search along a high-dimensional potential energy surface. To enable this new route, we need both large and accurate datasets and a compact yet complete description of the reactions. Although data for chemical reactions is becoming increasingly available, the key step of encoding the reaction as an efficient descriptor remains a big challenge. In this paper, we demonstrate that including electronic energy levels in the description of the reaction significantly improves the prediction accuracy and transferability. Feature importance analysis further demonstrates that electronic energy levels have a higher importance than some structural information and typically require less space in the reaction encoding vector. In general, we observe that the results of the feature importance analysis relate well to the domain knowledge of fundamental chemical principles. This work can help to build better chemical reaction encodings for machine learning and thus improve the predictions of machine learning models for reaction activation energies. These models could ultimately be used to recognize reaction limiting steps in large reaction systems, allowing to account for bottlenecks at the design stage.
Affiliation(s)
- Esteban Marques
- Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium; IMEC, Kapeldreef 75, Leuven 3001, Belgium
| | - Stefan de Gendt
- Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium; IMEC, Kapeldreef 75, Leuven 3001, Belgium
| | - Geoffrey Pourtois
- IMEC, Kapeldreef 75, Leuven 3001, Belgium; Department of Chemistry, University of Antwerp, Campus Drie Eiken, Universiteitsplein 1, Wilrijk 2610, Belgium
| | - Michiel J van Setten
- IMEC, Kapeldreef 75, Leuven 3001, Belgium; ETSF European Theoretical Spectroscopy Facility, Institut de Physique, Université de Liège, Allée du 6 août 17, Liège 4000, Belgium
| |
|
35
|
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yanchi Ou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yaohuang Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| |
|
36
|
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| |
|
37
|
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. NATURE COMPUTATIONAL SCIENCE 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
- Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
| | - Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Matthew J McDermott
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
| | - Aditi S Krishnapriyan
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
- Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Kristin A Persson
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA.
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| |
|
38
|
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. COMMUNICATIONS MATERIALS 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
| | - Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
| | - Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
| | - Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
| | - Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| | - Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
| |
|
39
|
Yang L, Chen P, He K, Wang R, Chen G, Shan G, Zhu L. Predicting bioconcentration factor and estrogen receptor bioactivity of bisphenol A and its analogues in adult zebrafish by directed message passing neural networks. ENVIRONMENT INTERNATIONAL 2022; 169:107536. [PMID: 36152365 DOI: 10.1016/j.envint.2022.107536] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
The bioconcentration factor (BCF) is a key parameter for bioavailability assessment of environmental pollutants in regulatory frameworks. The comparative toxicology and mechanism of action of congeners are also of concern. However, there are limitations to acquiring them through field and laboratory experiments, while machine learning is emerging as a promising predictive tool to fill the gap. In this study, the Directed Message Passing Neural Network (DMPNN) was applied to predict logBCFs of bisphenol A (BPA) and its four analogues (bisphenol AF (BPAF), bisphenol B (BPB), bisphenol F (BPF) and bisphenol S (BPS)). For the test set, the Pearson correlation coefficient (PCC) and mean square error (MSE) were 0.85 and 0.52 respectively, suggesting a good predictive performance. The logBCF values predicted by the DMPNN, ranging from 0.35 (BPS) to 2.14 (BPAF), coincided well with those by the classical EPI Suite (BCFBAF model). Besides, estrogen receptor α (ERα) bioactivity of these bisphenols was also predicted well by the DMPNN, with a probability of 97.0 % (BPB) to 99.7 % (BPAF), which was validated by the extent of vitellogenin (VTG) induction in male zebrafish as a biomarker except BPS. Thus, with little need for expert knowledge, DMPNN is confirmed to be a useful tool to accurately predict logBCF and screen for estrogenic activity from molecular structures. Moreover, a gender difference was noted in the changes of three endpoints (logBCF, ER binding affinity and VTG levels), the rank order of which was BPAF > BPB > BPA > BPF > BPS consistently, and abnormal amino acid metabolism is featured as an omics signature of abnormal hormone protein expression.
Affiliation(s)
- Liping Yang
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| | - Pengyu Chen
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China; College of Oceanography, Hohai University, Nanjing 210098, China
| | - Keyan He
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| | - Ruihan Wang
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, China
| | - Geng Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 330106, China
| | - Guoqiang Shan
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China.
| | - Lingyan Zhu
- Key Laboratory of Pollution Processes and Environmental Criteria, Ministry of Education, Tianjin Key Laboratory of Environmental Remediation and Pollution Control, College of Environmental Science and Engineering, Nankai University, Tianjin 300350, China
| |
|
40
|
Yarish D, Garkot S, Grygorenko OO, Radchenko DS, Moroz YS, Gurbych O. Advancing molecular graphs with descriptors for the prediction of chemical reaction yields. J Comput Chem 2022; 44:76-92. [PMID: 36264601 DOI: 10.1002/jcc.27016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/05/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/08/2022]
Abstract
Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study suggests a novel graph neural network architecture for chemical yield prediction. The network combines structural information about participants of the transformation as well as molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactants-product atom mapping. We show that the network benefits from advanced information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. Also, the models were benchmarked on two public single reaction type datasets. The study was supplemented with analysis of data, results, and errors, as well as the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.
Affiliation(s)
| | - Sofiya Garkot
- SoftServe, Inc., Lviv, Ukraine; Ukrainian Catholic University, Lviv, Ukraine
| | - Oleksandr O Grygorenko
- Enamine Ltd., Kyiv, Ukraine; Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Dmytro S Radchenko
- Enamine Ltd., Kyiv, Ukraine; Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Yurii S Moroz
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine; Chemspace LLC, Kyiv, Ukraine
| | - Oleksandr Gurbych
- Lviv Polytechnic National University, Lviv, Ukraine; Blackthorn AI, Ltd., London, UK
| |
|
41
|
Ismail I, Chantreau Majerus R, Habershon S. Graph-Driven Reaction Discovery: Progress, Challenges, and Future Opportunities. J Phys Chem A 2022; 126:7051-7069. [PMID: 36190262 PMCID: PMC9574932 DOI: 10.1021/acs.jpca.2c06408] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/07/2022] [Revised: 09/22/2022] [Indexed: 11/29/2022]
Abstract
Graph-based descriptors, such as bond-order matrices and adjacency matrices, offer a simple and compact way of categorizing molecular structures; furthermore, such descriptors can be readily used to catalog chemical reactions (i.e., bond-making and -breaking). As such, a number of graph-based methodologies have been developed with the goal of automating the process of generating chemical reaction network models describing the possible mechanistic chemistry in a given set of reactant species. Here, we outline the evolution of these graph-based reaction discovery schemes, with particular emphasis on more recent methods incorporating graph-based methods with semiempirical and ab initio electronic structure calculations, minimum-energy path refinements, and transition state searches. Using representative examples from homogeneous catalysis and interstellar chemistry, we highlight how these schemes increasingly act as "virtual reaction vessels" for interrogating mechanistic questions. Finally, we highlight where challenges remain, including issues of chemical accuracy and calculation speeds, as well as the inherent challenge of dealing with the vast size of accessible chemical reaction space.
Affiliation(s)
- Idil Ismail
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
| | | | - Scott Habershon
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
| |
|
42
|
Johnson MS, Dong X, Grinberg Dana A, Chung Y, Farina D, Gillis RJ, Liu M, Yee NW, Blondal K, Mazeau E, Grambow CA, Payne AM, Spiekermann KA, Pang HW, Goldsmith CF, West RH, Green WH. RMG Database for Chemical Property Prediction. J Chem Inf Model 2022; 62:4906-4915. [PMID: 36222558 DOI: 10.1021/acs.jcim.2c00965] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
The Reaction Mechanism Generator (RMG) database for chemical property prediction is presented. The RMG database consists of curated datasets and estimators for accurately predicting the parameters necessary for constructing a wide variety of chemical kinetic mechanisms. These datasets and estimators are mostly published and enable prediction of thermodynamics, kinetics, solvation effects, and transport properties. For thermochemistry prediction, the RMG database contains 45 libraries of thermochemical parameters with a combination of 4564 entries and a group additivity scheme with 9 types of corrections including radical, polycyclic, and surface absorption corrections with 1580 total curated groups and parameters for a graph convolutional neural network trained using transfer learning from a set of >130 000 DFT calculations to 10 000 high-quality values. Correction schemes for solvent-solute effects, important for thermochemistry in the liquid phase, are available. They include tabulated values for 195 pure solvents and 152 common solutes and a group additivity scheme for predicting the properties of arbitrary solutes. For kinetics estimation, the database contains 92 libraries of kinetic parameters containing a combined 21 000 reactions and contains rate rule schemes for 87 reaction classes trained on 8655 curated training reactions. Additional libraries and estimators are available for transport properties. All of this information is easily accessible through the graphical user interface at https://rmg.mit.edu. Bulk or on-the-fly use can be facilitated by interfacing directly with the RMG Python package which can be installed from Anaconda. The RMG database provides kineticists with easy access to estimates of the many parameters they need to model and analyze kinetic systems. This helps to speed up and facilitate kinetic analysis by enabling easy hypothesis testing on pathways, by providing parameters for model construction, and by providing checks on kinetic parameters from other sources.
Affiliation(s)
- Matthew S Johnson
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Xiaorui Dong
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Alon Grinberg Dana
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; The Wolfson Department of Chemical Engineering, Grand Technion Energy Program (GTEP), Technion-Israel Institute of Technology, Haifa 3200003, Israel
| | - Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - David Farina
- Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
| | - Ryan J Gillis
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Mengjie Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Nathan W Yee
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Katrin Blondal
- School of Engineering, Brown University, Providence, Rhode Island 02912, United States
| | - Emily Mazeau
- Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
| | - Colin A Grambow
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - A Mark Payne
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Hao-Wei Pang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - C Franklin Goldsmith
- School of Engineering, Brown University, Providence, Rhode Island 02912, United States
| | - Richard H West
- Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
43
|
Shmilovich K, Willmott D, Batalov I, Kornbluth M, Mailoa J, Kolter JZ. Orbital Mixer: Using Atomic Orbital Features for Basis-Dependent Prediction of Molecular Wavefunctions. J Chem Theory Comput 2022; 18:6021-6030. [PMID: 36122312 DOI: 10.1021/acs.jctc.2c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Leveraging ab initio data at scale has enabled the development of machine learning models capable of extremely accurate and fast molecular property prediction. A central paradigm of many previous studies focuses on generating predictions for only a fixed set of properties. Recent lines of research instead aim to explicitly learn the electronic structure via molecular wavefunctions, from which other quantum chemical properties can be directly derived. While previous methods generate predictions as a function of only the atomic configuration, in this work we present an alternate approach that directly purposes basis-dependent information to predict molecular electronic structure. Our model, Orbital Mixer, is composed entirely of multi-layer perceptrons (MLPs) using MLP-Mixer layers within a simple, intuitive, and scalable architecture that achieves competitive Hamiltonian and molecular orbital energy and coefficient prediction accuracies compared to the state-of-the-art.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Devin Willmott
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States
| | - Ivan Batalov
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States
| | - Mordechai Kornbluth
- Bosch Research and Technology Center, Cambridge, Massachusetts 02139, United States
| | - Jonathan Mailoa
- Tencent Quantum Laboratory, Shenzhen, Guangdong 518057, China
| | - J Zico Kolter
- Bosch Center for Artificial Intelligence, Pittsburgh, Pennsylvania 15222, United States; Carnegie Mellon University, Pittsburgh, Pennsylvania 15222, United States
| |
|
44
|
Houston PL, Nandi A, Bowman JM. A Machine Learning Approach for Rate Constants. III. Application to the Cl(2P) + CH4 → CH3 + HCl Reaction. J Phys Chem A 2022; 126:5672-5679. [PMID: 35960874 DOI: 10.1021/acs.jpca.2c04376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The temperature dependence of the thermal rate constant for the reaction Cl(2P) + CH4 → HCl + CH3 is calculated using a Gaussian Process machine learning (ML) approach to train on and predict thermal rate constants over a large temperature range. Following procedures developed in two previous reports, we use a training data set of approximately 40 reaction/potential surface combinations, each of which is used to calculate the corresponding database of rate constants at approximately eight temperatures. For the current application, we train on the entire data set and then predict the temperature dependence of the title reaction employing a "split" data set for correction at low and high temperatures to capture both tunneling and recrossing. The results are an improvement on recent RPMD calculations compared to accurate quantum ones, using the same high-level ab initio potential energy surface. Both tunneling at low temperatures and significant recrossing at high temperatures are observed to influence the rate constants. The recrossing effects, which are not described by TST and even sophisticated tunneling corrections, do appear in experiment at temperatures above around 600 K. The ML results describe these effects and in fact merge at 600 K with RPMD results (which can describe recrossing), and both are close to experiment at the highest experimental temperatures. These results are in accord with a recent high-level experiment-theory study of this reaction.
Affiliation(s)
- Paul L Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States; Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| | - Apurba Nandi
- Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, Atlanta, Georgia 30322, United States
| | - Joel M Bowman
- Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, Atlanta, Georgia 30322, United States
| |
|
45
|
Zhu LT, Chen XZ, Ouyang B, Yan WC, Lei H, Chen Z, Luo ZH. Review of Machine Learning for Hydrodynamics, Transport, and Reactions in Multiphase Flows and Reactors. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01036] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Affiliation(s)
- Li-Tao Zhu
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
| | - Xi-Zhong Chen
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, S1 3JD, U.K
| | - Bo Ouyang
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
| | - Wei-Cheng Yan
- School of Chemistry and Chemical Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
| | - He Lei
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
| | - Zhe Chen
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
| | - Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
| |
|
46
|
Spiekermann K, Pattanaik L, Green WH. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci Data 2022; 9:417. [PMID: 35851390 PMCID: PMC9293986 DOI: 10.1038/s41597-022-01529-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/18/2022] [Accepted: 06/30/2022] [Indexed: 12/13/2022]
Abstract
Quantitative chemical reaction data, including activation energies and reaction rates, are crucial for developing detailed kinetic mechanisms and accurately predicting reaction outcomes. However, such data are often difficult to find, and high-quality datasets are especially rare. Here, we use CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP to obtain high-quality single point calculations for nearly 22,000 unique stable species and transition states. We report the results from these quantum chemistry calculations and extract the barrier heights and reaction enthalpies to create a kinetics dataset of nearly 12,000 gas-phase reactions. These reactions involve H, C, N, and O, contain up to seven heavy atoms, and have cleaned atom-mapped SMILES. Our higher-accuracy coupled-cluster barrier heights differ significantly (RMSE of ∼5 kcal mol⁻¹) relative to those calculated at ωB97X-D3/def2-TZVP. We also report accurate transition state theory rate coefficients [Formula: see text] between 300 K and 2000 K and the corresponding Arrhenius parameters for a subset of rigid reactions. We believe this data will accelerate development of automated and reliable methods for quantitative reaction prediction.
Affiliation(s)
- Kevin Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA.
| |
|
47
|
Komp E, Valleau S. Low-cost prediction of molecular and transition state partition functions via machine learning. Chem Sci 2022; 13:7900-7906. [PMID: 35865893 PMCID: PMC9258343 DOI: 10.1039/d2sc01334g] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/05/2022] [Accepted: 06/10/2022] [Indexed: 11/21/2022]
Abstract
We have generated an open-source dataset of over 30 000 organic chemistry gas phase partition functions. With this data, a machine learning deep neural network estimator was trained to predict partition functions of unknown organic chemistry gas phase transition states. This estimator only relies on reactant and product geometries and partition functions. A second machine learning deep neural network was trained to predict partition functions of chemical species from their geometry. Our models accurately predict the logarithm of test set partition functions with a maximum mean absolute error of 2.7%. Thus, this approach provides a means to reduce the cost of computing reaction rate constants ab initio. The models were also used to compute transition state theory reaction rate constant prefactors and the results were in quantitative agreement with the corresponding ab initio calculations with an accuracy of 98.3% on the log scale.
Affiliation(s)
- Evan Komp
- Chemical Engineering, University of Washington, 3781 Okanogan Ln, Seattle, WA 98195, USA
| | - Stéphanie Valleau
- Chemical Engineering, University of Washington, 3781 Okanogan Ln, Seattle, WA 98195, USA
| |
|
48
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
| | | |
|
49
|
Spiekermann KA, Pattanaik L, Green WH. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J Phys Chem A 2022; 126:3976-3986. [PMID: 35727075 DOI: 10.1021/acs.jpca.2c02614] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 12/16/2022]
Abstract
Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future modeling efforts to estimate reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small molecule chemistry.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
50
|
Farrar EHE, Grayson MN. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem Sci 2022; 13:7594-7603. [PMID: 35872815 PMCID: PMC9242013 DOI: 10.1039/d2sc02925a] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/25/2022] [Accepted: 06/08/2022] [Indexed: 11/21/2022]
Abstract
Modern QM modelling methods, such as DFT, have provided detailed mechanistic insights into countless reactions. However, their computational cost inhibits their ability to rapidly screen large numbers of substrates and catalysts in reaction discovery. For a C-C bond forming nitro-Michael addition, we introduce a synergistic semi-empirical quantum mechanical (SQM) and machine learning (ML) approach that allows the prediction of DFT-quality reaction barriers in minutes, even on a standard laptop using widely available modelling software. Mean absolute errors (MAEs) are obtained that are below the accepted chemical accuracy threshold of 1 kcal mol⁻¹ and substantially better than SQM methods without ML correction (5.71 kcal mol⁻¹). Predictive power is shown to hold when the ML models are applied to an unseen set of compounds from the toxicology literature. Mechanistic insight is also achieved via the generation of full SQM transition state (TS) structures which are found to be very good approximations for the DFT-level geometries, revealing important steric interactions in some TSs. This combination of speed, accuracy, and mechanistic insight is unprecedented; current ML barrier models compromise on at least one of these important criteria.
Affiliation(s)
- Elliot H E Farrar
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
| | - Matthew N Grayson
- Department of Chemistry, University of Bath, Claverton Down, Bath BA2 7AY, UK
| |
|