1
Overstreet R, King E, Clopton G, Nguyen J, Ciesielski D. QC-GN²oMS²: a Graph Neural Net for High Resolution Mass Spectra Prediction. J Chem Inf Model 2024; 64:5806-5816. [PMID: 39013165 DOI: 10.1021/acs.jcim.4c00446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
Predicting the mass spectrum of a molecular ion is often accomplished via three generalized approaches: rules-based methods for bond breaking, deep learning, or quantum chemical (QC) modeling. Rules-based approaches are often limited by the conditions for different chemical subspaces and perform poorly under chemical regimes with few defined rules. QC modeling is theoretically robust but requires significant amounts of computational time to produce a spectrum for a given target. Among deep learning techniques, graph neural networks (GNNs) have performed better than previous work with fingerprint-based neural networks in mass spectra prediction. To explore this technique further, we investigate the effects of including quantum chemically derived information as edge features in the GNN to increase predictive accuracy. The models we investigated include categorical bond order, bond force constants derived from extended tight-binding (xTB) quantum chemistry, and acyclic bond dissociation energies. We evaluated these models against a control GNN with no edge features in the input graphs. Bond dissociation enthalpies yielded the best improvement with a cosine similarity score of 0.462 relative to the baseline model (0.437). In this work we also apply dynamic graph attention which improves performance on benchmark problems and supports the inclusion of edge features. Between implementations, we investigate the nature of the molecular embedding for spectra prediction and discuss the recognition of fragment topographies in distinct chemistries for further development in tandem mass spectrometry prediction.
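The cosine similarity score quoted in this abstract can be illustrated with a minimal sketch. It assumes both spectra are already binned onto the same m/z grid; the abstract does not state the binning or any intensity weighting, so the function name and the toy intensities below are hypothetical.

```python
import math

def cosine_similarity(pred, ref):
    """Cosine similarity between two intensity vectors on a shared m/z grid."""
    if len(pred) != len(ref):
        raise ValueError("spectra must be binned onto the same m/z grid")
    dot = sum(p * r for p, r in zip(pred, ref))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(r * r for r in ref))
    # An all-zero spectrum has no direction; report zero similarity.
    return dot / norm if norm > 0 else 0.0

# Toy intensities (illustrative only, not data from the paper).
predicted = [0.0, 0.8, 0.1, 0.0, 0.5]
measured = [0.0, 1.0, 0.0, 0.2, 0.4]
score = cosine_similarity(predicted, measured)
```

A score of 1 means the two spectra have identical relative peak intensities, and 0 means they share no peaks, which is the scale on which the reported 0.462 and 0.437 would be read.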
Affiliation(s)
- Richard Overstreet
  - Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ethan King
  - Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Grady Clopton
  - Department of Chemistry, Tennessee State University, Nashville, Tennessee 37209, United States
- Julia Nguyen
  - Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Danielle Ciesielski
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
2
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024; 64:5480-5491. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from a perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. A free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Affiliation(s)
- Hongle An
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
3
Shao Y, Ren Z, Han Z, Chen L, Li Y, Xue XS. Predicting bond dissociation energies of cyclic hypervalent halogen reagents using DFT calculations and graph attention network model. Beilstein J Org Chem 2024; 20:1444-1452. [PMID: 38952960 PMCID: PMC11216094 DOI: 10.3762/bjoc.20.127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 06/17/2024] [Indexed: 07/03/2024] [Open access]
Abstract
Although hypervalent iodine(III) reagents have become staples in organic chemistry, the exploration of their isoelectronic counterparts, namely hypervalent bromine(III) and chlorine(III) reagents, has been relatively limited, partly due to challenges in synthesizing and stabilizing these compounds. In this study, we conduct a thorough examination of both homolytic and heterolytic bond dissociation energies (BDEs) critical for assessing the chemical stability and functional group transfer capability of cyclic hypervalent halogen compounds using density functional theory (DFT) analysis. A moderate linear correlation was observed between the homolytic BDEs across different halogen centers, while a strong linear correlation was noted among the heterolytic BDEs across these centers. Furthermore, we developed a predictive model for both homolytic and heterolytic BDEs of cyclic hypervalent halogen compounds using machine learning algorithms. The results of this study could aid in estimating the chemical stability and functional group transfer capabilities of hypervalent bromine(III) and chlorine(III) reagents, thereby facilitating their development.
Affiliation(s)
- Yingbo Shao
  - State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Zhiyuan Ren
  - State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Zhihui Han
  - State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Li Chen
  - State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Yao Li
  - Key Laboratory of Fluorine and Nitrogen Chemistry and Advanced Materials, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Shanghai 200032, P. R. China
- Xiao-Song Xue
  - Key Laboratory of Fluorine and Nitrogen Chemistry and Advanced Materials, Shanghai Institute of Organic Chemistry, University of Chinese Academy of Sciences, Shanghai 200032, P. R. China
  - School of Chemistry and Material Sciences, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, P. R. China
4
Wang R, He Z, Chen H, Guo S, Zhang S, Wang K, Wang M, Ho SH. Enhancing biomass conversion to bioenergy with machine learning: Gains and problems. Sci Total Environ 2024; 927:172310. [PMID: 38599406 DOI: 10.1016/j.scitotenv.2024.172310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/20/2024] [Accepted: 04/06/2024] [Indexed: 04/12/2024]
Abstract
The growing concerns about environmental sustainability and energy security, such as exhaustion of traditional fossil fuels and global carbon footprint growth have led to an increasing interest in alternative energy sources, especially bioenergy. Recently, numerous scenarios have been proposed regarding the use of bioenergy from different sources in the future energy systems. In this regard, one of the biggest challenges for scientists is managing, modeling, decision-making, and future forecasting of bioenergy systems. The development of machine learning (ML) techniques can provide new opportunities for modeling, optimizing and managing the production, consumption and environmental effects of bioenergy. However, researchers in bioenergy fields have not widely utilized the ML concepts and practices. Therefore, a comparative review of the current ML techniques used for bioenergy productions is presented in this paper. This review summarizes the common issues and difficulties existing in integrating ML with bioenergy studies, and discusses and proposes the possible solutions. Additionally, a detailed discussion of the appropriate ML application scenarios is also conducted in every sector of the entire bioenergy chain. This indicates the modernized conversion processes supported by ML techniques are imperative to accurately capture process-level subtleties, and thus improving techno-economic resilience and socio-ecological integrity of bioenergy production. All the efforts are believed to help in sustainable bioenergy production with ML technologies for the future.
Affiliation(s)
- Rupeng Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Zixiang He
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Honglin Chen
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Silin Guo
  - School of Medicine and Health, Harbin Institute of Technology, Harbin 150040, PR China
- Shiyu Zhang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Ke Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Meng Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
- Shih-Hsin Ho
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150040, PR China
5
Luan Y, Li X, Kong D, Li W, Li W, Zhang Q, Pang A. Development and uniqueness test of highly selective atomic topological indices based on the number of attached hydrogen atoms. J Mol Graph Model 2024; 129:108752. [PMID: 38479237 DOI: 10.1016/j.jmgm.2024.108752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 02/27/2024] [Indexed: 04/15/2024]
Abstract
On the basis of the atomic graph-theoretical index - aEAID (atomic Extended Adjacency matrix IDentification) and molecular adjacent topological index - ATID (Adjacent Topological IDentification) suggested by one of the authors (Zhang Q), a highly selective atomic topological index - aATID (atomic Adjacent Topological IDentification) index was suggested to identify the equivalent atoms in this study. The aATID index of an atom was derived from the number of the attached hydrogen atoms of the atom but omitting bond types. In this case, the suggested index can be used to identify equivalent atoms in chemistry but perhaps not equivalent in the molecular graph. To test the uniqueness of aATID indices, the virtual atomic data sets were derived from alkanes containing 15-20 carbon atoms and the isomers of Octogen, as well as a real data set was derived from the NCI database. Only four pairs of atoms from alkanes containing 20 carbons can't be discriminated by aATID, that is, four pairs of degenerates were found for this data set. To solve this problem, the aATID index was modified by introducing distance factors between atoms, and the 2-aATID index was suggested. Its uniqueness was examined by 5,939,902 atoms derived from alkanes containing 20 carbons and further 16,166,984 atoms from alkanes of 21 carbons, and no degenerates were found. In addition, another large real data set of 16,650,688 atoms derived from the PubChem database was also used to test the uniqueness of both aATID and 2-aATID. As a result, each atom was successfully discriminated by any of the two indices. Finally, the suggested aATID index was applied to the identification of duplicate atoms as data pretreatment for QSPR (Quantitative Structure-Property Relationships) studies.
Affiliation(s)
- Yue Luan
  - Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, China
- Xianlan Li
  - Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, China
- Dingling Kong
  - Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, China
- Wanli Li
  - Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, China
- Wei Li
  - Science and Technology on Aerospace Chemical Power Laboratory, Hubei Institute of Aerospace Chemotechnology, Xiangyang, 441003, Hubei, China
- Qingyou Zhang
  - Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng, 475004, China
- Aimin Pang
  - Science and Technology on Aerospace Chemical Power Laboratory, Hubei Institute of Aerospace Chemotechnology, Xiangyang, 441003, Hubei, China
6
Gou Q, Liu J, Su H, Guo Y, Chen J, Zhao X, Pu X. Exploring an accurate machine learning model to quickly estimate stability of diverse energetic materials. iScience 2024; 27:109452. [PMID: 38523799 PMCID: PMC10960145 DOI: 10.1016/j.isci.2024.109452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/27/2024] [Accepted: 03/06/2024] [Indexed: 03/26/2024] [Open access]
Abstract
High energy and low sensitivity have been the focus of developing new energetic materials (EMs). However, there has been a lack of a quick and accurate method for evaluating the stability of diverse EMs. Here, we develop a machine learning prediction model with high accuracy for bond dissociation energy (BDE) of EMs. A reliable and representative BDE dataset of EMs is constructed by collecting 778 experimental energetic compounds and quantum mechanics calculation. To sufficiently characterize the BDE of EMs, a hybrid feature representation is proposed by coupling the local target bond into the global structure characteristics. To alleviate the limitation of the small dataset, pairwise difference regression is utilized as a data augmentation with the advantage of reducing systematic errors and improving diversity. Benefiting from these improvements, the XGBoost model achieves the best prediction accuracy with R² of 0.98 and MAE of 8.8 kJ mol⁻¹, significantly outperforming competitive models.
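Pairwise difference regression, named in this abstract as the data augmentation step, can be sketched as follows. This is an illustration under loud assumptions: a single numeric feature and a one-parameter least-squares slope stand in for the paper's hybrid features and XGBoost model, and all function names and numbers are hypothetical.

```python
from itertools import permutations

def make_pairs(xs, ys):
    """Turn n labelled samples into n*(n-1) ordered (dx, dy) difference pairs."""
    return [(xs[i] - xs[j], ys[i] - ys[j]) for i, j in permutations(range(len(xs)), 2)]

def fit_slope(pairs):
    """Least-squares slope through the origin for dy ~ w * dx (stand-in learner)."""
    num = sum(dx * dy for dx, dy in pairs)
    den = sum(dx * dx for dx, _ in pairs)
    return num / den

def predict(x_new, xs, ys, w):
    """Average the anchored estimates y_j + w * (x_new - x_j) over all training points."""
    return sum(y + w * (x_new - x) for x, y in zip(xs, ys)) / len(xs)

# Toy data following y = 2x + 1 (illustrative only, not BDE values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
pairs = make_pairs(xs, ys)  # 4 samples become 12 training pairs
w = fit_slope(pairs)
estimate = predict(5.0, xs, ys, w)
```

The augmentation comes from the pairing step: the learner sees differences between samples rather than absolute values, so an offset shared by both members of a pair cancels, and the training set grows roughly quadratically with the number of labelled compounds.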
Affiliation(s)
- Qiaolin Gou
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Jing Liu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Haoming Su
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiayi Chen
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xueyan Zhao
  - Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
7
Liu H, Chen P, Huang X, Wei X. A physical organic strategy to predict and interpret stabilities of chemical bonds in energetic compounds for the discovery of thermal-resistant properties. J Mol Model 2024; 30:84. [PMID: 38407671 DOI: 10.1007/s00894-024-05877-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Accepted: 02/09/2024] [Indexed: 02/27/2024]
Abstract
CONTEXT: The in-depth understanding about the stability of chemical bonds in energetic compounds plays a central role for molecular design and safety-related evaluations. Most energetic compounds contain nitro as explosophores, and nitro cleavage is fundamental for thermal and mechanical stability. However, the quantum chemistry approach to accurately predict energy and temperature properties related to bond stability is challenging, due to the tradeoff between computational costs and deviations. Herein, the bond orders are proposed as accurate and computational-cost efficient descriptors for predicting the chemical bond stability and thermal-resistant properties. The intrinsic bond strength index (IBSI) demonstrates the best prediction for experimental homolytic bond dissociation energies (R² > 0.996), which is on par with the results from high-precision quantum chemistry methods. The effects from bond connectivity and steric hindrance hierarchy were analyzed to reveal underlying mechanisms. Additionally, the IBSI descriptors are successfully applied to predict the thermal decomposition temperatures of 24 heat-resistant energetic compounds (R² = 0.995), thus validating the effectiveness for the prediction and interpretation of chemical bond stability in energetic compounds via a physical organic approach.

METHODS: All DFT calculations were performed with Gaussian 09 software. To investigate the dependence of the method on functionals and basis sets, 9 DFT methods were considered (B3LYP/6-31G(d,p), B3LYP/6-311G(d,p), B3LYP/def2-TZVP, M062X/6-31G(d,p), M062X/6-311G(d,p), M062X/def2-TZVP, ωB97XD/6-31G(d,p), ωB97XD/6-311G(d,p), and ωB97XD/def2-TZVP). The bond order descriptors LBO and IBSI are obtained through the bond order analysis module in the Multiwfn software.
Affiliation(s)
- Haitao Liu
  - School of National Defense & Nuclear Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
  - Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang, 621900, People's Republic of China
- Peng Chen
  - School of National Defense & Nuclear Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
  - Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang, 621900, People's Republic of China
- Xin Huang
  - Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), Mianyang, 621900, People's Republic of China
- Xianfeng Wei
  - School of National Defense & Nuclear Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, People's Republic of China
8
Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024; 15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/04/2023] [Indexed: 01/22/2024] [Open access]
Abstract
Designing solvent systems is key to achieving the facile synthesis and separation of desired products from chemical processes, so many machine learning models have been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in the model's predictive accuracy and generalizability; this can be addressed by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, these two databases should not be trained separately, and they should not be simply combined without reconciling the discrepancies from different magnitudes of errors and uncertainties. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training approaches. Solubilities from quantum-mechanical calculations are referred to during semi-supervised learning, but they are not directly added to the experimental database. Dataset augmentation is performed from 11 637 experimental solubilities to >900 000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. The accuracy (mean absolute error around 0.2 kcal mol⁻¹ for the test set) is quantitatively useful in exploring Linear Free Energy Relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room for expanding solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive to predictive chemistry areas where experimental, computational, and other heterogeneous data should be combined.
Affiliation(s)
- Yeonjoon Kim
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
  - Department of Chemistry, Pukyong National University, Busan 48513, Republic of Korea
- Hojin Jung
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Sabari Kumar
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Robert S Paton
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Seonah Kim
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
9
Pu C, Zeng T. Comparative Evaluation of Chemical and Photolytic Denitrosation Methods for Chemiluminescence Detection of Total N-Nitrosamines in Wastewater Samples. Environ Sci Technol 2023; 57:7526-7536. [PMID: 37140470 DOI: 10.1021/acs.est.2c09769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
N-Nitrosamines form as byproducts during oxidative water treatment and occur as impurities in consumer and industrial products. To date, two methods based on chemiluminescence (CL) detection of nitric oxide liberated from N-nitrosamines via denitrosation with acidic triiodide (HI3) treatment or ultraviolet (UV) photolysis have been developed to enable the quantification of total N-nitrosamines (TONO) in environmental water samples. In this work, we configured an integrated experimental setup to compare the performance of HI3-CL and UV-CL methods with a focus on their applicability for TONO measurements in wastewater samples. With the use of a large-volume purge vessel for chemical denitrosation, the HI3-CL method achieved signal stability and detection limits comparable to those achieved by the UV-CL method which utilized a microphotochemical reactor for photolytic denitrosation. Sixty-six structurally diverse N-nitroso compounds (NOCs) yielded a range of conversion efficiencies relative to N-nitrosodimethylamine (NDMA) regardless of the conditions applied for denitrosation. On average, TONO measured in preconcentrated raw and chloraminated wastewater samples by the HI3-CL method were 2.1 ± 1.1 times those measured by the UV-CL method, pointing to potential matrix interferences as further confirmed by spike recovery tests. Overall, our comparative assessment of the HI3-CL and UV-CL methods serves as a basis for addressing methodological gaps in TONO analysis.
Affiliation(s)
- Changcheng Pu
  - Department of Civil and Environmental Engineering, Syracuse University, 151 Link Hall, Syracuse, New York 13244, United States
- Teng Zeng
  - Department of Civil and Environmental Engineering, Syracuse University, 151 Link Hall, Syracuse, New York 13244, United States
10
Venetos MC, Wen M, Persson KA. Machine Learning Full NMR Chemical Shift Tensors of Silicon Oxides with Equivariant Graph Neural Networks. J Phys Chem A 2023; 127:2388-2398. [PMID: 36862997 PMCID: PMC10026072 DOI: 10.1021/acs.jpca.2c07530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
The nuclear magnetic resonance (NMR) chemical shift tensor is a highly sensitive probe of the electronic structure of an atom and furthermore its local structure. Recently, machine learning has been applied to NMR in the prediction of isotropic chemical shifts from a structure. Current machine learning models, however, often ignore the full chemical shift tensor for the easier-to-predict isotropic chemical shift, effectively ignoring a multitude of structural information available in the NMR chemical shift tensor. Here we use an equivariant graph neural network (GNN) to predict full ²⁹Si chemical shift tensors in silicate materials. The equivariant GNN model predicts full tensors to a mean absolute error of 1.05 ppm and is able to accurately determine the magnitude, anisotropy, and tensor orientation in a diverse set of silicon oxide local structures. When compared with other models, the equivariant GNN model outperforms the state-of-the-art machine learning models by 53%. The equivariant GNN model also outperforms historic analytical models by 57% for isotropic chemical shift and 91% for anisotropy. The software is available as a simple-to-use open-source repository, allowing similar models to be created and trained with ease.
Affiliation(s)
- Maxwell C Venetos
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Mingjian Wen
  - Department of Chemical and Biomolecular Engineering, University of Houston, Houston, Texas 77204, United States
- Kristin A Persson
  - Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
11
Kjeldal FØ, Eriksen JJ. Decomposing Chemical Space: Applications to the Machine Learning of Atomic Energies. J Chem Theory Comput 2023; 19:2029-2038. [PMID: 36926874 DOI: 10.1021/acs.jctc.2c01290] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/18/2023]
Abstract
We apply a number of atomic decomposition schemes across the standard QM7 data set─a small model set of organic molecules at equilibrium geometry─to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies on account of which nuclei individual atomic orbitals are centered on. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are next assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
Affiliation(s)
- Frederik Ø Kjeldal
  - DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
- Janus J Eriksen
  - DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
12
Li W, Luan Y, Zhang Q, Aires‐de‐Sousa J. Machine Learning to Predict Homolytic Dissociation Energies of C-H Bonds: Calibration of DFT-based Models with Experimental Data. Mol Inform 2023; 42:e2200193. [PMID: 36167940 PMCID: PMC10078411 DOI: 10.1002/minf.202200193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/27/2022] [Indexed: 01/12/2023]
Abstract
Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp³ C-H covalent bonds. The best set of attributes consisted in 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom and ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with MAE=2.86 kcal/mol. A new data set of 409 bonds from the iBonD database (http://ibond.nankai.edu.cn) was predicted by the RF with a modest MAE (5.36 kcal/mol) but a relatively high R² (0.75) against experimental energies. A prediction scheme was explored that corrects the RF prediction with the average deviation observed for the k nearest neighbours (KNN) in an additional memory of experimental data. The corrected predictions achieved MAE=2.22 kcal/mol for an independent test set of 145 bonds and the corresponding experimental bond energies.
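The KNN correction described in this abstract can be sketched as below. It is a minimal illustration assuming Euclidean distance in descriptor space and an unweighted mean of the neighbours' deviations; the abstract specifies neither, and the function name, memory layout, and toy numbers are hypothetical.

```python
import math

def knn_corrected(query_x, query_pred, memory, k=3):
    """Shift a model prediction by the mean residual of the k nearest memory entries.

    memory: list of (descriptor_vector, model_prediction, experimental_value).
    """
    # Rank memory entries by Euclidean distance to the query descriptors.
    nearest = sorted(memory, key=lambda m: math.dist(query_x, m[0]))[:k]
    # Average deviation (experiment minus model) over the selected neighbours.
    mean_dev = sum(exp - pred for _, pred, exp in nearest) / len(nearest)
    return query_pred + mean_dev

# Toy memory of experimental data (illustrative values only).
memory = [
    ([0.0, 0.0], 95.0, 97.0),
    ([0.1, 0.0], 96.0, 98.0),
    ([5.0, 5.0], 80.0, 79.0),
]
corrected = knn_corrected([0.05, 0.0], 95.5, memory, k=2)
```

Here the two nearest entries both under-predict experiment by 2.0, so the raw prediction of 95.5 is shifted to 97.5. The base model is never retrained; only the memory of experimental values changes the output.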
Affiliation(s)
- Wanli Li
- Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng 475004, P.R. China
- Yue Luan
- Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng 475004, P.R. China
- Qingyou Zhang
- Henan Engineering Research Center of Industrial Circulating Water Treatment, Henan Joint International Research Laboratory of Environmental Pollution Control Materials, Henan University, Kaifeng 475004, P.R. China
- Joao Aires‐de‐Sousa
- LAQV and REQUIMTE, Chemistry Department, NOVA School of Science and Technology, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
|
13
|
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. Nature Computational Science 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
- Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
- Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA.
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
|
14
|
Low K, Coote ML, Izgorodina EI. Explainable Solvation Free Energy Prediction Combining Graph Neural Networks with Chemical Intuition. J Chem Inf Model 2022; 62:5457-5470. [PMID: 36317829 DOI: 10.1021/acs.jcim.2c01013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
The prediction of a molecule's Gibbs free energy of solvation (ΔGsolv) in a given solvent is an important task which has traditionally been carried out via quantum chemical continuum methods or force field-based molecular simulations. Machine learning (ML) and graph neural networks in particular have emerged as powerful techniques for elucidating structure-property relationships. This work presents a graph neural network (GNN) for the prediction of ΔGsolv which, in addition to encoding typical atom and bond-level features, incorporates chemically intuitive, solvation-relevant parameters into the featurization process: semiempirical partial atomic charges and solvent dielectric constant. Solute-solvent interactions are included via an interaction map layer which can be visualized to examine solubility-enhancing or -decreasing interactions learnt by the model. On a test set of small organic molecules, our GNN predicts ΔGsolv in water and cyclohexane with an accuracy comparable to polarizable and ab initio generated force field methods [mean absolute error (MAE) = 0.4 and 0.2 kcal mol⁻¹, respectively], without the need for any molecular simulation. For the FreeSolv data set of hydration free energies, the test MAE is 0.7 kcal mol⁻¹. Interpretability and applicability of the model are highlighted through several examples including rationalizing the increased solubility of modified diaminoanthraquinones in organic solvents. The clear explanations afforded by our GNN allow for easy understanding of the model's predictions, giving the experimental chemist confidence in employing ML models toward more optimized synthetic routes.
Affiliation(s)
- Kaycee Low
- Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
- Michelle L Coote
- Institute for Nanoscale Science and Technology, College of Science and Engineering, Flinders University, Bedford Park, South Australia 5042, Australia
- Ekaterina I Izgorodina
- Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
|
15
|
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. Communications Materials 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Marlen Neubert
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- André Eberhard
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Luca Torresi
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Zhou
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Shao
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
- Houssam Metni
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
- Clint van Hoesel
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
- Henrik Schopmans
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Timo Sommer
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
|
16
|
Miao J, Cao F, Ye H, Li M, Yang B. Revisiting graph neural networks from hybrid regularized graph signal reconstruction. Neural Netw 2022; 157:444-459. [DOI: 10.1016/j.neunet.2022.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Revised: 10/23/2022] [Accepted: 11/03/2022] [Indexed: 11/13/2022]
|
17
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 697] [Impact Index Per Article: 348.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
|
18
|
Nie W, Liu D, Li S, Yu H, Fu Y. Nucleophilicity Prediction Using Graph Neural Networks. J Chem Inf Model 2022; 62:4319-4328. [PMID: 36097394 DOI: 10.1021/acs.jcim.2c00696] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
The quantitative relationship between chemical reaction rates and nucleophilicity parameters plays a crucial role in organic chemistry. In this regard, the formula proposed by Mayr et al. and the constructed reactivity database are important representatives. However, the determination of Mayr's nucleophilicity parameter N often requires time-consuming experiments with reference electrophiles in the solvent. Several machine learning (ML)-based models have been proposed to realize the data-driven prediction of N in recent years. However, in addition to DFT-calculated electronic descriptors, most of them also use a set of artificially predefined structural descriptors as input, which may result in a biased representation of the nucleophile's structural information depending on how the descriptors are defined. Compared with traditional ML algorithms, graph neural networks (GNNs) can naturally take the molecule's structural information into account by applying the message passing technique. Herein, we propose a SchNet-based GNN model that only takes the molecular conformation and solvent type as input. The model achieves a comparable performance to the previous benchmark study on 10-fold cross-validation of 894 data points (R² = 0.91, RMSE = 2.25). To enhance the model's ability to capture the molecule's electronic information, some DFT-calculated parameters are then incorporated into the model via graph global features, and substantial improvement is achieved in the prediction precision (R² = 0.95, RMSE = 1.63). These results demonstrate that both structural and electronic information are important for the prediction of N, and that a GNN can integrate these two kinds of information more effectively.
Affiliation(s)
- Wan Nie
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China; Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Deguang Liu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China
- Shuaicheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Haizhu Yu
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
- Yao Fu
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China
|
19
|
Xu W, Reuter K, Andersen M. Predicting binding motifs of complex adsorbates using machine learning with a physics-inspired graph representation. Nature Computational Science 2022; 2:443-450. [PMID: 38177870 DOI: 10.1038/s43588-022-00280-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2022] [Accepted: 06/17/2022] [Indexed: 01/06/2024]
Abstract
Computational screening in heterogeneous catalysis relies increasingly on machine learning models for predicting key input parameters due to the high cost of computing these directly using first-principles methods. This becomes especially relevant when considering complex materials spaces such as alloys, or complex reaction mechanisms with adsorbates that may exhibit bi- or higher-dentate adsorption motifs. Here we present a data-efficient approach to the prediction of binding motifs and associated adsorption enthalpies of complex adsorbates at transition metals and their alloys based on a customized Wasserstein Weisfeiler-Lehman graph kernel and Gaussian process regression. The model shows good predictive performance, not only for the elemental transition metals on which it was trained, but also for an alloy based on these transition metals. Furthermore, incorporation of minimal new training data allows for predicting an out-of-domain transition metal. We believe the model may be useful in active learning approaches, for which we present an ensemble uncertainty estimation approach.
Affiliation(s)
- Wenbin Xu
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Garching, Germany
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
- Karsten Reuter
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany
- Mie Andersen
- Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
- Department of Physics and Astronomy-Center for Interstellar Catalysis, Aarhus University, Aarhus, Denmark
|
20
|
Stuyver T, Coley CW. Quantum chemistry-augmented neural networks for reactivity prediction: Performance, generalizability, and explainability. J Chem Phys 2022; 156:084104. [DOI: 10.1063/5.0079574] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023] Open
Abstract
There is a perceived dichotomy between structure-based and descriptor-based molecular representations used for predictive chemistry tasks. Here, we study the performance, generalizability, and explainability of the quantum mechanics-augmented graph neural network (ml-QM-GNN) architecture as applied to the prediction of regioselectivity (classification) and of activation energies (regression). In our hybrid QM-augmented model architecture, structure-based representations are first used to predict a set of atom- and bond-level reactivity descriptors derived from density functional theory calculations. These estimated reactivity descriptors are combined with the original structure-based representation to make the final reactivity prediction. We demonstrate that our model architecture leads to significant improvements over structure-based GNNs not only in overall accuracy but also in generalization to unseen compounds. Even when provided training sets of only a couple hundred labeled data points, the ml-QM-GNN outperforms other state-of-the-art structure-based architectures that have been applied to these tasks as well as descriptor-based (linear) regressions. As a primary contribution of this work, we demonstrate a bridge between data-driven predictions and conceptual frameworks commonly used to gain qualitative insights into reactivity phenomena, taking advantage of the fact that our models are grounded in (but not restricted to) QM descriptors. This effort results in a productive synergy between theory and data science, wherein QM-augmented models provide a data-driven confirmation of previous qualitative analyses, and these analyses in turn facilitate insights into the decision-making process occurring within ml-QM-GNNs.
Affiliation(s)
- Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
|
21
|
Wen M, Blau SM, Xie X, Dwaraknath S, Persson KA. Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining. Chem Sci 2022; 13:1446-1458. [PMID: 35222929 PMCID: PMC8809395 DOI: 10.1039/d1sc06515g] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/22/2021] [Accepted: 01/09/2022] [Indexed: 11/21/2022] Open
Abstract
Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
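The contrastive objective described in this abstract (two augmented views of the same reaction pulled together, other reactions pushed apart) can be sketched as follows. The abstract does not name the loss, so this uses an NT-Xent-style loss as an assumption; the function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent-style loss; view_a[i] and view_b[i] embed two augmentations of reaction i."""
    emb = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same reaction
        # Similarities to every other embedding in the batch (positive and negatives).
        logits = [cosine(emb[i], emb[j]) / temperature for j in range(2 * n) if j != i]
        pos_logit = cosine(emb[i], emb[pos]) / temperature
        # Negative log of the softmax probability assigned to the positive pair.
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

# Toy 2-D embeddings for two reactions, purely illustrative.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.1], [0.1, 1.0]]     # second views close to their anchors
mismatched = [[0.1, 1.0], [1.0, 0.1]]  # second views swapped between reactions
loss_matched = nt_xent(anchors, matched)
loss_mismatched = nt_xent(anchors, mismatched)
```

The loss is lower when each reaction's two views are close to each other than when they are swapped, which is the behaviour the pretraining step relies on.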
Affiliation(s)
- Mingjian Wen
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
- Xiaowei Xie
- College of Chemistry, University of California Berkeley CA 94720 USA
- Materials Science Division, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
- Kristin A Persson
- Department of Materials Science and Engineering, University of California Berkeley CA 94720 USA
- Molecular Foundry, Lawrence Berkeley National Laboratory Berkeley CA 94720 USA
|
22
|
Gensch T, Dos Passos Gomes G, Friederich P, Peters E, Gaudin T, Pollice R, Jorner K, Nigam A, Lindner-D'Addario M, Sigman MS, Aspuru-Guzik A. A Comprehensive Discovery Platform for Organophosphorus Ligands for Catalysis. J Am Chem Soc 2022; 144:1205-1217. [PMID: 35020383 DOI: 10.1021/jacs.1c09718] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Indexed: 12/16/2022]
Abstract
The design of molecular catalysts typically involves reconciling multiple conflicting property requirements, largely relying on human intuition and local structural searches. However, the vast number of potential catalysts requires pruning of the candidate space by efficient property prediction with quantitative structure-property relationships. Data-driven workflows embedded in a library of potential catalysts can be used to build predictive models for catalyst performance and serve as a blueprint for novel catalyst designs. Herein we introduce kraken, a discovery platform covering monodentate organophosphorus(III) ligands providing comprehensive physicochemical descriptors based on representative conformer ensembles. Using quantum-mechanical methods, we calculated descriptors for 1558 ligands, including commercially available examples, and trained machine learning models to predict properties of over 300000 new ligands. We demonstrate the application of kraken to systematically explore the property space of organophosphorus ligands and how existing data sets in catalysis can be used to accelerate ligand selection during reaction optimization.
Affiliation(s)
- Tobias Gensch
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States; Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
- Gabriel Dos Passos Gomes
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Ellyn Peters
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Théophile Gaudin
- Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; IBM Research Zurich, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Robert Pollice
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Kjell Jorner
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield K10 2NA, United Kingdom
- AkshatKumar Nigam
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Michael Lindner-D'Addario
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada
- Matthew S Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St., Toronto, Ontario M5S 3H6, Canada; Department of Computer Science, University of Toronto, 214 College St., Toronto, Ontario M5T 3A1, Canada; Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, Ontario M5G 1M1, Canada; Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave., Toronto, Ontario M5G, Canada
|
23
|
Saini V. A machine learning approach for predicting the fluorination strength of electrophilic fluorinating reagents. Phys Chem Chem Phys 2022; 24:26802-26812. [DOI: 10.1039/d2cp03281c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
A neural network algorithm utilizing SMILES encoding of organic molecules was successfully employed for predicting the fluorination strength of a wide range of N–F fluorinating reagents.
Affiliation(s)
- Vaneet Saini
- Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh 160014, India
|
24
|
Komp E, Janulaitis N, Valleau S. Progress towards machine learning reaction rate constants. Phys Chem Chem Phys 2021; 24:2692-2705. [PMID: 34935798 DOI: 10.1039/d1cp04422b] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/12/2023]
Abstract
Quantum and classical reaction rate constant calculations come at the cost of exploring potential energy surfaces. Due to the "curse of dimensionality", their evaluation quickly becomes unfeasible as the system size grows. Machine learning algorithms can accelerate the calculation of reaction rate constants by predicting them using low cost input features. In this perspective, we briefly introduce supervised machine learning algorithms in the context of reaction rate constant prediction. We discuss existing and recently created kinetic datasets and input feature representations as well as the use and design of machine learning algorithms to predict reaction rate constants or quantities required for their computation. Amongst these, we first describe the use of machine learning to predict activation, reaction, solvation and dissociation energies. We then look at the use of machine learning to predict reactive force field parameters, reaction rate constants as well as to help accelerate the search for minimum energy paths. Lastly, we provide an outlook on areas which have yet to be explored so as to improve and evaluate the use of machine learning algorithms for chemical reaction rate constants.
Affiliation(s)
- Evan Komp
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Nida Janulaitis
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Stéphanie Valleau
- Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
25
Prasad VK, Khalilian MH, Otero-de-la-Roza A, DiLabio GA. BSE49, a diverse, high-quality benchmark dataset of separation energies of chemical bonds. Sci Data 2021; 8:300. [PMID: 34815431 PMCID: PMC8611007 DOI: 10.1038/s41597-021-01088-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/06/2021] [Accepted: 11/01/2021] [Indexed: 01/23/2023] Open
Abstract
We present an extensive and diverse dataset of bond separation energies associated with the homolytic cleavage of covalently bonded molecules (A-B) into their corresponding radical fragments (A• and B•). Our dataset contains two different classifications of model structures referred to as "Existing" (molecules with associated experimental data) and "Hypothetical" (molecules with no associated experimental data). In total, the dataset consists of 4502 datapoints (1969 datapoints from the Existing and 2533 datapoints from the Hypothetical classes). The dataset covers 49 unique X-Y type single bonds (except H-H, H-F, and H-Cl), where X and Y are H, B, C, N, O, F, Si, P, S, and Cl atoms. All the reference data was calculated at the (RO)CBS-QB3 level of theory. The reference bond separation energies are non-relativistic ground-state energy differences and contain no zero-point energy corrections. This new dataset of bond separation energies (BSE49) is presented as a high-quality reference dataset for assessing and developing computational chemistry methods.
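As a reading aid, the quantity described in this abstract can be written as a plain energy difference. The notation below is an assumption made for illustration (the abstract does not give a formula): E(·) stands for the non-relativistic ground-state energy of each species, with no zero-point energy correction, as the abstract states.

```latex
% Bond separation energy for homolytic cleavage A-B -> A* + B*
% (symbol names are illustrative; no zero-point energy term is included)
\Delta E_{\mathrm{BSE}} = E(\mathrm{A}^{\bullet}) + E(\mathrm{B}^{\bullet}) - E(\mathrm{A{-}B})
```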
Affiliation(s)
- Viki Kumar Prasad
- Department of Chemistry, University of British Columbia, Kelowna, British Columbia, V1V 1V7, Canada
- M Hossein Khalilian
- Department of Chemistry, University of British Columbia, Kelowna, British Columbia, V1V 1V7, Canada
- Alberto Otero-de-la-Roza
- Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, MALTA Consolider Team, E-33006, Oviedo, Spain
- Gino A DiLabio
- Department of Chemistry, University of British Columbia, Kelowna, British Columbia, V1V 1V7, Canada.
26
Mater AC, Coote ML. Explainable Molecular Sets: Using Information Theory to Generate Meaningful Descriptions of Groups of Molecules. J Chem Inf Model 2021; 61:4877-4889. [PMID: 34636543 DOI: 10.1021/acs.jcim.1c00519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Algorithmically identifying the meaningful similarities between an assortment of molecules is a critical chemical problem, and one which is only gaining in relevance as data-driven chemistry continues to progress. Effectively addressing this challenge can be achieved through a reformulation of the problem into information theory, cluster-based supervised classification, and the implementation of key concepts, particularly information entropy and mutual information. These concepts are combined with unsupervised learning atop learned chemical spaces to generate meaningful labels for arbitrary collections of molecules. An open-source and highly extensible codebase is provided to undertake these experiments, demonstrate the viability of the approach on known clusters, and glean insights into the learned representations of chemical space within message-passing neural networks, an architecture not readily permitting interpretability. This approach facilitates the interoperability between human chemical knowledge and the algorithmically derived insights, which will continue to become more prevalent in the coming years.
Affiliation(s)
- Adam C Mater
- Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
- Michelle L Coote
- Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
27
Ye S, Liang J, Zhu X. Catalyst deep neural networks (Cat-DNNs) in singlet fission property prediction. Phys Chem Chem Phys 2021; 23:20835-20840. [PMID: 34505584 DOI: 10.1039/d1cp03594k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Many current deep neural network (DNN) models only focus on straightforward optimization over the given database. However, most numerical fitting procedures depart from physical laws. By introducing the concept of "catalysis" from physical chemistry, we propose that the physical correlations among molecular properties could spontaneously act as a catalyst in the DNNs, which increases the accuracy, and more importantly, guides the DNNs in the right way. These Catalysis-DNNs (Cat-DNNs) could precisely predict both the ground and excited-state properties, especially the molecules' screening with singlet fission character. We show that traditional machine learning metrics are not suitable for evaluating model accuracy in physical-chemical tasks and issue new physical errors. We believe that the agile transfer of fundamental physics or chemistry domain knowledge, like the catalyst, could significantly benefit both the architecture and application of artificial intelligence technology in the future.
Affiliation(s)
- Shuqian Ye
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
- Jiechun Liang
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
- Xi Zhu
- School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
28
Xie X, Clark Spotte-Smith EW, Wen M, Patel HD, Blau SM, Persson KA. Data-Driven Prediction of Formation Mechanisms of Lithium Ethylene Monocarbonate with an Automated Reaction Network. J Am Chem Soc 2021; 143:13245-13258. [PMID: 34379977 DOI: 10.1021/jacs.1c05807] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/11/2022]
Abstract
Interfacial reactions are notoriously difficult to characterize, and robust prediction of the chemical evolution and associated functionality of the resulting surface film is one of the grand challenges of materials chemistry. The solid-electrolyte interphase (SEI), critical to Li-ion batteries (LIBs), exemplifies such a surface film, and despite decades of work, considerable controversy remains regarding the major components of the SEI as well as their formation mechanisms. Here we use a reaction network to investigate whether lithium ethylene monocarbonate (LEMC) or lithium ethylene dicarbonate (LEDC) is the major organic component of the LIB SEI. Our data-driven, automated methodology is based on a systematic generation of relevant species using a general fragmentation/recombination procedure which provides the basis for a vast thermodynamic reaction landscape, calculated with density functional theory. The shortest pathfinding algorithms are employed to explore the reaction landscape and obtain previously proposed formation mechanisms of LEMC as well as several new reaction pathways and intermediates. For example, we identify two novel LEMC formation mechanisms: one which involves LiH generation and another that involves breaking the (CH2)O-C(═O)OLi bond in LEDC. Most importantly, we find that all identified paths, which are also kinetically favorable under the explored conditions, require water as a reactant. This condition severely limits the amount of LEMC that can form, as compared with LEDC, a conclusion that has direct impact on the SEI formation in Li-ion energy storage systems. Finally, the data-driven framework presented here is generally applicable to any electrochemical system and expected to improve our understanding of surface passivation.
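To illustrate what shortest-path search over a reaction network means in practice, here is a minimal sketch using Dijkstra's algorithm on a toy graph. It is not the authors' implementation: the species labels, edge costs, and the function name `shortest_path` are hypothetical, and the abstract does not say which cost function or pathfinding variant the cited work uses.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative costs.

    Returns (path, total_cost), or (None, inf) if target is unreachable.
    """
    best = {source: 0.0}   # lowest known cost to reach each node
    prev = {}              # predecessor on the best known path
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# Hypothetical toy network: nodes stand for species, weights for step costs.
toy_network = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.5, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
path, cost = shortest_path(toy_network, "A", "D")
```

In a real reaction network the edge weights would have to be non-negative for Dijkstra's algorithm to apply, which is why reaction energies are usually mapped to a cost function first; how the cited work does this is not stated in the abstract.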
Affiliation(s)
- Xiaowei Xie
- Department of Chemistry, University of California, Berkeley, California 94720, United States; Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States; Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Mingjian Wen
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Hetal D Patel
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States; Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Kristin A Persson
- Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States; Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
29
Quantum chemical calculations of lithium-ion battery electrolyte and interphase species. Sci Data 2021; 8:203. [PMID: 34354089 PMCID: PMC8342431 DOI: 10.1038/s41597-021-00986-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/23/2021] [Accepted: 06/22/2021] [Indexed: 02/07/2023] Open
Abstract
Lithium-ion batteries (LIBs) represent the state of the art in high-density energy storage. To further advance LIB technology, a fundamental understanding of the underlying chemical processes is required. In particular, the decomposition of electrolyte species and associated formation of the solid electrolyte interphase (SEI) is critical for LIB performance. However, SEI formation is poorly understood, in part due to insufficient exploration of the vast reactive space. The Lithium-Ion Battery Electrolyte (LIBE) dataset reported here aims to provide accurate first-principles data to improve the understanding of SEI species and associated reactions. The dataset was generated by fragmenting a set of principal molecules, including solvents, salts, and SEI products, and then selectively recombining a subset of the fragments. All candidate molecules were analyzed at the ωB97X-V/def2-TZVPPD/SMD level of theory at various charges and spin multiplicities. In total, LIBE contains structural, thermodynamic, and vibrational information on over 17,000 unique species. In addition to studies of reactivity in LIBs, this dataset may prove useful for machine learning of molecular and reaction properties.
30
Pablo‐García S, García‐Muelas R, Sabadell‐Rendón A, López N. Dimensionality reduction of complex reaction networks in heterogeneous catalysis: From linear‐scaling relationships to statistical learning techniques. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1540] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]
Affiliation(s)
- Sergio Pablo‐García
- Institute of Chemical Research of Catalonia, The Barcelona Institute of Science and Technology, Tarragona, Spain
- Rodrigo García‐Muelas
- Institute of Chemical Research of Catalonia, The Barcelona Institute of Science and Technology, Tarragona, Spain
- Albert Sabadell‐Rendón
- Institute of Chemical Research of Catalonia, The Barcelona Institute of Science and Technology, Tarragona, Spain
- Núria López
- Institute of Chemical Research of Catalonia, The Barcelona Institute of Science and Technology, Tarragona, Spain
31
Paenurk E, Chen P. Modeling Gas-Phase Unimolecular Dissociation for Bond Dissociation Energies: Comparison of Statistical Rate Models within RRKM Theory. J Phys Chem A 2021; 125:1927-1940. [PMID: 33635061 DOI: 10.1021/acs.jpca.1c00183] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
The Rice-Ramsperger-Kassel-Marcus (RRKM) theory provides a simple yet powerful rate theory for calculating microcanonical rate constants. In particular, it has found widespread use in combination with gas-phase kinetic experiments of unimolecular dissociations to extract experimental bond dissociation energies (BDEs). We have previously found several discrepancies between the computed BDE values and the respective experimental ones, obtained with our empirical rate model, named L-CID. To investigate the reliability of our rate model, we conducted a theoretical analysis and comparison of the performance of conventional rate models and L-CID within the RRKM framework. Using the previously published microcanonical rate data as well as reaction cross-section data, we show that the BDE values obtained with the L-CID model agree with the ones from the other rate models within the expected uncertainty bounds. Based on this agreement, we discuss the possible rationalization of the good performance of the L-CID model.
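For orientation, the microcanonical rate constant that RRKM theory provides is usually written in the standard textbook form below. The symbols are assumptions introduced here and do not come from the abstract: σ is the reaction path degeneracy, N‡ the sum of states of the transition state, ρ the density of states of the reactant, E₀ the threshold energy, and h Planck's constant. The L-CID model discussed in the abstract is an empirical rate model compared against conventional models within this framework, and its specific form is not reproduced here.

```latex
% Standard RRKM microcanonical rate constant (textbook form, illustrative notation)
k(E) = \frac{\sigma \, N^{\ddagger}(E - E_0)}{h \, \rho(E)}
```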
Affiliation(s)
- Eno Paenurk
- Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Peter Chen
- Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland