1
Staub R, Gantzer P, Harabuchi Y, Maeda S, Varnek A. Challenges for Kinetics Predictions via Neural Network Potentials: A Wilkinson's Catalyst Case. Molecules 2023; 28:4477. [PMID: 37298952 DOI: 10.3390/molecules28114477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/23/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023]
Abstract
Ab initio kinetic studies are important to understand and design novel chemical reactions. While the Artificial Force Induced Reaction (AFIR) method provides a convenient and efficient framework for kinetic studies, accurate explorations of reaction path networks incur high computational costs. In this article, we investigate the applicability of Neural Network Potentials (NNP) to accelerate such studies. For this purpose, we report a novel theoretical study of ethylene hydrogenation with a transition metal complex inspired by Wilkinson's catalyst, using the AFIR method. The resulting reaction path network was analyzed by the Generative Topographic Mapping method. The network's geometries were then used to train a state-of-the-art NNP model, to replace expensive ab initio calculations with fast NNP predictions during the search. This procedure was applied to run the first NNP-powered reaction path network exploration using the AFIR method. We discovered that such explorations are particularly challenging for general-purpose NNP models, and we identified the underlying limitations. In addition, we propose to overcome these challenges by complementing NNP models with fast semiempirical predictions. The proposed solution offers a generally applicable framework, laying the foundations to further accelerate ab initio kinetic studies with Machine Learning Force Fields, and ultimately explore larger systems that are currently inaccessible.
Affiliation(s)
- Ruben Staub
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Japan
- Philippe Gantzer
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Japan
- Yu Harabuchi
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Japan
  - Japan Science and Technology Agency (JST), ERATO Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Kita 10, Nishi 8, Kita-ku, Sapporo 060-0810, Japan
  - Department of Chemistry, Faculty of Science, Hokkaido University, Kita 10, Nishi 8, Kita-ku, Sapporo 060-0810, Japan
- Satoshi Maeda
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Japan
  - Japan Science and Technology Agency (JST), ERATO Maeda Artificial Intelligence in Chemical Reaction Design and Discovery Project, Kita 10, Nishi 8, Kita-ku, Sapporo 060-0810, Japan
  - Department of Chemistry, Faculty of Science, Hokkaido University, Kita 10, Nishi 8, Kita-ku, Sapporo 060-0810, Japan
  - Research and Services Division of Materials Data and Integrated System (MaDIS), National Institute for Materials Science (NIMS), 1-1 Namiki, Tsukuba 305-0044, Japan
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo 001-0021, Japan
  - Laboratory of Chemoinformatics, UMR 7140, CNRS, University of Strasbourg, 67081 Strasbourg, France
2
Sun Y, Wang X, Ren N, Liu Y, You S. Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals. Environmental Science & Technology 2023; 57:3434-3444. [PMID: 36537350 DOI: 10.1021/acs.est.2c04945] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) provides an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain due to the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing, using a mutual information-permutation importance (MI-PI) feature selection method to filter out irrelevant molecular descriptors from the input data; this improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method to mine the data most relevant to the predicted targets by quantifying the contribution of each feature, thereby improving the prediction accuracy. On the basis of the above data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted using the Shapley additive explanation method by quantifying the contribution of each input molecular descriptor to the environmental impact categories. This work suggests that combining feature selection by MI-PI with source data selection based on weighted Euclidean distance has promising potential to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
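The weighted Euclidean distance step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are derived from feature contributions, so the `weights` argument, the function names `weighted_euclidean` and `nearest_sources`, and the toy descriptor vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import math

def weighted_euclidean(x, y, weights):
    """Distance between two descriptor vectors, scaling each squared gap by a weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

def nearest_sources(target, candidates, weights, k):
    """Return the k candidate vectors closest to the target (hypothetical helper)."""
    ranked = sorted(candidates, key=lambda c: weighted_euclidean(target, c, weights))
    return ranked[:k]

# Toy descriptor vectors and weights, invented for illustration only.
target = [0.0, 0.0]
candidates = [[3.0, 4.0], [1.0, 9.0], [2.0, 0.0]]
weights = [1.0, 0.0]  # the second descriptor is treated as irrelevant
closest = nearest_sources(target, candidates, weights, k=2)
```

With the second weight set to zero, only the first descriptor affects the ranking, which is the sense in which weighting lets the more relevant features drive the choice of source data.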
Affiliation(s)
- Ye Sun
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Xiuheng Wang
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Nanqi Ren
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Yanbiao Liu
  - College of Environmental Science and Engineering, Textile Pollution Controlling Engineering Center of the Ministry of Ecology and Environment, Donghua University, Shanghai 201620, China
- Shijie You
  - State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
3
Al Ibrahim E, Farooq A. Transfer Learning Approach to Multitarget Temperature-Dependent Reaction Rate Prediction. J Phys Chem A 2022; 126:4617-4629. [PMID: 35793232 DOI: 10.1021/acs.jpca.2c00713] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Accurate prediction of temperature-dependent reaction rate constants of organic compounds is of great importance to both atmospheric chemistry and combustion science. Extensive work has been done on developing automated mechanism generation systems, but the lack of quality reaction rate data remains a huge bottleneck in the application of highly detailed mechanisms. Machine learning prediction models have recently been adopted to alleviate the data gap in thermochemistry and have great potential to do the same for kinetic data with the recent release of quality reaction rate data compilations. The ultimate goal is to formulate easily accessible, general-purpose, temperature-dependent, and multitarget models for the prediction of reaction rates. To that end, we propose a model that depends on the well-known Morgan fingerprints as well as learned representations transferred from the QM9 data set. We propose the use of an Arrhenius-based loss where predictions of the three modified-Arrhenius parameters (A, n, and B = Ea/R) are given instead of the direct prediction of reaction rate constants. Our model is >35% more accurate than a baseline feed-forward network (FFN) on Morgan fingerprints.
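The three parameters named in this abstract enter the standard modified Arrhenius form k(T) = A · T^n · exp(−B/T), with B = Ea/R. The sketch below shows one way a loss could be computed from predicted parameters rather than from directly predicted rate constants. The abstract does not define the loss, so the mean squared error in ln k, the function names, and the numerical values are assumptions for illustration only.

```python
import math

def log_rate_constant(A, n, B, T):
    """Return ln k(T) for the modified Arrhenius form k = A * T**n * exp(-B / T)."""
    return math.log(A) + n * math.log(T) - B / T

def arrhenius_loss(params, temperatures, ln_k_reference):
    """Mean squared error in ln k over a temperature grid (hypothetical loss)."""
    A, n, B = params
    squared = [
        (log_rate_constant(A, n, B, T) - ln_k) ** 2
        for T, ln_k in zip(temperatures, ln_k_reference)
    ]
    return sum(squared) / len(squared)

# Illustrative values only, not taken from the paper.
temperatures = [500.0, 1000.0, 1500.0]
reference = [log_rate_constant(1.0e10, 0.5, 5000.0, T) for T in temperatures]
exact = arrhenius_loss((1.0e10, 0.5, 5000.0), temperatures, reference)
shifted = arrhenius_loss((1.0e10, 0.5, 6000.0), temperatures, reference)
```

Working in ln k keeps the computation stable over many orders of magnitude, and a single (A, n, B) triple yields a rate at every temperature on the grid.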
Affiliation(s)
- Emad Al Ibrahim
  - Clean Combustion Research Center (CCRC), Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Aamir Farooq
  - Clean Combustion Research Center (CCRC), Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
4
Song R, Li D, Chang A, Tao M, Qin Y, Keller AA, Suh S. Accelerating the pace of ecotoxicological assessment using artificial intelligence. Ambio 2022; 51:598-610. [PMID: 34427865 PMCID: PMC8800994 DOI: 10.1007/s13280-021-01598-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/03/2020] [Revised: 04/12/2021] [Accepted: 06/29/2021] [Indexed: 06/13/2023]
Abstract
Species Sensitivity Distribution (SSD) is a key metric for understanding the potential ecotoxicological impacts of chemicals. However, SSDs have been developed for only a handful of chemicals due to the scarcity of experimental toxicity data. Here we present a novel approach to expand the chemical coverage of SSDs using an Artificial Neural Network (ANN). We collected over 2000 experimental toxicity data points, expressed as Lethal Concentration 50 (LC50), for 8 aquatic species and trained an ANN model for each of the 8 aquatic species based on molecular structure. The R² values of the resulting ANN models range from 0.54 to 0.75 (median R² = 0.69). We applied the predicted LC50 values to fit SSD curves using a bootstrapping method, generating SSDs for 8424 chemicals in the Tox21 database. The dataset is expected to serve as a screening-level reference SSD database for understanding potential ecotoxicological impacts of chemicals.
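As a rough illustration of fitting an SSD curve to per-species LC50 values with bootstrapping, the sketch below assumes a log-normal SSD, which is a common choice but is not stated in the abstract. The function names, the resample count, and the toy LC50 values are hypothetical.

```python
import math
import random
import statistics

def fit_lognormal_ssd(lc50_values):
    """Fit mean and standard deviation of log10(LC50) across species."""
    logs = [math.log10(v) for v in lc50_values]
    return statistics.mean(logs), statistics.stdev(logs)

def bootstrap_ssd(lc50_values, n_boot=200, seed=0):
    """Refit the SSD on resampled species sets to gauge parameter uncertainty."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        sample = rng.choices(lc50_values, k=len(lc50_values))
        if len(set(sample)) < 2:
            continue  # a resample with one distinct value has no spread to fit
        fits.append(fit_lognormal_ssd(sample))
    return fits

def fraction_affected(concentration, mu, sigma):
    """Share of species whose LC50 lies at or below the given concentration."""
    return statistics.NormalDist(mu, sigma).cdf(math.log10(concentration))

# Toy LC50 values for 8 species, invented for illustration only.
lc50 = [0.5, 1.2, 3.0, 4.5, 8.0, 15.0, 40.0, 100.0]
fits = bootstrap_ssd(lc50)
```

Each bootstrap fit gives one plausible SSD curve, and the spread of `fraction_affected` across the fits indicates how uncertain the curve is at a given concentration.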
Affiliation(s)
- Runsheng Song
  - Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA 98121 USA
- Dingsheng Li
  - University of Nevada, Reno, 1664 N Virginia St, Reno, NV 89557 USA
- Alexander Chang
  - Emory Rollins School of Public Health, 1518 Clifton Rd, Atlanta, GA 30322 USA
- Mengya Tao
  - Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA 98121 USA
- Yuwei Qin
  - Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA 98121 USA
- Arturo A. Keller
  - Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA 98121 USA
- Sangwon Suh
  - Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA 98121 USA
5
Grambow C, Pattanaik L, Green WH. Deep Learning of Activation Energies. J Phys Chem Lett 2020; 11:2992-2997. [PMID: 32216310 PMCID: PMC7311089 DOI: 10.1021/acs.jpclett.0c00500] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 02/14/2020] [Accepted: 03/27/2020] [Indexed: 05/27/2023]
Abstract
Quantitative predictions of reaction properties, such as activation energy, have been limited due to a lack of available training data. Such predictions would be useful for computer-assisted reaction mechanism generation and organic synthesis planning. We develop a template-free deep learning model to predict the activation energy given reactant and product graphs and train the model on a new, diverse data set of gas-phase quantum chemistry reactions. We demonstrate that our model achieves accurate predictions and agrees with an intuitive understanding of chemical reactivity. With the continued generation of quantitative chemical reaction data and the development of methods that leverage such data, we expect many more methods for reactivity prediction to become available in the near future.
Affiliation(s)
- Colin A. Grambow
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Lagnajit Pattanaik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H. Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
6
Bondarev NV. Artificial Neural Network and Multiple Linear Regression for Prediction and Classification of Sustainability of Sodium and Potassium Coronates. Russ J Gen Chem 2019. [DOI: 10.1134/s1070363219070144] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
7
Oña-Ruales JO, Ruiz-Morales Y. Prediction of the Ultraviolet-Visible Absorption Spectra of Polycyclic Aromatic Hydrocarbons (Dibenzo and Naphtho) Derivatives of Fluoranthene. Applied Spectroscopy 2017; 71:1134-1147. [PMID: 27671142 DOI: 10.1177/0003702816667517] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
The annellation theory method has been used to predict the locations of maximum absorbance (LMA) of the ultraviolet-visible (UV-Vis) spectral bands in the group of polycyclic aromatic hydrocarbons (PAHs) C24H14 (dibenzo and naphtho) derivatives of fluoranthene (DBNFl). In this group of 21 PAHs, ten PAHs present a sextet migration pattern with four or more benzenoid rings that is potentially related to a high molecular reactivity and high mutagenic conduct. This is the first time that the locations of maximum absorbance in the UV-Vis spectra of naphth[1,2-a]aceanthrylene, dibenz[a,l]aceanthrylene, indeno[1,2,3-de]naphthacene, naphtho[1,2-j]fluoranthene, naphth[2,1-e]acephenanthrylene, naphth[2,1-a]aceanthrylene, dibenz[a,j]aceanthrylene, naphth[1,2-e]acephenanthrylene, and naphtho[2,1-j]fluoranthene have been predicted. Also, this represents the first report about the application of the annellation theory for the calculation of the locations of maximum absorbance in the UV-Vis spectra of PAHs with five-membered rings. Furthermore, this study constitutes the premier investigation beyond the pure benzenoid classical approach toward the establishment of a generalized annellation theory that will encompass not only homocyclic benzenoid and non-benzenoid PAHs, but also heterocyclic compounds.
Affiliation(s)
- Jorge O Oña-Ruales
  - National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA