1
|
Yang Y, Zhang S, Ranasinghe KD, Isayev O, Roitberg AE. Machine Learning of Reactive Potentials. Annu Rev Phys Chem 2024; 75:371-395. [PMID: 38941524 DOI: 10.1146/annurev-physchem-062123-024417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.
Affiliation(s)
- Yinuo Yang
  - Department of Chemistry, University of Florida, Gainesville, Florida
- Shuhao Zhang
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Adrian E Roitberg
  - Department of Chemistry, University of Florida, Gainesville, Florida
|
2
|
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Machine learning interatomic potential (MLIP) overcomes the challenges of high computational costs in density-functional theory and the relatively low accuracy in classical large-scale molecular dynamics, facilitating more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP is discussed, including data generation methods, material structure descriptors, six unique machine learning algorithms, and available software. Furthermore, the applications of MLIP in various fields are investigated, notably in phase-change memory materials, structure searching, material properties predicting, and the pre-trained universal models. Eventually, the future perspectives, consisting of standard datasets, transferability, generalization, and trade-off between accuracy and complexity in MLIPs, are reported.
Affiliation(s)
- Guanjie Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
  - School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
|
3
|
Martí C, Devereux C, Najm HN, Zádor J. Evaluation of Rate Coefficients in the Gas Phase Using Machine-Learned Potentials. J Phys Chem A 2024. [PMID: 38427974 DOI: 10.1021/acs.jpca.3c07872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
Abstract
We assess the capability of machine-learned potentials to compute rate coefficients by training a neural network (NN) model and applying it to describe the chemical landscape on the C5H5 potential energy surface, which is relevant to molecular weight growth in combustion and interstellar media. We coupled the resulting NN with an automated kinetics workflow code, KinBot, to perform all necessary calculations to compute the rate coefficients. The NN is benchmarked exhaustively by evaluating its performance at the various stages of the kinetics calculations: from the electronic energy through the computation of zero point energy, barrier heights, entropic contributions, the portion of the PES explored, and finally the overall rate coefficients as formulated by transition state theory.
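The last stage named in this abstract, a rate coefficient from transition state theory, can be illustrated with a minimal sketch. It assumes the conventional Eyring form for a unimolecular step with a transmission coefficient of 1 and no tunneling or symmetry corrections; it is not the KinBot workflow, and the function names are hypothetical. The second helper shows why barrier-height errors matter: an error of δ in the barrier scales the rate by exp(δ/RT).

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def eyring_rate(delta_g_act_kj_mol: float, temperature_k: float) -> float:
    """Unimolecular TST rate coefficient (s^-1) from an activation free energy.

    Assumes transmission coefficient 1, no tunneling, no symmetry numbers.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-delta_g_act_kj_mol * 1000.0 / (R * temperature_k))


def rate_error_factor(barrier_error_kj_mol: float, temperature_k: float) -> float:
    """Multiplicative change in the rate caused by a barrier error (kJ/mol)."""
    return math.exp(abs(barrier_error_kj_mol) * 1000.0 / (R * temperature_k))
```

For example, `eyring_rate(100.0, 1000.0)` evaluates the rate for a hypothetical 100 kJ/mol barrier at 1000 K, and `rate_error_factor(4.0, 1000.0)` gives the factor by which a 4 kJ/mol model error would shift it.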
Affiliation(s)
- Carles Martí
  - Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Christian Devereux
  - Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Habib N Najm
  - Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Judit Zádor
  - Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
|
4
|
Kývala L, Dellago C. Optimizing the architecture of Behler-Parrinello neural network potentials. J Chem Phys 2023; 159:094105. [PMID: 37655764 DOI: 10.1063/5.0167260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 08/10/2023] [Indexed: 09/02/2023]
Abstract
The architecture of neural network potentials is typically optimized at the beginning of the training process and remains unchanged throughout. Here, we investigate the accuracy of Behler-Parrinello neural network potentials for varying training set sizes. Using the QM9 and 3BPA datasets, we show that adjusting the network architecture according to the training set size improves the accuracy significantly. We demonstrate that both an insufficient and an excessive number of fitting parameters can have a detrimental impact on the accuracy of the neural network potential. Furthermore, we investigate the influences of descriptor complexity, neural network depth, and activation function on the model's performance. We find that for the neural network potentials studied here, two hidden layers yield the best accuracy and that unbounded activation functions outperform bounded ones.
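As a rough illustration of the model class discussed here, the sketch below builds a Behler-Parrinello-style energy: each atom gets a descriptor of radial symmetry functions, a small feed-forward network maps it to an atomic energy, and the total energy is the sum. The two hidden layers and the unbounded softplus activation echo the abstract's findings, but everything else is an assumption made for illustration: a single element type, radial functions only, and the made-up `ETAS` and `PARAMS` values, which are not fitted to anything.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_g2(positions, i, eta, r_s, r_c):
    """Radial symmetry function for atom i: sum of cut-off Gaussians."""
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j != i:
            r = math.dist(positions[i], pos_j)
            total += math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
    return total


def softplus(x):
    """Unbounded activation, written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights holds one row per output neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def atomic_energy(descriptor, params):
    """Two hidden layers followed by a linear output neuron."""
    h1 = dense(descriptor, params["w1"], params["b1"], softplus)
    h2 = dense(h1, params["w2"], params["b2"], softplus)
    return dense(h2, params["w3"], params["b3"])[0]


def total_energy(positions, params, etas, r_s=0.0, r_c=6.0):
    """Sum of atomic energies; invariant to atom order and translation."""
    energy = 0.0
    for i in range(len(positions)):
        descriptor = [radial_g2(positions, i, eta, r_s, r_c) for eta in etas]
        energy += atomic_energy(descriptor, params)
    return energy


# Hypothetical, unfitted parameters used only to exercise the code.
ETAS = [0.5, 2.0]
PARAMS = {
    "w1": [[0.3, -0.2], [0.1, 0.4]], "b1": [0.0, 0.1],
    "w2": [[0.5, -0.3], [-0.2, 0.6]], "b2": [0.05, -0.05],
    "w3": [[0.7, -0.4]], "b3": [0.0],
}
```

In this picture, the number of fitting parameters the abstract refers to is set by the descriptor length (`len(ETAS)`) and the layer widths in `PARAMS`.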
Affiliation(s)
- Lukáš Kývala
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
  - Vienna Doctoral School in Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Christoph Dellago
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, 1090 Vienna, Austria
|
5
|
Raghavachari K, Maier S, Collins EM, Debnath S, Sengupta A. Approaching Coupled Cluster Accuracy with Density Functional Theory Using the Generalized Connectivity-Based Hierarchy. J Chem Theory Comput 2023. [PMID: 37338997 DOI: 10.1021/acs.jctc.3c00301] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/22/2023]
Abstract
This Perspective reviews connectivity-based hierarchy (CBH), a systematic hierarchy of error-cancellation schemes developed in our group with the goal of achieving chemical accuracy using inexpensive computational techniques ("coupled cluster accuracy with DFT"). The hierarchy is a generalization of Pople's isodesmic bond separation scheme that is based only on the structure and connectivity and is applicable to any organic and biomolecule consisting of covalent bonds. It is formulated as a series of rungs involving increasing levels of error cancellation on progressively larger fragments of the parent molecule. The method and our implementation are discussed briefly. Examples are given for the applications of CBH involving (1) energies of complex organic rearrangement reactions, (2) bond energies of biofuel molecules, (3) redox potentials in solution, (4) pKa predictions in the aqueous medium, and (5) theoretical thermochemistry combining CBH with machine learning. They clearly show that near-chemical accuracy (1-2 kcal/mol) is achieved for a variety of applications with DFT methods irrespective of the underlying density functional used. They demonstrate conclusively that seemingly disparate results, often seen with different density functionals in many chemical applications, are due to an accumulation of systematic errors in the smaller local molecular fragments that can be easily corrected with higher-level calculations on those small units. This enables the method to achieve the accuracy of the high level of theory (e.g., coupled cluster) while the cost remains that of DFT. The advantages and limitations of the method are discussed along with areas of ongoing developments.
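The error-cancellation idea described in this abstract can be sketched generically. If the fragment reaction energy is assumed to be the same at the low and high levels of theory, the high-level energy of the parent molecule is approximated by its low-level energy plus the signed sum of (high minus low) corrections on the fragments. The function below is a hypothetical illustration of that bookkeeping only; it does not generate CBH fragments, and any numbers passed to it are the caller's own.

```python
def corrected_parent_energy(parent_low, fragments):
    """Estimate the high-level energy of a parent molecule.

    parent_low: low-level (e.g. DFT) energy of the parent molecule.
    fragments:  iterable of (coefficient, e_low, e_high) tuples, where
                coefficient is the signed stoichiometric coefficient of the
                fragment in the error-cancellation reaction (positive on the
                product side, negative on the reactant side) and e_low and
                e_high are its energies at the two levels of theory.
    All energies must share one unit.
    """
    correction = sum(c * (e_high - e_low) for c, e_low, e_high in fragments)
    return parent_low + correction
```

Because the expensive calculations are needed only for the small fragments, the cost of the parent calculation stays at the low level, which is the point the abstract makes.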
Affiliation(s)
- Krishnan Raghavachari
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sarah Maier
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Eric M Collins
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Sibali Debnath
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Arkajyoti Sengupta
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
6
|
Bhattacharya S, Sahoo A, Baitalik S. Human brain-inspired chemical artificial intelligence tools for the analysis and prediction of the anion-sensing characteristics of an imidazole-based luminescent Os(II)-bipyridine complex. Dalton Trans 2023; 52:6749-6762. [PMID: 37129261 DOI: 10.1039/d3dt00327b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Neural network and decision tree-based soft computing techniques are implemented in this work for the thorough analysis of the multichannel anion-sensing characteristics of an Os(II)-bipyridine complex derived from imidazole-4,5-bis(benzimidazole) ligand. With the aid of three imidazole NH protons in its outer coordination sphere, a substantial change in the spectral response as well as OsII/OsIII potential is made possible upon treating with anions of varying basicity. Initial hydrogen bonding between NH protons and anions and thereafter complete proton transfer from the complex backbone probably take place in the process. The deprotonation of the complex by specific anions and restoration to its original form by acid is also reversible. The responsiveness of the new compound is complex enough to imitate multiple sophisticated binary and ternary Boolean logic (BL) functions (NOT logic, combinational logic, traffic signal, set-reset flip-flop logic, and ternary NOR logic) by employing its spectral and redox outputs upon the action of suitable anions and acid in a proper sequence. Executing sensing investigations on altering the amount of the anions within a widespread range is often time-consuming and tedious. To overcome the lacuna, we implemented multiple soft computing techniques, viz., fuzzy logic (FL), artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS), and decision tree (DT) regression, for the thorough analysis and prediction of the experimentally observed results. The outcomes obtained from different techniques were compared among themselves as well as with the experimental data and utilized for the proper modeling of the anion-sensing behaviors of the complex.
Affiliation(s)
- Sohini Bhattacharya
  - Department of Chemistry, Inorganic Chemistry Section, Jadavpur University, Kolkata-700032, India
- Anik Sahoo
  - Department of Chemistry, Inorganic Chemistry Section, Jadavpur University, Kolkata-700032, India
- Sujoy Baitalik
  - Department of Chemistry, Inorganic Chemistry Section, Jadavpur University, Kolkata-700032, India
|
7
|
Zheng B, Oliveira FL, Neumann Barros Ferreira R, Steiner M, Hamann H, Gu GX, Luan B. Quantum Informed Machine-Learning Potentials for Molecular Dynamics Simulations of CO2's Chemisorption and Diffusion in Mg-MOF-74. ACS Nano 2023; 17:5579-5587. [PMID: 36883740 DOI: 10.1021/acsnano.2c11102] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Among various porous solids for gas separation and purification, metal-organic frameworks (MOFs) are promising materials that potentially combine high CO2 uptake and CO2/N2 selectivity. So far, within the hundreds of thousands of MOF structures known today, it remains a challenge to computationally identify the best suited species. First principle-based simulations of CO2 adsorption in MOFs would provide the necessary accuracy; however, they are impractical due to the high computational cost. Classical force field-based simulations would be computationally feasible; however, they do not provide sufficient accuracy. Thus, the entropy contribution that requires both accurate force fields and sufficiently long computing time for sampling is difficult to obtain in simulations. Here, we report quantum-informed machine-learning force fields (QMLFFs) for atomistic simulations of CO2 in MOFs. We demonstrate that the method has a much higher computational efficiency (∼1000×) than the first-principle one while maintaining the quantum-level accuracy. As a proof of concept, we show that the QMLFF-based molecular dynamics simulations of CO2 in Mg-MOF-74 can predict the binding free energy landscape and the diffusion coefficient close to experimental values. The combination of machine learning and atomistic simulation helps achieve more accurate and efficient in silico evaluations of the chemisorption and diffusion of gas molecules in MOFs.
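The diffusion coefficient mentioned in this abstract is commonly extracted from a molecular dynamics trajectory through the Einstein relation, MSD(t) ≈ 2·d·D·t at long times. The sketch below is a minimal illustration under that assumption, not the authors' analysis: it takes a precomputed mean squared displacement series and fits its slope by ordinary least squares. The function name is hypothetical, and choosing the fitting window (the linear, diffusive regime) is left to the caller.

```python
def diffusion_coefficient(times, msd, dimensions=3):
    """Self-diffusion coefficient from the slope of MSD versus time.

    times and msd are equal-length sequences restricted to the diffusive
    regime. The result has units of (MSD unit) / (time unit), for example
    angstrom^2/ps.
    """
    n = len(times)
    if n != len(msd) or n < 2:
        raise ValueError("need at least two matching (time, msd) points")
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    numerator = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    denominator = sum((t - t_mean) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("time values must not all be equal")
    slope = numerator / denominator
    return slope / (2 * dimensions)
```

A constant offset in the MSD, such as the one left by short-time ballistic motion, does not affect the fitted slope.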
Affiliation(s)
- Bowen Zheng
  - IBM Research, Yorktown Heights, New York 10598, United States
  - Department of Mechanical Engineering, University of California, Berkeley, California 94720, United States
- Felipe Lopes Oliveira
  - IBM Research, Av. República do Chile, 330, CEP 20031-170 Rio de Janeiro, RJ, Brazil
  - Department of Organic Chemistry, Instituto de Química, Universidade Federal do Rio de Janeiro, CEP 21941-909 Rio de Janeiro, RJ, Brazil
- Mathias Steiner
  - IBM Research, Av. República do Chile, 330, CEP 20031-170 Rio de Janeiro, RJ, Brazil
- Hendrik Hamann
  - IBM Research, Yorktown Heights, New York 10598, United States
- Grace X Gu
  - Department of Mechanical Engineering, University of California, Berkeley, California 94720, United States
- Binquan Luan
  - IBM Research, Yorktown Heights, New York 10598, United States
|
8
|
Kjeldal FØ, Eriksen JJ. Decomposing Chemical Space: Applications to the Machine Learning of Atomic Energies. J Chem Theory Comput 2023; 19:2029-2038. [PMID: 36926874 DOI: 10.1021/acs.jctc.2c01290] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/18/2023]
Abstract
We apply a number of atomic decomposition schemes across the standard QM7 data set (a small model set of organic molecules at equilibrium geometry) to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies according to which nuclei individual atomic orbitals are centered on. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all, of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are next assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
Affiliation(s)
- Frederik Ø Kjeldal
  - DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
- Janus J Eriksen
  - DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
|
9
|
Yang Q, Jiang GD, He SG. Enhancing the Performance of Global Optimization of Platinum Cluster Structures by Transfer Learning in a Deep Neural Network. J Chem Theory Comput 2023; 19:1922-1930. [PMID: 36917066 DOI: 10.1021/acs.jctc.2c00923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]
Abstract
The global optimization of metal cluster structures is an important research field. The traditional deep neural network (T-DNN) global optimization method is a good way to find out the global minimum (GM) of metal cluster structures, but a large number of samples are required. We developed a new global optimization method which is the combination of the DNN and transfer learning (DNN-TL). The DNN-TL method transfers the DNN parameters of the small-sized cluster to the DNN of the large-sized cluster to greatly reduce the number of samples. For the global optimization of Pt9 and Pt13 clusters in this research, the T-DNN method requires about 3-10 times more samples than the DNN-TL method, and the DNN-TL method saves about 70-80% of time. We also found that the average amplitude of parameter changes in the T-DNN training is about 2 times larger than that in the DNN-TL training, which rationalizes the effectiveness of transfer learning. The average fitting errors of the DNN trained by the DNN-TL method can be even smaller than those by the T-DNN method because of the reliability of transfer learning. Finally, we successfully obtained the GM structures of Ptn (n = 8-14) clusters by the DNN-TL method.
Affiliation(s)
- Qi Yang
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, PR China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Center of Excellence in Molecular Sciences, Beijing 100190, PR China
- Gui-Duo Jiang
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, PR China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Center of Excellence in Molecular Sciences, Beijing 100190, PR China
- Sheng-Gui He
  - State Key Laboratory for Structural Chemistry of Unstable and Stable Species, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, PR China
  - Beijing National Laboratory for Molecular Sciences and CAS Research/Education Center of Excellence in Molecular Sciences, Beijing 100190, PR China
|
10
|
Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K. Neural network potentials for chemistry: concepts, applications and prospects. Digital Discovery 2023; 2:28-58. [PMID: 36798879 PMCID: PMC9923808 DOI: 10.1039/d2dd00102k] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/23/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
Artificial Neural Networks (NN) are already heavily involved in methods and applications for frequent tasks in the field of computational chemistry such as representation of potential energy surfaces (PES) and spectroscopic predictions. This perspective provides an overview of the foundations of neural network-based full-dimensional potential energy surfaces, their architectures, underlying concepts, their representation and applications to chemical systems. Methods for data generation and training procedures for PES construction are discussed and means for error assessment and refinement through transfer learning are presented. A selection of recent results illustrates the latest improvements regarding accuracy of PES representations and system size limitations in dynamics simulations, but also NN application enabling direct prediction of physical results without dynamics simulations. The aim is to provide an overview for the current state-of-the-art NN approaches in computational chemistry and also to point out the current challenges in enhancing reliability and applicability of NN methods on a larger scale.
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
|
11
|
Chen LL, Xu YC, Yang Y, Li N, Zou HX, Wen HH, Yan X. Prediction of peptide-induced silica formation under a wide pH range by molecular descriptors. Colloids Surf A Physicochem Eng Asp 2022. [DOI: 10.1016/j.colsurfa.2022.130030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
12
|
Lee S, Ermanis K, Goodman JM. MolE8: finding DFT potential energy surface minima values from force-field optimised organic molecules with new machine learning representations. Chem Sci 2022; 13:7204-7214. [PMID: 35799803 PMCID: PMC9214916 DOI: 10.1039/d1sc06324c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Accepted: 05/23/2022] [Indexed: 11/21/2022]
Abstract
The use of machine learning techniques in computational chemistry has gained significant momentum since large molecular databases are now readily available. Predictions of molecular properties using machine learning have advantages over the traditional quantum mechanics calculations because they can be cheaper computationally without losing accuracy. We present a new extrapolatable and explainable molecular representation based on bonds, angles and dihedrals that can be used to train machine learning models. The trained models can accurately predict the electronic energy and the free energy of small organic molecules with atom types C, H, N and O, with a mean absolute error of 1.2 kcal mol⁻¹. The models can be extrapolated to larger organic molecules with an average error of less than 3.7 kcal mol⁻¹ for 10 or fewer heavy atoms, which represent a chemical space two orders of magnitude larger. The rapid energy predictions of multiple molecules, up to 7 times faster than previous ML models of similar accuracy, have been achieved by sampling geometries around the potential energy surface minima. Therefore, the input geometries do not have to be located precisely on the minima and we show that accurate density functional theory energy predictions can be made from force-field optimised geometries with a mean absolute error of 2.5 kcal mol⁻¹.
Affiliation(s)
- Sanha Lee
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Jonathan M Goodman
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
|
13
|
Achar SK, Wardzala JJ, Bernasconi L, Zhang L, Johnson JK. Combined Deep Learning and Classical Potential Approach for Modeling Diffusion in UiO-66. J Chem Theory Comput 2022; 18:3593-3606. [PMID: 35653218 DOI: 10.1021/acs.jctc.2c00010] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 11/29/2022]
Abstract
Modeling of diffusion of adsorbates through porous materials with atomistic molecular dynamics (MD) can be a challenging task if the flexibility of the adsorbent needs to be included. This is because potentials need to be developed that accurately account for the motion of the adsorbent in response to the presence of adsorbate molecules. In this work, we show that it is possible to use accurate machine learning atomistic potentials for metal-organic frameworks in concert with classical potentials for adsorbates to accurately compute diffusivities through a hybrid potential approach. As a proof-of-concept, we have developed an accurate deep learning potential (DP) for UiO-66, a metal-organic framework, and used this DP to perform hybrid potential simulations, modeling diffusion of neon and xenon through the crystal. The adsorbate-adsorbate interactions were modeled with Lennard-Jones (LJ) potentials, the adsorbent-adsorbent interactions were described by the DP, and the adsorbent-adsorbate interactions used LJ cross-interactions. Thus, our hybrid potential allows for adsorbent-adsorbate interactions with classical potentials but models the response of the adsorbent to the presence of the adsorbate through near-DFT accuracy DPs. This hybrid approach does not require refitting the DP for new adsorbates. We calculated self-diffusion coefficients for Ne in UiO-66 from DFT-MD, our hybrid DP/LJ approach, and from two different classical potentials for UiO-66. Our DP/LJ results are in excellent agreement with DFT-MD. We modeled diffusion of Xe in UiO-66 with DP/LJ and a classical potential. Diffusion of Xe in UiO-66 is about a factor of 30 slower than that of Ne, so it is not computationally feasible to compute Xe diffusion with DFT-MD. Our hybrid DP-classical potential approach can be applied to other MOFs and other adsorbates, making it possible to use an accurate DP generated from DFT simulations of an empty adsorbent in concert with existing classical potentials for adsorbates to model adsorption and diffusion within the porous material, including adsorbate-induced changes to the framework.
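The energy partition described in this abstract (framework from the DP, adsorbate-adsorbate and adsorbate-framework terms from Lennard-Jones) can be sketched as follows. Several things here are assumptions rather than details from the paper: the Lorentz-Berthelot mixing rules for the cross terms, a single LJ parameter pair per side, the absence of periodic boundaries and cutoffs, and the `framework_energy` argument, which stands in for whatever value a trained DP would return.

```python
import math


def lorentz_berthelot(sigma_a, eps_a, sigma_b, eps_b):
    """Cross parameters: arithmetic mean for sigma, geometric mean for epsilon."""
    return 0.5 * (sigma_a + sigma_b), math.sqrt(eps_a * eps_b)


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones pair energy at separation r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def hybrid_energy(framework_energy, adsorbates, framework_sites,
                  adsorbate_lj, framework_lj):
    """Total energy = framework (ML) + adsorbate pairs (LJ) + cross terms (LJ).

    framework_energy: placeholder for the ML potential's framework energy.
    adsorbates, framework_sites: lists of (x, y, z) positions.
    adsorbate_lj, framework_lj: (sigma, epsilon) pairs, one species per side.
    """
    sigma_aa, eps_aa = adsorbate_lj
    sigma_af, eps_af = lorentz_berthelot(*adsorbate_lj, *framework_lj)
    energy = framework_energy
    for i, a in enumerate(adsorbates):
        for b in adsorbates[i + 1:]:       # adsorbate-adsorbate pairs
            energy += lennard_jones(math.dist(a, b), sigma_aa, eps_aa)
        for f in framework_sites:          # adsorbate-framework cross terms
            energy += lennard_jones(math.dist(a, f), sigma_af, eps_af)
    return energy
```

Because the adsorbate enters only through the LJ terms, swapping in a different adsorbate changes `adsorbate_lj` but leaves the framework model untouched, which mirrors the abstract's point that the DP need not be refitted.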
Affiliation(s)
- Siddarth K Achar
  - Computational Modeling & Simulation Program, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Jacob J Wardzala
  - Department of Chemical & Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Leonardo Bernasconi
  - Center for Research Computing and Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Linfeng Zhang
  - DP Technology, Beijing 100080, China
  - AI for Science Institute, Beijing 100080, China
- J Karl Johnson
  - Department of Chemical & Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
|
14
|
Xu Y, Huang X, Li C, Wei Z, Wang M. Predicting Structure-dependent Properties Directly from the 3D Molecular Images via Convolutional Neural Networks. AIChE J 2022. [DOI: 10.1002/aic.17721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Yunhao Xu
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Xun Huang
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Cunpu Li
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Zidong Wei
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Meng Wang
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
|
15
|
Abstract
In the past two decades, machine learning potentials (MLPs) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics, and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences among MLPs, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like nonlocal charge transfer, and the type of descriptor used to represent the atomic structure, which can be either predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Emir Kocer
  - Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
- Tsz Wai Ko
  - Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
- Jörg Behler
  - Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
|
16
|
How WB, Wang B, Chu W, Tkatchenko A, Prezhdo OV. Significance of the Chemical Environment of an Element in Nonadiabatic Molecular Dynamics: Feature Selection and Dimensionality Reduction with Machine Learning. J Phys Chem Lett 2021; 12:12026-12032. [PMID: 34902248 DOI: 10.1021/acs.jpclett.1c03469] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Using supervised and unsupervised machine learning (ML) on features generated from nonadiabatic (NA) molecular dynamics (MD) trajectories under the classical path approximation, we demonstrate that mutual information with the NA Hamiltonian can be used for feature selection and model simplification. Focusing on CsPbI3, a popular metal halide perovskite, we observe that the chemical environment of a single element is sufficient for predicting the NA Hamiltonian. The conclusion applies even to Cs, although Cs does not contribute to the relevant wave functions. Interatomic distances between Cs and I or Pb and the octahedral tilt angle are the most important features. We reduce a typical 360-parameter ML force-field model to just a 12-parameter NA Hamiltonian model, while maintaining a high NA-MD simulation quality. Because NA-MD is a valuable tool for studying excited state processes, overcoming its high computational cost through simple ML models will streamline NA-MD simulations and expand the ranges of accessible system size and simulation time.
Affiliation(s)
- Wei Bin How
- Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371 Singapore
| | - Bipeng Wang
- Department of Chemical Engineering, University of Southern California, Los Angeles, California 90089, United States
| | - Weibin Chu
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Oleg V Prezhdo
- Department of Chemical Engineering, University of Southern California, Los Angeles, California 90089, United States
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States
| |
|
17
|
Matlock MK, Hoffman M, Dang NL, Folmsbee DL, Langkamp LA, Hutchison GR, Kumar N, Sarullo K, Swamidass SJ. Deep Learning Coordinate-Free Quantum Chemistry. J Phys Chem A 2021; 125:8978-8986. [PMID: 34609871 DOI: 10.1021/acs.jpca.1c04462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Computing quantum chemical properties of small molecules and polymers can provide valuable insights to physicists, chemists, and biologists when designing new materials, catalysts, biological probes, and drugs. Deep learning can compute quantum chemical properties accurately in a fraction of the time required by commonly used methods such as density functional theory. Most current approaches to deep learning in quantum chemistry begin with geometric information from experimentally derived molecular structures or pre-calculated atom coordinates. These approaches have many useful applications, but they can be costly in time and computational resources. In this study, we demonstrate that accurate quantum chemical computations can be performed without geometric information by operating in the coordinate-free domain using deep learning on graph encodings. Coordinate-free methods rely only on molecular graphs, the connectivity of atoms and bonds, without atom coordinates or bond distances. We also find that the choice of graph-encoding architecture substantially affects the performance of these methods. The structures of these graph-encoding architectures provide an opportunity to probe an important, outstanding question in quantum mechanics: what types of quantum chemical properties can be represented by local variable models? We find that Wave, a local variable model, accurately calculates the quantum chemical properties, while graph convolutional architectures require global variables. Furthermore, local variable Wave models outperform global variable graph convolution models on complex molecules with large, correlated systems.
Affiliation(s)
- Matthew K Matlock
- Department of Pathology and Immunology, Washington University in St. Louis, Saint Louis, Missouri 63130, United States
| | - Max Hoffman
- Department of Pathology and Immunology, Washington University in St. Louis, Saint Louis, Missouri 63130, United States
| | - Na Le Dang
- Department of Pathology and Immunology, Washington University in St. Louis, Saint Louis, Missouri 63130, United States
| | - Dakota L Folmsbee
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Luke A Langkamp
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States; Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
| | - Neeraj Kumar
- Pacific Northwest National Laboratory, Computational Biology and Bioinformatics Group, Richland, Washington 99354, United States
| | - Kathryn Sarullo
- Department of Pathology and Immunology, Washington University in St. Louis, Saint Louis, Missouri 63130, United States
| | - S Joshua Swamidass
- Department of Pathology and Immunology, Washington University in St. Louis, Saint Louis, Missouri 63130, United States; Washington University in St. Louis, Institute for Informatics, Saint Louis, Missouri 63130, United States
| |
|
18
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
19
|
Zhang J, Fuller J, An Q. Coordination and Thermophysical Properties of Transition Metal Chlorocomplexes in LiCl-KCl Eutectic. J Phys Chem B 2021; 125:8876-8887. [PMID: 34328331 DOI: 10.1021/acs.jpcb.1c03748] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
Eutectic LiCl-KCl molten salt is often used in molten salt reactors as the primary coolant due to its high thermal capacity and high solubility of fission products. Thermophysical properties, such as density, heat capacity, and viscosity, are important parameters for engineering applications of molten salts but may be significantly influenced by metal solutes from corrosion of metallic structural materials. The behavior of the LiCl-KCl eutectic composition is well researched, yet the effects on these properties due to chlorocomplex formation from metals dissolved in the salt are less well known. These properties are often difficult to accurately measure from experimental methods due to the issues arising from the dissolved species, such as volatility. Here, we applied a combination of quantum mechanics molecular dynamics (QM-MD) and deep machine learning force field (DP-FF) molecular dynamics simulations to investigate the structural and thermophysical properties of LiCl-KCl eutectic as well as the influence of dissolved transition metal chlorocomplexes NiCl2 and CrCl3 at low concentrations. We find that the dissolution of Ni and Cr in the LiCl-KCl system forms the local tetrahedral (NiCl4)2- and octahedral (CrCl6)3- chlorocomplexes, respectively, which do not have a significant impact on the overall liquid salt structures. In addition, the thermodynamic properties including diffusion constant and specific heat capacity are not significantly affected by these chlorocomplexes. However, the viscosity significantly increases in the temperature range of 673-773 K. This study thus provides essential information for evaluating the effects of dissolved metals on the thermophysical and transport properties of molten salts.
Affiliation(s)
- Jing Zhang
- Department of Chemical and Materials Engineering, University of Nevada-Reno, Reno, Nevada 89557, United States
| | - Jon Fuller
- Department of Chemical and Materials Engineering, University of Nevada-Reno, Reno, Nevada 89557, United States
| | - Qi An
- Department of Chemical and Materials Engineering, University of Nevada-Reno, Reno, Nevada 89557, United States
| |
|
20
|
Ren H, Li H, Zhang Q, Liang L, Guo W, Huang F, Luo Y, Jiang J. A machine learning vibrational spectroscopy protocol for spectrum prediction and spectrum-based structure recognition. Fundamental Research 2021. [DOI: 10.1016/j.fmre.2021.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022] Open
|
21
|
Gao P, Zhang J, Qiu H, Zhao S. A general QSPR protocol for the prediction of atomic/inter-atomic properties: a fragment based graph convolutional neural network (F-GCN). Phys Chem Chem Phys 2021; 23:13242-13249. [PMID: 34086015 DOI: 10.1039/d1cp00677k] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
In this study, a general quantitative structure-property relationship (QSPR) protocol, fragment based graph convolutional neural network (F-GCN), was developed for the prediction of atomic/inter-atomic properties. We applied this novel artificial intelligence (AI) tool in predictions of NMR chemical shifts and bond dissociation energies (BDEs). The obtained results were comparable to experimental measurements, while the computational cost was substantially reduced with respect to pure density functional theory (DFT) calculations. The two important features of F-GCN can be summarised as follows: first, it can utilise different levels of molecular fragments for atomic/inter-atomic information extraction; second, the designed architecture is open to additional descriptors for a more accurate solution of the local environment at the atomic level, making it more efficient for structural solutions. In our tests, the average prediction error of 1H NMR chemical shifts was as small as 0.32 ppm, and the error of C-H BDE estimation was 2.7 kcal mol-1. Moreover, we further demonstrated the applicability of the F-GCN model via several challenging structural assignments. The success of the F-GCN in atomic and inter-atomic predictions also indicates an essential improvement of computational chemistry with the assistance of AI tools.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
| | - Jie Zhang
- Centre of Chemistry and Chemical Biology, Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 53000, China; and School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Hongbo Qiu
- Department of Chemical Engineering, Monash University, Clayton, VIC 3800, Australia
| | - Shuaifei Zhao
- Institute for Frontier Materials (IFM), Deakin University, Perth, WA, Australia
| |
|
22
|
Friederich P, Häse F, Proppe J, Aspuru-Guzik A. Machine-learned potentials for next-generation matter simulations. Nature Materials 2021; 20:750-761. [PMID: 34045696 DOI: 10.1038/s41563-020-0777-6] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Received: 11/21/2019] [Accepted: 07/17/2020] [Indexed: 05/18/2023]
Abstract
The choice of simulation methods in computational materials science is driven by a fundamental trade-off: bridging large time- and length-scales with highly accurate simulations at an affordable computational cost. Venturing the investigation of complex phenomena on large scales requires fast yet accurate computational methods. We review the emerging field of machine-learned potentials, which promises to reach the accuracy of quantum mechanical computations at a substantially reduced computational cost. This Review will summarize the basic principles of the underlying machine learning methods, the data acquisition process and active learning procedures. We highlight multiple recent applications of machine-learned potentials in various fields, ranging from organic chemistry and biomolecules to inorganic crystal structure predictions and surface science. We furthermore discuss the developments required to promote a broader use of ML potentials, and the possibility of using them to help solve open questions in materials science and facilitate fully computational materials design.
Affiliation(s)
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Florian Häse
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Jonny Proppe
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Institute of Physical Chemistry, Georg-August University, Göttingen, Germany
| | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario, Canada
| |
|
23
|
Hou F, Ma Y, Hu Z, Ding S, Fu H, Wang L, Zhang X, Li G. Machine Learning Enabled Quickly Predicting of Detonation Properties of N‐Containing Molecules for Discovering New Energetic Materials. Advanced Theory and Simulations 2021. [DOI: 10.1002/adts.202100057] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Affiliation(s)
- Fang Hou
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
| | - Yi Ma
- College of Intelligence and Computing Tianjin University Tianjin 300072 China
| | - Zheng Hu
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
| | - Shining Ding
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
| | - Haihan Fu
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
| | - Li Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin 300072 China
| | - Xiangwen Zhang
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin 300072 China
| | - Guozhu Li
- Key Laboratory for Green Chemical Technology of Ministry of Education School of Chemical Engineering and Technology Tianjin University Tianjin 300072 China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin 300072 China
| |
|
24
|
Gong S, Wang Y, Tian Y, Wang L, Liu G. Rapid enthalpy prediction of transition states using molecular graph convolutional network. AIChE J 2021. [DOI: 10.1002/aic.17269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Affiliation(s)
- Siyuan Gong
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
| | - Yutong Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
| | - Yajie Tian
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Henan Engineering Research Center of Resource and Energy Recovery from Waste, College of Chemistry and Chemical Engineering Henan University Kaifeng China
| | - Li Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin China
| | - Guozhu Liu
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin China
| |
|
25
|
Abstract
We introduce new and robust decompositions of mean-field Hartree-Fock and Kohn-Sham density functional theory relying on the use of localized molecular orbitals and physically sound charge population protocols. The new lossless property decompositions, which allow for partitioning one-electron reduced density matrices into either bond-wise or atomic contributions, are compared to alternatives from the literature with regard to both molecular energies and dipole moments. Besides commenting on possible applications as an interpretative tool in the rationalization of certain electronic phenomena, we demonstrate how decomposed mean-field theory makes it possible to expose and amplify compositional features in the context of machine-learned quantum chemistry. This is made possible by improving upon the granularity of the underlying data. On the basis of our preliminary proof-of-concept results, we conjecture that many of the structure-property inferences in existence today may be further refined by efficiently leveraging an increase in dataset complexity and richness.
Affiliation(s)
- Janus J Eriksen
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, United Kingdom
| |
|
26
|
Sarullo K, Matlock MK, Swamidass SJ. Site-Level Bioactivity of Small-Molecules from Deep-Learned Representations of Quantum Chemistry. J Phys Chem A 2020; 124:9194-9202. [PMID: 33084331 DOI: 10.1021/acs.jpca.0c06231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Atom- or bond-level chemical properties of interest in medicinal chemistry, such as drug metabolism and electrophilic reactivity, are important to understand and predict across arbitrary new molecules. Deep learning can be used to map molecular structures to their chemical properties, but the data sets for these tasks are relatively small, which can limit accuracy and generalizability. To overcome this limitation, it would be preferable to model these properties on the basis of the underlying quantum chemical characteristics of small molecules. However, it is difficult to learn higher level chemical properties from lower level quantum calculations. To overcome this challenge, we pretrained deep learning models to compute quantum chemical properties and then reused the intermediate representations constructed by the pretrained network. Transfer learning, in this way, substantially outperformed models based on chemical graphs alone or quantum chemical properties alone. This result was robust, observable in five prediction tasks: identifying sites of epoxidation by metabolic enzymes and identifying sites of covalent reactivity with cyanide, glutathione, DNA and protein. We see that this approach may substantially improve the accuracy of deep learning models for specific chemical structures, such as aromatic systems.
Affiliation(s)
- Kathryn Sarullo
- Department of Pathology and Immunology, School of Medicine, Washington University in St. Louis, Saint Louis, Missouri 63110, United States
| | - Matthew K Matlock
- Department of Pathology and Immunology, School of Medicine, Washington University in St. Louis, Saint Louis, Missouri 63110, United States
| | - S Joshua Swamidass
- Department of Pathology and Immunology, School of Medicine, Washington University in St. Louis, Saint Louis, Missouri 63110, United States
| |
|
27
|
Hanaoka K. Deep Neural Networks for Multicomponent Molecular Systems. ACS Omega 2020; 5:21042-21053. [PMID: 32875241 PMCID: PMC7450624 DOI: 10.1021/acsomega.0c02599] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/01/2020] [Accepted: 07/20/2020] [Indexed: 06/11/2023]
Abstract
Deep neural networks (DNNs) represent promising approaches to molecular machine learning (ML). However, their applicability remains limited to single-component materials, and a general DNN model capable of handling various multicomponent molecular systems with composition data is still elusive, while current ML approaches for multicomponent molecular systems are still molecular descriptor-based. Here, a general DNN architecture extending existing molecular DNN models to multicomponent systems, called MEIA, is proposed. Case studies showed that the MEIA architecture could extend two existing molecular DNN models to multicomponent systems with the same procedure, and that the obtained models could learn both the molecular structure and composition information with equal or better accuracies compared to a widely used molecular descriptor-based model in the best model for each case study. Furthermore, the case studies also showed that, for ML tasks where the molecular structure information plays a minor role, the performance improvements by DNN models were small, while for ML tasks where the molecular structure information plays a major role, the performance improvements by DNN models were large, and DNN models showed notable predictive accuracies for an extremely sparse dataset, which cannot be modeled without the molecular structure information. The enhanced predictive ability of DNN models for sparse datasets of multicomponent systems will extend the applicability of ML in multicomponent material design. Furthermore, the general capability of MEIA to extend DNN models to multicomponent systems will provide new opportunities to utilize the progress of actively developed single-component DNNs for the modeling of multicomponent systems.
|
28
|
Collins EM, Raghavachari K. Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning. J Chem Theory Comput 2020; 16:4938-4950. [PMID: 32678593 DOI: 10.1021/acs.jctc.0c00236] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep scaling problem of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at a greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best performing descriptor ML(CBH-2) is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule along with overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
Affiliation(s)
- Eric M Collins
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| | - Krishnan Raghavachari
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
| |
|
29
|
Pattnaik P, Raghunathan S, Kalluri T, Bhimalapuram P, Jawahar CV, Priyakumar UD. Machine Learning for Accurate Force Calculations in Molecular Dynamics Simulations. J Phys Chem A 2020; 124:6954-6967. [DOI: 10.1021/acs.jpca.0c03926] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/13/2022]
Affiliation(s)
- Punyaslok Pattnaik
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Shampa Raghunathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Tarun Kalluri
- Center for Visual Information Technology, KCIS, International Institute of Information Technology, Hyderabad 500 032, India
| | - Prabhakar Bhimalapuram
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - C. V. Jawahar
- Center for Visual Information Technology, KCIS, International Institute of Information Technology, Hyderabad 500 032, India
| | - U. Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| |
|
30
|
Xie X, Persson KA, Small DW. Incorporating Electronic Information into Machine Learning Potential Energy Surfaces via Approaching the Ground-State Electronic Energy as a Function of Atom-Based Electronic Populations. J Chem Theory Comput 2020; 16:4256-4270. [PMID: 32502350 DOI: 10.1021/acs.jctc.0c00217] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract
Machine learning (ML) approximations to density functional theory (DFT) potential energy surfaces (PESs) are showing great promise for reducing the computational cost of accurate molecular simulations, but at present, they are not applicable to varying electronic states, and in particular, they are not well suited for molecular systems in which the local electronic structure is sensitive to the medium to long-range electronic environment. With this issue as the focal point, we present a new machine learning approach called "BpopNN" for obtaining efficient approximations to DFT PESs. Conceptually, the methodology is based on approaching the true DFT energy as a function of electron populations on atoms; in practice, this is realized with available density functionals and constrained DFT (CDFT). The new approach creates approximations to this function with neural networks. These approximations thereby incorporate electronic information naturally into a ML approach, and optimizing the model energy with respect to populations allows the electronic terms to self-consistently adapt to the environment, as in DFT. We confirm the effectiveness of this approach with a variety of calculations on LinHn clusters.
Affiliation(s)
- Xiaowei Xie
- Department of Chemistry, University of California, Berkeley, California 94720, United States; Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Kristin A Persson
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States; Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
| | - David W Small
- Department of Chemistry, University of California, Berkeley, California 94720, United States; Molecular Graphics and Computation Facility, College of Chemistry, University of California, Berkeley, California 94720, United States
| |
|
31
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 319] [Impact Index Per Article: 79.8] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Collapse
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
| |
Collapse
|
32
|
Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat Commun 2020; 11:2328. [PMID: 32393773 PMCID: PMC7214445 DOI: 10.1038/s41467-020-16201-z] [Citation(s) in RCA: 85] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2019] [Accepted: 04/15/2020] [Indexed: 12/31/2022] Open
Abstract
Bond dissociation enthalpies (BDEs) of organic molecules play a fundamental role in determining chemical reactivity and selectivity. However, BDE computations at sufficiently high levels of quantum mechanical theory require substantial computing resources. In this paper, we develop a machine learning model capable of accurately predicting BDEs for organic molecules in a fraction of a second. We perform automated density functional theory (DFT) calculations at the M06-2X/def2-TZVP level of theory for 42,577 small organic molecules, resulting in 290,664 BDEs. A graph neural network trained on a subset of these results achieves a mean absolute error of 0.58 kcal mol⁻¹ (vs DFT) for BDEs of unseen molecules. We further demonstrate the model on two applications: first, we rapidly and accurately predict major sites of hydrogen abstraction in the metabolism of drug-like molecules, and second, we determine the dominant molecular fragmentation pathways during soot formation.
Collapse
|
33
|
Smith JS, Zubatyuk R, Nebgen B, Lubbers N, Barros K, Roitberg AE, Isayev O, Tretiak S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci Data 2020; 7:134. [PMID: 32358545 PMCID: PMC7195467 DOI: 10.1038/s41597-020-0473-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022] Open
Abstract
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
Collapse
Affiliation(s)
- Justin S Smith
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Roman Zubatyuk
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Kipton Barros
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Adrian E Roitberg
- University of Florida, Department of Chemistry, PO Box 117200, 32611-7200, Gainesville, USA.
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA.
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
| |
Collapse
|
34
|
Yu H, Wang Y, Wang X, Zhang J, Ye S, Huang Y, Luo Y, Sharman E, Chen S, Jiang J. Using Machine Learning to Predict the Dissociation Energy of Organic Carbonyls. J Phys Chem A 2020; 124:3844-3850. [PMID: 32315178 DOI: 10.1021/acs.jpca.0c01280] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Bond dissociation energy (BDE), an indicator of the strength of chemical bonds, exhibits great potential for evaluating and screening high-performance materials and catalysts, which are of critical importance in industrial applications. However, the measurement or computation of BDE via conventional experimental or theoretical methods is usually costly and involved, substantially preventing the BDE from being applied to large-scale and high-throughput studies. Therefore, a potentially more efficient approach for estimating BDE is highly desirable. To this end, we combined first-principles calculations and machine learning techniques, including neural networks and random forest, to explore the inner relationships between carbonyl structure and its BDE. Results show that machine learning can not only effectively reproduce the computed BDEs of carbonyls but also in turn serve as guidance for the rational design of carbonyl structure aimed at optimizing performance.
Collapse
Affiliation(s)
- Haishan Yu
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Ying Wang
- Key Laboratory of Cluster Science of Ministry of Education, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China
| | - Xijun Wang
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh 27606, North Carolina, United States
| | - Jinxiao Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Sheng Ye
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Yan Huang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Yi Luo
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| | - Edward Sharman
- Department of Neurology, University of California, Irvine 92697, California, United States
| | - Shilu Chen
- Key Laboratory of Cluster Science of Ministry of Education, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China
| | - Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, Department of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, Anhui, China
| |
Collapse
|
35
|
Abstract
As the quantum chemistry (QC) community embraces machine learning (ML), the number of new methods and applications based on the combination of QC and ML is surging. In this Perspective, a view of the current state of affairs in this new and exciting research field is offered, challenges of using machine learning in quantum chemistry applications are described, and potential future developments are outlined. Specifically, examples of how machine learning is used to improve the accuracy and accelerate quantum chemical research are shown. Generalization and classification of existing techniques are provided to ease the navigation in the sea of literature and to guide researchers entering the field. The emphasis of this Perspective is on supervised machine learning.
Collapse
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| |
Collapse
|
36
|
|
37
|
Laghuvarapu S, Pathak Y, Priyakumar UD. BAND NN: A Deep Learning Framework for Energy Prediction and Geometry Optimization of Organic Small Molecules. J Comput Chem 2019; 41:790-799. [DOI: 10.1002/jcc.26128] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2019] [Revised: 11/13/2019] [Accepted: 11/21/2019] [Indexed: 12/26/2022]
Affiliation(s)
- Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Yashaswi Pathak
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - U. Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| |
Collapse
|
38
|
Profitt TA, Pearson JK. A shared-weight neural network architecture for predicting molecular properties. Phys Chem Chem Phys 2019; 21:26175-26183. [PMID: 31750845 DOI: 10.1039/c9cp03103k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Quantum chemical methods scale poorly with increasing molecular size, and machine learning models have emerged as a promising, computationally efficient alternative. We present a shared-weight neural network architecture based on modified atom-centered symmetry functions (ACSFs) and show that it performs similarly to the more computationally expensive per-element neural networks of previous work with ACSFs. The model achieves chemically accurate predictions, with a mean absolute error as low as 0.63 kcal mol⁻¹ on energy predictions in the QM9 data set. Additionally, we show that it can reliably predict atomic forces.
Collapse
Affiliation(s)
- Trevor A Profitt
- Department of Chemistry, University of Prince Edward Island, Charlottetown, PE, Canada.
| |
Collapse
|
39
|
Hu W, Ye S, Zhang Y, Li T, Zhang G, Luo Y, Mukamel S, Jiang J. Machine Learning Protocol for Surface-Enhanced Raman Spectroscopy. J Phys Chem Lett 2019; 10:6026-6031. [PMID: 31538788 DOI: 10.1021/acs.jpclett.9b02517] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/18/2023]
Abstract
Surface-enhanced Raman spectroscopy (SERS) is a powerful technique that can capture the electronic-vibrational "fingerprint" of molecules on surfaces. Ab initio prediction of Raman response is a long-standing challenge because of the diversified interfacial structures. Here we show that a cost-effective machine learning (ML) random forest method can predict SERS signals of a trans-1,2-bis(4-pyridyl)ethylene (BPE) molecule adsorbed on a gold substrate. Using geometric descriptors extracted from quantum chemistry simulations of thousands of ab initio molecular dynamics conformations, the ML protocol predicts vibrational frequencies and Raman intensities. The resulting spectra agree with density functional theory calculations and experiment. Predicted SERS responses of the molecule on different surfaces, or under external electric fields and in a solvent environment, demonstrate the good transferability of the protocol.
Collapse
Affiliation(s)
- Wei Hu
- Shandong Provincial Key Laboratory of Molecular Engineering, School of Chemistry and Pharmaceutical Engineering, Qilu University of Technology, Jinan, Shandong 250353, P.R. China
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China
| | - Sheng Ye
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China
| | - Yujin Zhang
- School of Electronic and Information Engineering (Department of Physics), Qilu University of Technology, Jinan, Shandong 250353, P.R. China
| | - Tianduo Li
- Shandong Provincial Key Laboratory of Molecular Engineering, School of Chemistry and Pharmaceutical Engineering, Qilu University of Technology, Jinan, Shandong 250353, P.R. China
| | - Guozhen Zhang
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China
| | - Yi Luo
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China
| | - Shaul Mukamel
- Departments of Chemistry and Physics and Astronomy, University of California, Irvine, California 92697, United States
| | - Jun Jiang
- Hefei National Laboratory for Physical Sciences at the Microscale, CAS Center for Excellence in Nanoscience, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, Anhui 230026, P.R. China
| |
Collapse
|
40
|
García-Muelas R, López N. Statistical learning goes beyond the d-band model providing the thermochemistry of adsorbates on transition metals. Nat Commun 2019; 10:4687. [PMID: 31615991 PMCID: PMC6794282 DOI: 10.1038/s41467-019-12709-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2019] [Accepted: 08/23/2019] [Indexed: 12/30/2022] Open
Abstract
The rational design of heterogeneous catalysts relies on the efficient survey of mechanisms by density functional theory (DFT). However, massive reaction networks cannot be sampled effectively as they grow exponentially with the size of reactants. Here we present a statistical principal component analysis and regression applied to the DFT thermochemical data of 71 C₁–C₂ species on 12 close-packed metal surfaces. Adsorption is controlled by covalent (d-band center) and ionic terms (reduction potential), modulated by conjugation and conformational contributions. All formation energies can be reproduced from only three key intermediates (predictors) calculated with DFT. The results agree with accurate experimental measurements having error bars comparable to those of DFT. The procedure can be extended to single-atom and near-surface alloys, reducing the number of explicit DFT calculations needed by a factor of 20, thus paving the way for a rapid and accurate survey of whole reaction networks on multimetallic surfaces. Assessing catalytic mechanisms using DFT calculations greatly aids catalyst design, but is impractical for large molecules. Here the authors develop a statistical learning-based thermochemical model for estimating adsorption of organics onto metals, retaining DFT accuracy while reducing the number of calculations by a factor of 20.
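The idea of reproducing a full table of formation energies from a few predictor species can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the energy table is synthetic (random numbers built to have three underlying factors), the shape of 12 surfaces by 71 species only mirrors the counts quoted in the abstract, and the function names are hypothetical. It is not the authors' procedure or data.

```python
import numpy as np

def explained_variance_ratio(energies):
    """Fraction of variance carried by each principal component (columns = species)."""
    centered = energies - energies.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

def fit_from_predictors(energies, predictor_cols):
    """Least-squares map from a few predictor columns (plus intercept) to all columns."""
    design = np.column_stack([np.ones(energies.shape[0]), energies[:, predictor_cols]])
    coef, *_ = np.linalg.lstsq(design, energies, rcond=None)
    return coef

def predict(coef, predictor_values):
    """Predict every species' energy from the predictor energies on some surfaces."""
    design = np.column_stack([np.ones(predictor_values.shape[0]), predictor_values])
    return design @ coef

# Synthetic stand-in data (NOT real DFT energies): 12 surfaces x 71 species,
# generated from three hidden factors so that three predictors suffice.
rng = np.random.default_rng(0)
hidden_factors = rng.normal(size=(12, 3))
loadings = rng.normal(size=(3, 71))
energies = hidden_factors @ loadings

ratios = explained_variance_ratio(energies)
coef = fit_from_predictors(energies, predictor_cols=[0, 1, 2])
reconstructed = predict(coef, energies[:, [0, 1, 2]])
```

On real data the variance ratios would indicate how many components are needed, and the reconstruction error would be the quantity to compare against the DFT reference.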
Collapse
Affiliation(s)
- Rodrigo García-Muelas
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology (BIST), Av. Països Catalans 16, 43007, Tarragona, Spain.
| | - Núria López
- Institute of Chemical Research of Catalonia (ICIQ), The Barcelona Institute of Science and Technology (BIST), Av. Països Catalans 16, 43007, Tarragona, Spain.
| |
Collapse
|
41
|
Janet JP, Duan C, Yang T, Nandy A, Kulik HJ. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Chem Sci 2019; 10:7913-7922. [PMID: 31588334 PMCID: PMC6764470 DOI: 10.1039/c9sc02298h] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2019] [Accepted: 07/11/2019] [Indexed: 12/14/2022] Open
Abstract
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
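A latent-space distance metric of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the latent vectors are taken as already extracted from some trained network, the distance is Euclidean and averaged over the `k` nearest training points, and both `k` and the cutoff are hypothetical values, not those used by the authors.

```python
import numpy as np

def latent_distance(train_latent, query_latent, k=5):
    """Mean Euclidean distance from each query point to its k nearest training points."""
    train = np.asarray(train_latent, dtype=float)
    query = np.asarray(query_latent, dtype=float)
    # Pairwise distances with shape (n_query, n_train).
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    k = min(k, train.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

def within_domain(train_latent, query_latent, cutoff, k=5):
    """Flag query points whose latent distance is at or below an assumed cutoff."""
    return latent_distance(train_latent, query_latent, k=k) <= cutoff
```

Points flagged as outside the cutoff are candidates for either rejection (error control) or labeling (active learning); tightening the cutoff trades coverage for lower expected error.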
Collapse
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
| |
Collapse
|
42
|
Bauer CA, Schneider G, Göller AH. Machine learning models for hydrogen bond donor and acceptor strengths using large and diverse training data generated by first-principles interaction free energies. J Cheminform 2019; 11:59. [PMID: 33430967 PMCID: PMC6737620 DOI: 10.1186/s13321-019-0381-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2019] [Accepted: 08/10/2019] [Indexed: 02/06/2023] Open
Abstract
We present machine learning (ML) models for hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) strengths. Quantum chemical (QC) free energies in solution for 1:1 hydrogen-bonded complex formation to the reference molecules 4-fluorophenol and acetone serve as our target values. Our acceptor and donor databases are the largest on record with 4426 and 1036 data points, respectively. After scanning over radial atomic descriptors and ML methods, our final trained HBA and HBD ML models achieve RMSEs of 3.8 kJ mol⁻¹ (acceptors) and 2.3 kJ mol⁻¹ (donors) on experimental test sets. This performance is comparable with previous models that are trained on experimental hydrogen bonding free energies, indicating that molecular QC data can serve as a substitute for experiment. The potential ramifications thereof could lead to a full replacement of wet-lab chemistry for HBA/HBD strength determination by QC. As a possible chemical application of our ML models, we highlight our predicted HBA and HBD strengths as possible descriptors in two case studies on trends in intramolecular hydrogen bonding.
Collapse
Affiliation(s)
- Christoph A Bauer
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), 8093, Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), 8093, Zurich, Switzerland.
| |
Collapse
|
43
|
Zhang Y, Hu C, Jiang B. Embedded Atom Neural Network Potentials: Efficient and Accurate Machine Learning with a Physically Inspired Representation. J Phys Chem Lett 2019; 10:4962-4967. [PMID: 31397157 DOI: 10.1021/acs.jpclett.9b02037] [Citation(s) in RCA: 115] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
We propose a simple, but efficient and accurate, machine learning (ML) model for developing a high-dimensional potential energy surface. This so-called embedded atom neural network (EANN) approach is inspired by the well-known empirical embedded atom method (EAM) model used in the condensed phase. It simply replaces the scalar embedded atom density in EAM with a Gaussian-type orbital based density vector and represents the complex relationship between the embedded density vector and atomic energy by neural networks. We demonstrate that the EANN approach is as accurate as several established ML models in representing both large molecular and extended periodic systems, yet with far fewer parameters and configurations. It is highly efficient as it implicitly contains the three-body information without an explicit sum of the conventional costly angular descriptors. With high accuracy and efficiency, EANN potentials can vastly accelerate molecular dynamics and spectroscopic simulations in complex systems at the ab initio level.
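How a Gaussian-type-orbital density can carry angular (three-body) information without an explicit sum over angles can be seen in a small sketch. The functional form below is an assumption made for illustration and is not given in the abstract: each neighbor contributes a Cartesian Gaussian term, the contributions are summed first and squared afterwards, and a cosine cutoff is used. The names `density_component`, `alpha`, `r_s`, `r_c`, and `coeffs` are hypothetical.

```python
import numpy as np
from itertools import product
from math import factorial

def cosine_cutoff(r, r_c):
    """Smoothly switch contributions off at the cutoff radius r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def density_component(rel_pos, L, alpha, r_s, r_c, coeffs=None):
    """One element of an embedded-density vector for a central atom.

    rel_pos: (n_neighbors, 3) neighbor positions relative to the central atom.
    L: total angular momentum lx + ly + lz of the Gaussian-type orbitals.
    """
    rel = np.asarray(rel_pos, dtype=float)
    r = np.linalg.norm(rel, axis=1)
    if coeffs is None:
        coeffs = np.ones(len(rel))  # assumed per-neighbor weights
    radial = coeffs * np.exp(-alpha * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
    rho = 0.0
    for lx, ly, lz in product(range(L + 1), repeat=3):
        if lx + ly + lz != L:
            continue
        angular = rel[:, 0] ** lx * rel[:, 1] ** ly * rel[:, 2] ** lz
        prefactor = factorial(L) / (factorial(lx) * factorial(ly) * factorial(lz))
        # Sum over neighbors first, then square: the cross terms between
        # neighbor pairs encode angles at linear cost in the neighbor count.
        rho += prefactor * np.sum(angular * radial) ** 2
    return rho
```

Evaluating this for several choices of `L`, `alpha`, and `r_s` would give a density vector that could be fed to a per-atom neural network; the weighting by multinomial prefactors keeps each component invariant under rotation of the neighbor positions.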
Collapse
Affiliation(s)
- Yaolong Zhang
- Hefei National Laboratory for Physical Science at the Microscale, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Ce Hu
- Hefei National Laboratory for Physical Science at the Microscale, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui 230026, China
| | - Bin Jiang
- Hefei National Laboratory for Physical Science at the Microscale, Department of Chemical Physics, Key Laboratory of Surface and Interface Chemistry and Energy Catalysis of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, Anhui 230026, China
| |
Collapse
|
44
|
Herr JE, Koh K, Yao K, Parkhill J. Compressing physics with an autoencoder: Creating an atomic species representation to improve machine learning models in the chemical sciences. J Chem Phys 2019; 151:084103. [DOI: 10.1063/1.5108803] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Affiliation(s)
- John E. Herr
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
| | - Kevin Koh
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
| | - Kun Yao
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
| | - John Parkhill
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
| |
Collapse
|
45
|
Nakai H, Seino J, Nakamura K. Bond Energy Density Analysis Combined with Informatics Technique. J Phys Chem A 2019; 123:7777-7784. [DOI: 10.1021/acs.jpca.9b04030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Hiromi Nakai
- Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- ESICB, Kyoto University, Kyotodaigaku-Katsura, Nishigyoku, Kyoto 615-8520, Japan
| | - Junji Seino
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
| | - Kairi Nakamura
- Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| |
Collapse
|
46
|
Lee SJR, Ding F, Manby FR, Miller TF. Analytical gradients for projection-based wavefunction-in-DFT embedding. J Chem Phys 2019. [DOI: 10.1063/1.5109882] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Affiliation(s)
- Sebastian J. R. Lee
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Feizhi Ding
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Frederick R. Manby
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
| | - Thomas F. Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| |
Collapse
|
47
|
Abstract
Machine learning enables computers to address problems by learning from data. Deep learning is a type of machine learning that uses a hierarchical recombination of features to extract pertinent information and then learn the patterns represented in the data. Over the last eight years, its abilities have increasingly been applied to a wide variety of chemical challenges, from improving computational chemistry to drug and materials design and even synthesis planning. This review aims to explain the concepts of deep learning to chemists from any background and follows this with an overview of the diverse applications demonstrated in the literature. We hope that this will empower the broader chemical community to engage with this burgeoning field and foster the growing movement of deep learning accelerated chemistry.
Collapse
Affiliation(s)
- Adam C Mater
- ARC Centre of Excellence for Electromaterials Science, Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
| | - Michelle L Coote
- ARC Centre of Excellence for Electromaterials Science, Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia
| |
Collapse
|
48
|
A neural network protocol for electronic excitations of N-methylacetamide. Proc Natl Acad Sci U S A 2019; 116:11612-11617. [PMID: 31147467 DOI: 10.1073/pnas.1821044116] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
UV absorption is widely used for characterizing protein structures. The mapping of UV spectra to the atomic structure of proteins relies on expensive theoretical simulations. Circumventing the heavy computational cost, which involves repeated quantum-mechanical simulations of excited-state properties of many fluctuating protein geometries, has been a long-standing challenge. Here we show that a neural network machine-learning technique can predict electronic absorption spectra of N-methylacetamide (NMA), which is a widely used model system for the peptide bond. Using ground-state geometric parameters and charge information as descriptors, we employed a neural network to predict transition energies, ground-state dipole moments, and transition dipole moments of many molecular-dynamics conformations at different temperatures, in agreement with time-dependent density-functional theory calculations. The neural network simulations are nearly 3,000× faster than comparable quantum calculations. Machine learning should provide a cost-effective tool for simulating optical properties of proteins.
Collapse
|
49
|
Abstract
Complex chemical systems present challenges to electronic structure theory stemming from large system sizes, subtle interactions, coupled dynamical time scales, and electronically nonadiabatic effects. New methods are needed to perform reliable, rigorous, and affordable electronic structure calculations for simulating the properties and dynamics of such systems. This Account reviews projection-based quantum embedding for electronic structure, which provides a formally exact method for density functional theory (DFT) embedding. The method also provides a rigorous and accurate approach for describing a small part of a chemical system at the level of a correlated wavefunction (WF) method while the remainder of the system is described at the level of DFT. A key advantage of projection-based embedding is that it can be formulated in terms of an extremely simple level-shift projection operator, which eliminates the need for any optimized effective potential calculation or kinetic energy functional approximation while simultaneously ensuring that no extra programming is needed to perform WF-in-DFT embedding with an arbitrary WF method. The current work presents the theoretical underpinnings of projection-based embedding, describes use of the method for combining wavefunction and density functional theories, and discusses technical refinements that have improved the applicability and robustness of the method. Applications of projection-based WF-in-DFT embedding are also reviewed, with particular focus on recent work on transition-metal catalysis, enzyme reactivity, and battery electrolyte decomposition. In particular, we review the application of projection-based embedding for the prediction of electrochemical potentials and reaction pathways in a Co-centered hydrogen evolution catalyst. 
Projection-based WF-in-DFT calculations are shown to provide quantitative accuracy while greatly reducing the computational cost compared with a reference coupled cluster calculation on the full system. Additionally, projection-based WF-in-DFT embedding is used to study the mechanism of citrate synthase; it is shown that projection-based WF-in-DFT largely eliminates the sensitivity of the potential energy landscape to the employed DFT exchange-correlation functional. Finally, we demonstrate the use of projection-based WF-in-DFT to study electron transfer reactions associated with battery electrolyte decomposition. Projection-based WF-in-DFT embedding is used to calculate the oxidation potentials of neat ethylene carbonate (EC), neat dimethyl carbonate (DMC), and 1:1 mixtures of EC and DMC in order to overcome qualitative inaccuracies in the electron densities and ionization energies obtained from conventional DFT methods. By further embedding the WF-in-DFT description in a molecular mechanics point-charge environment, this work enables an explicit description of the solvent and ensemble averaging of the solvent configurations. Looking forward, we anticipate continued refinement of the projection-based embedding methodology as well as its increasingly widespread application in diverse areas of chemistry, biology, and materials science.
Affiliation(s)
- Sebastian J. R. Lee
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Matthew Welborn
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Frederick R. Manby
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
| | - Thomas F. Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
50
|
Sawatlon B, Wodrich MD, Meyer B, Fabrizio A, Corminboeuf C. Data Mining the C−C Cross‐Coupling Genome. ChemCatChem 2019. [DOI: 10.1002/cctc.201900597] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Affiliation(s)
- Boodsarin Sawatlon
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Matthew D. Wodrich
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| | - Clémence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
| |
|