1
Sahre MJ, von Rudorff GF, Marquetand P, von Lilienfeld OA. Transferability of atomic energies from alchemical decomposition. J Chem Phys 2024; 160:054106. [PMID: 38341696 DOI: 10.1063/5.0187298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/09/2024] [Indexed: 02/13/2024]
Abstract
We study alchemical atomic energy partitioning as a method to estimate atomization energies from atomic contributions, which are defined in physically rigorous and general ways through the use of the uniform electron gas as a joint reference. We analyze quantitatively the relation between atomic energies and their local environment using a dataset of 1325 organic molecules. The atomic energies are transferable across various molecules, enabling the prediction of atomization energies with a mean absolute error of 23 kcal/mol, comparable to simple statistical estimates but potentially more robust given their grounding in the physics-based decomposition scheme. A comparative analysis with other decomposition methods highlights its sensitivity to electrostatic variations, underlining its potential as a representation of the environment as well as in studying processes like diffusion in solids characterized by significant electrostatic shifts.
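The abstract compares the alchemical atomic energies with "simple statistical estimates". One such baseline, assumed here for illustration because the abstract does not describe it, fits a single constant energy per element by least squares and predicts an atomization energy as a plain sum over atoms. This is a hypothetical sketch of that baseline only, not the alchemical decomposition itself; all function names are illustrative.

```python
from collections import Counter

# Hypothetical element-count baseline; NOT the alchemical decomposition scheme.

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_element_energies(molecules, energies):
    """Least-squares fit of one constant energy per element via the normal equations."""
    elements = sorted({el for mol in molecules for el in mol})
    rows = [[Counter(mol)[el] for el in elements] for mol in molecules]
    k = len(elements)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(k)]
    return dict(zip(elements, solve_linear(ata, atb)))

def predict(molecule, element_energies):
    """Atomization energy estimated as the sum of per-atom contributions."""
    return sum(element_energies[el] for el in molecule)

def mean_absolute_error(molecules, energies, element_energies):
    errors = [abs(predict(m, element_energies) - e) for m, e in zip(molecules, energies)]
    return sum(errors) / len(errors)
```

Molecules are given as lists of element symbols, e.g. `list("CHHHH")` for methane. Such a model cannot distinguish isomers, which is one reason a physics-based, environment-dependent decomposition may be expected to be more robust.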
Affiliation(s)
- Michael J Sahre
- Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Guido Falk von Rudorff
- Department of Chemistry, University of Kassel, Heinrich-Plett-Str. 40, 34132 Kassel, Germany
- Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Philipp Marquetand
- Faculty of Chemistry, Institute of Theoretical Chemistry, University of Vienna, Währinger Str. 17, 1090 Vienna, Austria
- O Anatole von Lilienfeld
- Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, M5S 3H6 Ontario, Canada
- Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, M5S 3E4 Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
- ML Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Department of Physics, University of Toronto, St. George Campus, Toronto, M5S 1A7 Ontario, Canada

2
Ng WP, Liang Q, Yang J. Low-Data Deep Quantum Chemical Learning for Accurate MP2 and Coupled-Cluster Correlations. J Chem Theory Comput 2023; 19:5439-5449. [PMID: 37506400 DOI: 10.1021/acs.jctc.3c00518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2023]
Abstract
Accurate ab initio prediction of electronic energies by explicitly solving post-Hartree-Fock equations is very expensive for macromolecules. We here exploit the physically justified local correlation feature in a compact basis of small molecules and construct an expressive low-data deep neural network (dNN) model to obtain machine-learned electron correlation energies on par with MP2 and CCSD levels of theory for more complex molecules and different datasets that are not represented in the training set. We show that our dNN-powered model is data efficient and makes highly transferable predictions across alkanes of various lengths, organic molecules with non-covalent and biomolecular interactions, as well as water clusters of different sizes and morphologies. In particular, by training on 800 (H2O)8 clusters with the local correlation descriptors, accurate MP2/cc-pVTZ correlation energies up to (H2O)128 can be predicted with a small random error within chemical accuracy of the exact values, while the majority of the prediction deviations are attributed to an intrinsically systematic error. Our results reveal that an extremely compact local correlation feature set, which would be poor for any direct post-Hartree-Fock calculation, nevertheless has a prominent advantage in preserving important electron correlation patterns for making accurate transferable predictions across distinct molecular compositions, bond types, and geometries.
Affiliation(s)
- Wai-Pan Ng
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China
- Qiujiang Liang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Jun Yang
- Department of Chemistry, The University of Hong Kong, Hong Kong 999077, P. R. China
- Hong Kong Quantum AI Lab Limited, Hong Kong 999077, P. R. China

3
Kjeldal FØ, Eriksen JJ. Decomposing Chemical Space: Applications to the Machine Learning of Atomic Energies. J Chem Theory Comput 2023; 19:2029-2038. [PMID: 36926874 DOI: 10.1021/acs.jctc.2c01290] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/18/2023]
Abstract
We apply a number of atomic decomposition schemes across the standard QM7 data set (a small model set of organic molecules at equilibrium geometry) to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies according to which nuclei the individual atomic orbitals are centered on. We find that these partitioning schemes expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all, of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are then assessed. We identify both limitations and possible advantages relative to contemporary machine learning models and discuss the design of potential counterparts that take atoms and their intrinsic energies as the principal decomposition units.
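To make "partitioning by which nucleus an atomic orbital is centered on" concrete, here is a hypothetical sketch of a Mulliken-type split of a one-electron energy Tr(PH), where P is a density matrix and H a one-electron Hamiltonian in the AO basis. It illustrates the general idea only and is not any of the specific schemes compared in the paper. Each term P[mu][nu]·H[nu][mu] is credited to the atom carrying AO mu, which for symmetric matrices amounts to splitting every off-diagonal pair equally between the two atoms.

```python
def partition_by_ao_center(P, H, ao_atom):
    """Split Tr(P H) into atomic terms by the atom on which each AO mu is centred.

    P, H: square matrices (lists of rows) in the AO basis (illustrative inputs).
    ao_atom[mu]: label of the atom carrying basis function mu.
    """
    n = len(P)
    contributions = {}
    for mu in range(n):
        # All terms in row mu of the trace are credited to the atom of AO mu.
        row_sum = sum(P[mu][nu] * H[nu][mu] for nu in range(n))
        contributions[ao_atom[mu]] = contributions.get(ao_atom[mu], 0.0) + row_sum
    return contributions
```

Because the split is fixed by where the basis functions sit, changing the basis (for example adding diffuse functions that extend over neighbouring atoms) can move energy between atoms without any physical change. This is one intuitive way to see how basis-set dependencies of the kind reported in the abstract can arise.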
Affiliation(s)
- Frederik Ø Kjeldal
- DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark
- Janus J Eriksen
- DTU Chemistry, Technical University of Denmark, Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark

4
Long Z, Tuckerman ME. Hydroxide Diffusion in Functionalized Cylindrical Nanopores as Idealized Models of Anion Exchange Membrane Environments: An Ab Initio Molecular Dynamics Study. J Phys Chem C Nanomater Interfaces 2023; 127:2792-2804. [PMID: 36968146 PMCID: PMC10034739 DOI: 10.1021/acs.jpcc.2c05747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 12/28/2022] [Indexed: 06/18/2023]
Abstract
Anion exchange membranes (AEMs) have attracted significant interest in recent years for their applications in fuel cells and other electrochemical devices. Understanding water distributions and hydroxide transport mechanisms within AEMs is critical to improving their performance in terms of hydroxide conductivity. Recently, nanoconfined environments have been used to mimic AEM environments. Following this approach, we construct nanoconfined cylindrical pore structures using graphane nanotubes (GNs) functionalized with trimethylammonium cations as models of local AEM morphology. These structures were then used to investigate hydroxide transport using ab initio molecular dynamics (AIMD). The simulations showed that hydroxide transport is suppressed in these confined environments relative to bulk solution, although the mechanism is dominated by structural diffusion. One factor suppressing hydroxide transport is the reduced proton transfer (PT) rate, which results from changes in hydroxide and water solvation patterns under confinement compared to bulk solution, as well as from strong interactions between hydroxide ions and the tethered cation groups.
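The abstract does not state how hydroxide transport was quantified. A common approach, assumed here, is the Einstein relation: the diffusion coefficient D is the long-time slope of the mean-squared displacement (MSD) divided by 2d, where d is the number of spatial dimensions considered (d = 1 for motion along a pore axis, d = 3 in bulk). The sketch below takes plain coordinate tuples; in an AIMD run, a hydroxide that moves by structural diffusion would first have to be tracked through the changing identity of the oxygen carrying the defect, which is not shown. Function names are hypothetical.

```python
def mean_squared_displacement(positions, max_lag):
    """MSD for lags 1..max_lag, averaged over all available time origins.

    positions: unwrapped coordinates of one tracked species, one tuple per frame,
    frames equally spaced in time.
    """
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [sum((a - b) ** 2 for a, b in zip(positions[t + lag], positions[t]))
                   for t in range(len(positions) - lag)]
        msd.append(sum(squared) / len(squared))
    return msd

def einstein_diffusion_coefficient(positions, dt, max_lag):
    """Einstein relation D = slope(MSD vs. time) / (2 d), slope fitted through the origin.

    d is taken from the coordinate tuples: pass (z,) to get 1D diffusion along a pore axis.
    """
    dim = len(positions[0])
    times = [dt * k for k in range(1, max_lag + 1)]
    msd = mean_squared_displacement(positions, max_lag)
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / (2 * dim)
```

For a 1D random walk with unit steps and unit time step, the expected MSD at lag k is k, so D should come out close to 0.5.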
Affiliation(s)
- Zhuoran Long
- Department of Chemistry, New York University, New York, New York 10003, United States
- Mark E. Tuckerman
- Department of Chemistry, New York University, New York, New York 10003, United States
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai 200062, China

5
Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K. Neural network potentials for chemistry: concepts, applications and prospects. Digital Discovery 2023; 2:28-58. [PMID: 36798879 PMCID: PMC9923808 DOI: 10.1039/d2dd00102k] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/23/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
Artificial neural networks (NNs) are already heavily involved in methods and applications for common tasks in computational chemistry, such as the representation of potential energy surfaces (PES) and spectroscopic predictions. This perspective provides an overview of the foundations of neural network-based full-dimensional potential energy surfaces, their architectures, underlying concepts, their representation and applications to chemical systems. Methods for data generation and training procedures for PES construction are discussed, and means for error assessment and refinement through transfer learning are presented. A selection of recent results illustrates the latest improvements regarding the accuracy of PES representations and system size limitations in dynamics simulations, but also NN applications enabling direct prediction of physical results without dynamics simulations. The aim is to provide an overview of the current state-of-the-art NN approaches in computational chemistry and also to point out the current challenges in enhancing the reliability and applicability of NN methods on a larger scale.
Affiliation(s)
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland

6
Kondratyev V, Dryzhakov M, Gimadiev T, Slutskiy D. Generative model based on junction tree variational autoencoder for HOMO value prediction and molecular optimization. J Cheminform 2023; 15:11. [PMID: 36732800 PMCID: PMC9893566 DOI: 10.1186/s13321-023-00681-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 01/06/2023] [Indexed: 02/04/2023]
Abstract
In this work, we further develop the junction tree variational autoencoder (JT VAE) architecture in terms of implementation and application of the internal feature space of the model. Pretraining of JT VAE on a large dataset and further optimization with a regression model led to a latent space that can solve several tasks simultaneously: prediction, generation, and optimization. We use the ZINC database as a source of molecules for the JT VAE pretraining and the QM9 dataset with its HOMO values to demonstrate the application. We evaluate our model on multiple tasks such as property (value) prediction, generation of new molecules with predefined properties, and structure modification toward the property. Across these tasks, our model shows improvements in generation and optimization while retaining the precision of state-of-the-art models.
Affiliation(s)
- Vladimir Kondratyev
- Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France
- Telecom Paris, 19 Place Marguerite Perey, CS 20031, 91123 Palaiseau, France
- Marian Dryzhakov
- Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France
- Timur Gimadiev
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya str., 420008 Kazan, Russia
- Federal Research Center “Kazan Scientific Center of Russian Academy of Sciences”, 420008 Kazan, Russia
- JSC “BIOCAD”, Petrodvortsoviy District, Strelna, Svyazi St., Bld. 34, Liter A., 198515 St. Petersburg, Russia
- Dmitriy Slutskiy
- Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France

7
Vazquez-Salazar LI, Boittier ED, Meuwly M. Uncertainty quantification for predictions of atomistic neural networks. Chem Sci 2022; 13:13068-13084. [PMID: 36425481 PMCID: PMC9667919 DOI: 10.1039/d2sc04056e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Accepted: 10/16/2022] [Indexed: 12/31/2023]
Abstract
The value of uncertainty quantification for the predictions of neural networks (NNs) trained on quantum chemical reference data is explored quantitatively. For this, the architecture of the PhysNet NN was suitably modified, and the resulting model (PhysNet-DER) was evaluated with different metrics to quantify its calibration, the quality of its predictions, and whether prediction error and predicted uncertainty can be correlated. Training on the QM9 database and evaluating data in the test set within and outside the distribution indicate that error and uncertainty are not linearly related. However, the observed variance provides insight into the quality of the data used for training. Additionally, the influence of the chemical space covered by the training data set was studied by using a biased database. The results clarify that noise and redundancy complicate property prediction for molecules even in cases where the changes, such as double bond migration in two otherwise identical molecules, are small. The model was also applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space, in combination with other parameters, shows that redundant information in the training dataset can lead to large variances and small errors, whereas the presence of similar but unspecific information returns large errors but small variances. This was observed, e.g., for nitro-containing aliphatic chains, for which predictions were difficult although the training set contained several examples of nitro groups bound to aromatic molecules. This finding underlines the importance of the composition of the training data and provides chemical insight into how it affects the prediction capabilities of an ML model. Finally, the presented method can be used for information-based improvement of chemical databases for target applications through active learning optimization.
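Assuming, as the name suggests, that "DER" in PhysNet-DER stands for deep evidential regression, the output head predicts the four parameters (γ, ν, α, β) of a Normal-Inverse-Gamma distribution instead of a single value. In the generic evidential-regression framework, the prediction is γ, the aleatoric (data) variance is β/(α − 1) and the epistemic (model) variance is β/(ν(α − 1)). Whether PhysNet-DER uses exactly this parameterisation is not stated in the abstract; the function names below are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def evidential_parameters(raw):
    """Map four unconstrained network outputs to Normal-Inverse-Gamma parameters.

    gamma is left unconstrained; nu > 0, alpha > 1 and beta > 0 are enforced with softplus.
    """
    g, n, a, b = raw
    return g, softplus(n), softplus(a) + 1.0, softplus(b)

def evidential_prediction(gamma, nu, alpha, beta):
    """Return (prediction, aleatoric variance, epistemic variance)."""
    if nu <= 0.0 or alpha <= 1.0 or beta <= 0.0:
        raise ValueError("require nu > 0, alpha > 1 and beta > 0")
    aleatoric = beta / (alpha - 1.0)          # expected data noise E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # variance of the predicted mean Var[mu]
    return gamma, aleatoric, epistemic
```

Comparing the absolute test error of each molecule with these variances is the kind of analysis behind the abstract's statement that error and uncertainty are not linearly related.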
Affiliation(s)
- Eric D Boittier
- Department of Chemistry, University of Basel, Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Basel, Switzerland
- Department of Chemistry, Brown University, USA

8
Bull-Vulpe EF, Riera M, Bore SL, Paesani F. Data-Driven Many-Body Potential Energy Functions for Generic Molecules: Linear Alkanes as a Proof-of-Concept Application. J Chem Theory Comput 2022. [PMID: 36113028 DOI: 10.1021/acs.jctc.2c00645] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
We present a generalization of the many-body energy (MB-nrg) theoretical/computational framework that enables the development of data-driven potential energy functions (PEFs) for generic covalently bonded molecules, with arbitrary quantum mechanical accuracy. The "nearsightedness of electronic matter" is exploited to define monomers as "natural building blocks" on the basis of their distinct chemical identity. The energy of generic molecules is then expressed as a sum of individual many-body energies of incrementally larger subsystems. The MB-nrg PEFs represent the low-order n-body energies, with n = 1-4, using permutationally invariant polynomials derived from electronic structure calculations carried out at an arbitrary quantum mechanical level of theory, while all higher-order n-body terms (n > 4) are represented by a classical many-body polarization term. As a proof-of-concept application of the general MB-nrg framework, we present MB-nrg PEFs for linear alkanes. The MB-nrg PEFs are shown to accurately reproduce reference energies, harmonic frequencies, and potential energy scans of alkanes, independently of their length. Since, by construction, the MB-nrg framework introduced here can be applied to generic covalently bonded molecules, we envision future computer simulations of complex molecular systems using data-driven MB-nrg PEFs, with arbitrary quantum mechanical accuracy.
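Writing the energy as a sum of n-body energies of incrementally larger subsystems follows standard inclusion-exclusion bookkeeping: the n-body term of a subset of monomers is that subset's energy minus every lower-order term it contains. The sketch below shows only this bookkeeping, with a hypothetical `energy` callable standing in for electronic structure calculations (or fitted PEFs); it does not include the permutationally invariant polynomials or the polarization term used for n > 4.

```python
from itertools import combinations

def many_body_terms(energy, fragments, max_order):
    """n-body energies by inclusion-exclusion for all subsets up to max_order.

    energy(subset) returns the energy of the subsystem built from the given fragments
    (a hypothetical stand-in for an electronic structure calculation).
    """
    terms = {}
    for order in range(1, max_order + 1):
        for subset in combinations(fragments, order):
            # Subtract every lower-order term contained in this subset.
            lower = sum(terms[sub]
                        for k in range(1, order)
                        for sub in combinations(subset, k))
            terms[subset] = energy(subset) - lower
    return terms

def truncated_mbe(energy, fragments, max_order):
    """Total energy approximated by the many-body expansion truncated at max_order."""
    return sum(many_body_terms(energy, fragments, max_order).values())
```

If the toy energy function is strictly pairwise additive, the expansion truncated at second order is exact and all three-body terms vanish; genuine many-body effects such as polarization are what the higher-order terms capture.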
Affiliation(s)
- Ethan F. Bull-Vulpe
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Marc Riera
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Sigbjørn L. Bore
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States

9
Using Deep 1D Convolutional Grated Recurrent Unit Neural Network to Optimize Quantum Molecular Properties and Predict Intramolecular Coupling Constants of Molecules of Potential Health Medications and Other Generic Molecules. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12147228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
A molecule is the smallest particle of a chemical element or compound that possesses the element's or compound's chemical characteristics. There are numerous challenges associated with the development of molecular simulations of fluid characteristics for industrial purposes. Such fluid characteristics find applications in the development of various liquid household products, such as liquid detergents, drinks, beverages, and liquid health medications, amongst others. Predicting the molecular properties of liquid pharmaceuticals or therapies to address health concerns is one of the greatest difficulties in drug development. Computational tools for precise prediction can help speed up and lower the cost of identifying new medications. In this study, a one-dimensional deep convolutional gated recurrent neural network (1D-CNN-GRU) was used to build a novel forecasting model for the molecular property prediction of liquids or fluids. The signal data from molecular properties were pre-processed and normalized. A 1D convolutional neural network (1D-CNN) was then built to extract the characteristics of the normalized molecular property sequence data. Gated recurrent unit (GRU) layers then processed the extracted features to extract temporal features. The output features were finally passed through several fully connected layers for prediction. For both training and validation, we used molecular properties obtained from the Kaggle database. The proposed method achieved better prediction accuracy, with a mean squared error (MSE), root mean square error (RMSE), and mean absolute error (MAE) of 0.0230, 0.1517, and 0.0693, respectively.
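The three reported metrics are not independent: RMSE is by definition the square root of MSE, and MAE can never exceed RMSE. A short sketch (the function name is hypothetical) that computes all three and checks the figures quoted in the abstract for internal consistency:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for paired reference and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # RMSE is the square root of the MSE
        "MAE": sum(abs(e) for e in errors) / len(errors),
    }

# Consistency check on the values quoted in the abstract:
# sqrt(0.0230) rounds to 0.1517, and MAE must not exceed RMSE.
reported_mse, reported_rmse, reported_mae = 0.0230, 0.1517, 0.0693
assert round(math.sqrt(reported_mse), 4) == reported_rmse
assert reported_mae <= reported_rmse
```

Note that these metrics depend on the normalization applied to the target values, so they are not directly comparable to errors quoted in physical units such as kcal/mol.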

10
Töpfer K, Käser S, Meuwly M. Double proton transfer in hydrated formic acid dimer: Interplay of spatial symmetry and solvent-generated force on reactivity. Phys Chem Chem Phys 2022; 24:13869-13882. [PMID: 35620978 PMCID: PMC9176184 DOI: 10.1039/d2cp01583h] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
The double proton transfer (DPT) reaction in the hydrated formic acid dimer (FAD) is investigated at molecular-level detail. For this, a global and reactive machine learned (ML) potential energy surface (PES) is developed to run extensive (more than 100 ns) mixed ML/MM molecular dynamics (MD) simulations in explicit molecular mechanics (MM) solvent at MP2-quality for the solute. Simulations with fixed – as in a conventional empirical force field – and conformationally fluctuating – as available from the ML-based PES – charge models for FAD show a significant impact on the competition between DPT and dissociation of FAD into two formic acid monomers. With increasing temperature the barrier height for DPT in solution changes by about 10% (∼1 kcal mol−1) between 300 K and 600 K. The rate for DPT is largest, ∼1 ns−1, at 350 K and decreases for higher temperatures due to destabilisation and increased probability for dissociation of FAD. The water solvent is found to promote the first proton transfer by exerting a favourable solvent-induced Coulomb force along the O–H⋯O hydrogen bond whereas the second proton transfer is significantly controlled by the O–O separation and other conformational degrees of freedom. Double proton transfer in hydrated FAD is found to involve a subtle interplay and balance between structural and electrostatic factors. Simulation of double proton transfer in formic acid dimer by reactive ML potential in explicit molecular mechanics water solvent.
Affiliation(s)
- Kai Töpfer
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland.
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland.
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland.

11
Fabregat R, Fabrizio A, Engel EA, Meyer B, Juraskova V, Ceriotti M, Corminboeuf C. Local Kernel Regression and Neural Network Approaches to the Conformational Landscapes of Oligopeptides. J Chem Theory Comput 2022; 18:1467-1479. [PMID: 35179897 PMCID: PMC8908737 DOI: 10.1021/acs.jctc.1c00813] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/11/2021] [Indexed: 11/30/2022]
Abstract
The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same, challenging problem: the sampling of the conformational landscape of polypeptides at finite temperature. We develop a local kernel regression (LKR) coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello type neural networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial to achieve accurate potential energy surfaces at a competitive computational cost and leverage the locality of the model to infer which chemical environments are poorly described by the DFTB baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.
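The local structure of such a kernel model can be sketched as follows: each atom contributes a kernel-weighted sum over a pool of reference environments, and the molecular energy is the sum of these atomic terms, which is what lets one inspect individual environments. This is a minimal sketch under loudly stated simplifications (scalar toy descriptors, a Gaussian kernel, a fixed reference pool instead of supervised selection, and plain ridge regularization on the weights); the paper's actual descriptors, kernel, sparsification and regularizer are not given in the abstract, and all names are hypothetical. Training targets could equally be corrections on top of a baseline such as DFTB.

```python
import math

# Hypothetical simplifications: scalar descriptors, Gaussian kernel,
# fixed reference pool, ridge regularisation on the weights.

def gaussian_kernel(x, y, sigma=0.5):
    """Similarity between two (toy, scalar) atomic-environment descriptors."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def molecule_row(environments, references, sigma=0.5):
    """For each reference m: sum over the molecule's atoms i of k(x_i, x_m)."""
    return [sum(gaussian_kernel(x, r, sigma) for x in environments) for r in references]

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_local_krr(molecules, energies, references, lam=1e-8, sigma=0.5):
    """Weights w minimising ||K w - E||^2 + lam * ||w||^2 with molecular energies as targets."""
    K = [molecule_row(mol, references, sigma) for mol in molecules]
    n_ref = len(references)
    A = [[sum(row[i] * row[j] for row in K) + (lam if i == j else 0.0) for j in range(n_ref)]
         for i in range(n_ref)]
    b = [sum(row[i] * e for row, e in zip(K, energies)) for i in range(n_ref)]
    return solve_linear(A, b)

def atomic_contributions(environments, references, weights, sigma=0.5):
    """Per-atom energies sum_m w_m k(x_i, x_m); their sum is the molecular energy."""
    return [sum(w * gaussian_kernel(x, r, sigma) for w, r in zip(weights, references))
            for x in environments]
```

Only molecular energies enter the fit, yet the model returns one contribution per atom; in a real application, unusually large per-atom corrections would flag environments that the baseline describes poorly.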
Affiliation(s)
- Raimon Fabregat
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Edgar A. Engel
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Veronika Juraskova
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Michele Ceriotti
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland

12
Gokcan H, Isayev O. Learning molecular potentials with neural networks. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1564] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Affiliation(s)
- Hatice Gokcan
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

13
Unke OT, Chmiela S, Gastegger M, Schütt KT, Sauceda HE, Müller KR. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat Commun 2021; 12:7273. [PMID: 34907176 PMCID: PMC8671403 DOI: 10.1038/s41467-021-27504-0] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 07/12/2021] [Accepted: 11/16/2021] [Indexed: 01/12/2023]
Abstract
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today's machine learning models in quantum chemistry.
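The generic mechanism by which self-attention introduces nonlocality can be sketched with standard softmax attention over per-atom feature vectors: there is no distance cutoff, so every atom's output mixes information from every other atom. This is an illustrative assumption, not SpookyNet's actual layer, which may differ (for example, by using an attention approximation with better scaling than the quadratic cost shown here, and by combining it with local interactions and charge/spin embeddings). The weight matrices are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def self_attention(features, Wq, Wk, Wv):
    """Standard softmax self-attention over all atoms (no distance cutoff).

    features: one feature vector per atom. Each output row is a convex combination
    of the value vectors of *all* atoms, so information can flow at any separation.
    """
    q = [matvec(Wq, f) for f in features]
    k = [matvec(Wk, f) for f in features]
    v = [matvec(Wv, f) for f in features]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out
```

Because the attention weights depend only on features and not on the order of the atoms, permuting the input atoms simply permutes the outputs, which is the symmetry an atomistic model needs.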
Affiliation(s)
- Oliver T Unke
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany.
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623, Berlin, Germany
- Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- Huziel E Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
- BASLEARN, BASF-TU joint Lab, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany.
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea.
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany.
- Google Research, Brain team, Berlin, Germany.

14
Modee R, Laghuvarapu S, Priyakumar UD. Benchmark study on deep neural network potentials for small organic molecules. J Comput Chem 2021; 43:308-318. [PMID: 34870332 DOI: 10.1002/jcc.26790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 11/13/2021] [Accepted: 11/15/2021] [Indexed: 11/06/2022]
Abstract
There has been tremendous advancement in machine learning (ML) applications in computational chemistry, particularly in neural network potentials (NNPs). NNPs can approximate the potential energy surface (PES) as a high-dimensional function by learning from existing reference data, thereby circumventing the need to solve the electronic Schrödinger equation explicitly. As a result, ML accelerates chemical space exploration and property prediction compared to quantum mechanical methods. Novel ML methods have the potential to provide efficient means for predicting the properties of molecules. However, this potential has been limited by the lack of standard comparative evaluations. In this work, we compare four selected models, namely ANI, PhysNet, SchNet, and BAND-NN, developed to represent the PES of small organic molecules. We evaluate these models for their accuracy and transferability on two different test sets: (i) small organic molecules with up to eight heavy atoms, on which ANI and SchNet achieve root mean square errors (RMSE) of 0.55 and 0.60 kcal/mol, respectively; and (ii) a random selection of molecules with 10 heavy atoms from the GDB-11 database, on which ANI achieves an RMSE of 1.17 kcal/mol and SchNet an RMSE of 1.89 kcal/mol. We examine their ability to produce smooth, meaningful surfaces by performing PES scans for bond stretches, angle bends, and dihedral rotations on relatively large molecules to assess their possible application in molecular dynamics simulations. We also evaluate their performance for yielding minimum energy structures via geometry optimization using various minimization algorithms. All of these models were also able to accurately differentiate isomers with the same empirical formula, C10H20. ANI and PhysNet achieve RMSEs of 0.29 and 0.52 kcal/mol, respectively, on the C10H20 isomers.
Affiliation(s)
- Rohit Modee
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
15
Zeng J, Giese TJ, Ekesan Ş, York DM. Development of Range-Corrected Deep Learning Potentials for Fast, Accurate Quantum Mechanical/Molecular Mechanical Simulations of Chemical Reactions in Solution. J Chem Theory Comput 2021; 17:6993-7009. [PMID: 34644071 DOI: 10.1021/acs.jctc.1c00201] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/15/2022]
Abstract
We develop a new deep potential–range correction (DPRc) machine learning potential for combined quantum mechanical/molecular mechanical (QM/MM) simulations of chemical reactions in the condensed phase. The new range correction enables short-ranged QM/MM interactions to be tuned for higher accuracy, and the correction smoothly vanishes within a specified cutoff. We further develop an active learning procedure for robust neural network training. We test the DPRc model and training procedure against a series of six nonenzymatic phosphoryl transfer reactions in solution that are important in mechanistic studies of RNA-cleaving enzymes. Specifically, we apply DPRc corrections to a base QM model and test its ability to reproduce free-energy profiles generated from a target QM model. We perform these comparisons using the MNDO/d and DFTB2 semiempirical models because they differ in the way they treat orbital orthogonalization and electrostatics and produce free-energy profiles that differ significantly from each other, thereby providing a rigorous stress test for the DPRc model and training procedure. The comparisons show that accurate reproduction of the free-energy profiles requires correction of the QM/MM interactions out to 6 Å. We further find that the model's initial training benefits from generating data from temperature replica exchange simulations and including high-temperature configurations into the fitting procedure, so the resulting models are trained to properly avoid high-energy regions. A single DPRc model was trained to reproduce four different reactions and yielded good agreement with the free-energy profiles made from the target QM/MM simulations. The DPRc model was further demonstrated to be transferable to 2D free-energy surfaces and 1D free-energy profiles that were not explicitly considered in the training.
Examination of the computational performance of the DPRc model showed that it was fairly slow when run on CPUs but was sped up almost 100-fold when using NVIDIA V100 GPUs, resulting in almost negligible overhead. The new DPRc model and training procedure provide a potentially powerful new tool for the creation of next-generation QM/MM potentials for a wide spectrum of free-energy applications ranging from drug discovery to enzyme design.
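The abstract states that the range correction vanishes smoothly within a cutoff and that corrections out to 6 Å are needed. The published DPRc switching form is not given here, so the sketch below uses a standard CHARMM-style polynomial switching function as a stand-in, with `r_on = 5.0` Å as an assumed onset and `r_off = 6.0` Å taken from the text; `range_corrected_energy` and its per-pair corrections are hypothetical.

```python
import numpy as np

def switch(r, r_on, r_off):
    """CHARMM-style switching function: 1 for r <= r_on, 0 for r >= r_off,
    and a smooth polynomial (continuous first derivative) in between."""
    r = np.asarray(r, dtype=float)
    ron2, roff2, r2 = r_on**2, r_off**2, r**2
    s = (roff2 - r2) ** 2 * (roff2 + 2.0 * r2 - 3.0 * ron2) / (roff2 - ron2) ** 3
    return np.where(r <= r_on, 1.0, np.where(r >= r_off, 0.0, s))

def range_corrected_energy(e_base, corrections, distances, r_on=5.0, r_off=6.0):
    """Hypothetical: add per-pair QM/MM corrections, damped to zero at r_off."""
    damped = np.asarray(corrections, dtype=float) * switch(distances, r_on, r_off)
    return e_base + float(np.sum(damped))

# Three hypothetical QM-MM pairs: inside, within, and beyond the switching region
print(range_corrected_energy(0.0, [1.0, 1.0, 1.0], [4.0, 5.5, 7.0]))
```

Because the switch and its first derivative go to zero at `r_off`, forces from the correction stay continuous as MM atoms cross the cutoff.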
Affiliation(s)
- Jinzhe Zeng
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine, and Department of Chemistry and Chemical Biology, Rutgers the State University of New Jersey, New Brunswick, New Jersey 08901-8554, United States
- Timothy J Giese
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine, and Department of Chemistry and Chemical Biology, Rutgers the State University of New Jersey, New Brunswick, New Jersey 08901-8554, United States
- Şölen Ekesan
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine, and Department of Chemistry and Chemical Biology, Rutgers the State University of New Jersey, New Brunswick, New Jersey 08901-8554, United States
- Darrin M York
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine, and Department of Chemistry and Chemical Biology, Rutgers the State University of New Jersey, New Brunswick, New Jersey 08901-8554, United States
16
Lambros E, Dasgupta S, Palos E, Swee S, Hu J, Paesani F. General Many-Body Framework for Data-Driven Potentials with Arbitrary Quantum Mechanical Accuracy: Water as a Case Study. J Chem Theory Comput 2021; 17:5635-5650. [PMID: 34370954 DOI: 10.1021/acs.jctc.1c00541] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/28/2022]
Abstract
We present a general framework for the development of data-driven many-body (MB) potential energy functions (MB-QM PEFs) that represent the interactions between small molecules at an arbitrary quantum-mechanical (QM) level of theory. As a demonstration, a family of MB-QM PEFs for water is rigorously derived from density functionals belonging to different rungs across Jacob's ladder of approximations within density functional theory (MB-DFT) and from Møller-Plesset perturbation theory (MB-MP2). Through a systematic analysis of individual MB contributions to the interaction energies of water clusters, we demonstrate that all MB-QM PEFs preserve the same accuracy as the corresponding ab initio calculations, with the exception of those derived from density functionals within the generalized gradient approximation (GGA). The differences between the DFT and MB-DFT results are traced back to density-driven errors that prevent GGA functionals from accurately representing the underlying molecular interactions for different cluster sizes and hydrogen-bonding arrangements. We show that this shortcoming may be overcome, within the MB formalism, by using density-corrected functionals (DC-DFT) that provide a more consistent representation of each individual MB contribution. This is demonstrated through the development of a MB-DFT PEF derived from DC-PBE-D3 data, which more accurately reproduces the corresponding ab initio results.
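The many-body (MB) decomposition underlying MB-QM PEFs can be illustrated with a short inclusion-exclusion sketch. It assumes a callable `energy(subset)` that returns the energy of the cluster formed by the listed monomers; `toy_energy` is a hypothetical stand-in with only 1-, 2- and 3-body terms, not a water model.

```python
from itertools import combinations
from math import comb

def mbe_increments(energy, n_monomers, max_order):
    """Return {subset: n-body increment} for all subsets up to max_order.

    Each increment is the subset's energy minus the increments of all of its
    proper sub-subsets (inclusion-exclusion), so 1-body terms are monomer
    energies, 2-body terms are pair interaction energies, and so on."""
    inc = {}
    for order in range(1, max_order + 1):
        for subset in combinations(range(n_monomers), order):
            e = energy(subset)
            for k in range(1, order):
                for sub in combinations(subset, k):
                    e -= inc[sub]
            inc[subset] = e
    return inc

def toy_energy(subset):
    """Hypothetical cluster energy with explicit 1-, 2- and 3-body terms only."""
    n = len(subset)
    return -1.0 * n - 0.1 * comb(n, 2) + 0.01 * comb(n, 3)

inc = mbe_increments(toy_energy, n_monomers=4, max_order=3)
by_order = {k: sum(v for s, v in inc.items() if len(s) == k) for k in (1, 2, 3)}
print(by_order)
print(sum(inc.values()), toy_energy(tuple(range(4))))
```

Truncating `max_order` at 2 or 3 is what keeps MB potentials tractable; any higher-order increments left out are the truncation error. Here the toy energy has no 4-body term, so the order-3 sum is exact.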
Affiliation(s)
- Eleftherios Lambros
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Saswata Dasgupta
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Etienne Palos
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Steven Swee
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Jie Hu
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
- Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
17
Hoxha M, Kamberaj H. Automation of some macromolecular properties using a machine learning approach. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abe7b6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
In this study, we employed a newly developed method to predict macromolecular properties using a swarm artificial neural network (ANN) method as a machine learning approach. In this method, the molecular structures are represented by feature description vectors used as training input data for a neural network. This study aims to develop an efficient approach for training an ANN using either experimental or quantum mechanics data. We aim to introduce an error model controlling the reliability of the prediction confidence interval using a bootstrapping swarm approach. We created different datasets of selected experimental or quantum mechanics results. Using this optimized ANN, we hope to predict properties and their statistical errors for new molecules. Four datasets are used in this study: a dataset of 642 small organic molecules with known experimental hydration free energies, a dataset of 1475 experimental pKa values of ionizable groups in 192 proteins, a dataset of 2693 mutants in 14 proteins with given experimental values of changes in the Gibbs free energy, and a dataset of 7101 quantum mechanics heat of formation calculations. All the data are prepared and optimized using the AMBER force field in the CHARMM macromolecular computer simulation program. The bootstrapping swarm ANN code for performing the optimization and prediction is written in Python. The descriptor vectors of the small molecules are based on the Coulomb matrix and sums over bond properties. For the macromolecular systems, the descriptors encode the chemical-physical fingerprints of the region in the vicinity of each amino acid.
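The abstract names the Coulomb matrix as the basis for the small-molecule descriptors. The sketch below uses the standard Coulomb matrix definition and takes its sorted eigenvalues as a permutation-invariant vector; how the authors combine it with bond-property sums and feed it to the bootstrapping swarm ANN is not reproduced, and the water geometry (in bohr) is illustrative only.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """M_ii = 0.5 * Z_i**2.4 ; M_ij = Z_i * Z_j / |R_i - R_j| for i != j."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    off = ~np.eye(n, dtype=bool)            # mask for off-diagonal entries
    M = np.zeros((n, n))
    M[off] = np.outer(Z, Z)[off] / dist[off]
    M[np.diag_indices(n)] = 0.5 * Z**2.4
    return M

def eigen_descriptor(Z, R):
    """Eigenvalues sorted in descending order: invariant to atom ordering."""
    return np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, R)))[::-1]

# Illustrative water geometry (bohr)
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [1.43, 1.11, 0.0], [-1.43, 1.11, 0.0]]
print(eigen_descriptor(Z, R))
```

The eigenvalue spectrum is also unchanged by rotations and translations, since the matrix depends only on interatomic distances.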
18
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 211] [Impact Index Per Article: 70.3] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Bingqing Cheng
- Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
- Stefan Chmiela
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587 Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
- Google Research, Brain Team, 10117 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
19
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
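The "statistical surrogate models" mentioned above are, in much QML work, kernel ridge regression (KRR) models. The sketch below is a generic KRR with a Laplacian kernel; the kernel choice, `sigma`, `lam`, and the random toy representations and targets are assumptions for illustration, not settings from the reviewed studies.

```python
import numpy as np

def laplacian_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||_1 / sigma) for all rows a of A and b of B."""
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d / sigma)

def krr_fit(X, y, sigma=10.0, lam=1e-10):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma=10.0):
    """Prediction is a kernel-weighted sum over the training examples."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha

# Hypothetical representations (rows) and target property values
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
y_train = X_train.sum(axis=1) ** 2
X_test = rng.normal(size=(10, 5))

alpha = krr_fit(X_train, y_train)
print(krr_predict(X_train, alpha, X_test))
```

Training is a single linear solve (cubic in the number of training molecules), while each prediction costs one kernel row, which is where the speed-up over a first-principles calculation comes from.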
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
20
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 404] [Impact Index Per Article: 134.7] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
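A core concept behind ML-FFs is that the model predicts a scalar energy and obtains forces as its negative gradient, which keeps the force field conservative. The sketch below checks this property with a Lennard-Jones pair sum standing in for a learned energy model (an assumption); production ML-FFs typically use analytic or automatic differentiation rather than the finite differences used here for checking.

```python
import numpy as np

def pair_energy(R, eps=1.0, sigma=1.0):
    """Stand-in for a learned energy model: Lennard-Jones sum over all pairs."""
    i, j = np.triu_indices(len(R), k=1)
    r = np.linalg.norm(R[i] - R[j], axis=1)
    sr6 = (sigma / r) ** 6
    return float(np.sum(4.0 * eps * (sr6**2 - sr6)))

def forces_fd(energy, R, h=1e-5):
    """Forces F = -dE/dR by central finite differences."""
    F = np.zeros_like(R)
    for a in range(R.shape[0]):
        for k in range(R.shape[1]):
            Rp, Rm = R.copy(), R.copy()
            Rp[a, k] += h
            Rm[a, k] -= h
            F[a, k] = -(energy(Rp) - energy(Rm)) / (2.0 * h)
    return F

# Hypothetical three-atom configuration (reduced units)
R = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.1, 0.3]])
print(forces_fd(pair_energy, R))
```

Because the energy depends only on interatomic distances, the forces obtained this way sum to zero, as Newton's third law requires.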
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
21
Chen MS, Morawietz T, Mori H, Markland TE, Artrith N. AENET-LAMMPS and AENET-TINKER: Interfaces for accurate and efficient molecular dynamics simulations with machine learning potentials. J Chem Phys 2021; 155:074801. [PMID: 34418919 DOI: 10.1063/5.0063880] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Machine-learning potentials (MLPs) trained on data from quantum-mechanics based first-principles methods can approach the accuracy of the reference method at a fraction of the computational cost. To facilitate efficient MLP-based molecular dynamics and Monte Carlo simulations, an integration of the MLPs with sampling software is needed. Here, we develop two interfaces that link the atomic energy network (ænet) MLP package with the popular sampling packages TINKER and LAMMPS. The three packages, ænet, TINKER, and LAMMPS, are free and open-source software that enable, in combination, accurate simulations of large and complex systems with low computational cost that scales linearly with the number of atoms. Scaling tests show that the parallel efficiency of the ænet-TINKER interface is nearly optimal but is limited to shared-memory systems. The ænet-LAMMPS interface achieves excellent parallel efficiency on highly parallel distributed-memory systems and benefits from the highly optimized neighbor list implemented in LAMMPS. We demonstrate the utility of the two MLP interfaces for two relevant example applications: the investigation of diffusion phenomena in liquid water and the equilibration of nanostructured amorphous battery materials.
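The linear scaling and the optimized neighbor list mentioned above both rest on the fact that short-ranged atomic-environment descriptors only need neighbors within a cutoff. A minimal, non-periodic cell-list sketch (not the LAMMPS implementation) shows the idea: atoms are binned into cells at least one cutoff wide, so each atom is compared only with atoms in its own and the 26 adjacent cells.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations, product

def neighbor_pairs_cell_list(R, cutoff):
    """Sorted pairs (i, j), i < j, with |R_i - R_j| < cutoff (3D, non-periodic)."""
    R = np.asarray(R, dtype=float)
    cells = defaultdict(list)
    idx = np.floor((R - R.min(axis=0)) / cutoff).astype(int)
    for atom, cell in enumerate(map(tuple, idx)):
        cells[cell].append(atom)
    pairs = []
    for cell, atoms in cells.items():
        for shift in product((-1, 0, 1), repeat=3):
            nb = tuple(c + s for c, s in zip(cell, shift))
            for i in atoms:
                for j in cells.get(nb, ()):
                    # i < j ensures each pair is recorded exactly once
                    if i < j and np.linalg.norm(R[i] - R[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)

def neighbor_pairs_brute(R, cutoff):
    """O(N^2) reference implementation for checking the cell list."""
    R = np.asarray(R, dtype=float)
    return sorted((i, j) for i, j in combinations(range(len(R)), 2)
                  if np.linalg.norm(R[i] - R[j]) < cutoff)

# Hypothetical random configuration in a 10 x 10 x 10 box
rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 10.0, size=(200, 3))
print(len(neighbor_pairs_cell_list(pts, 2.5)))
```

At roughly constant density the number of atoms per cell is bounded, so the cell-list search grows linearly with the number of atoms while the brute-force check grows quadratically.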
Affiliation(s)
- Michael S Chen
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Tobias Morawietz
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Hideki Mori
- Department of Mechanical Engineering, College of Industrial Technology, 1-27-1 Nishikoya, Amagasaki, Hyogo 661-0047, Japan
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Nongnuch Artrith
- Department of Chemical Engineering, Columbia University, New York, New York 10027, USA
22
Upadhyay M, Pezzella M, Meuwly M. Genesis of Polyatomic Molecules in Dark Clouds: CO2 Formation on Cold Amorphous Solid Water. J Phys Chem Lett 2021; 12:6781-6787. [PMID: 34270244 DOI: 10.1021/acs.jpclett.1c01810] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/13/2023]
Abstract
Understanding the formation of molecules under conditions relevant to interstellar chemistry is fundamental to characterizing the chemical evolution of the universe. Reactive molecular dynamics simulations with model-based or high-quality potential energy surfaces provide a means to specifically and quantitatively probe individual reaction channels at a molecular level. The formation of CO2 from collision of CO(¹Σ⁺) and O(¹D) is characterized on amorphous solid water (ASW) under conditions typical in cold molecular clouds. Recombination takes place on the subnanosecond time scale, and internal energy redistribution leads to stabilization of the product, with CO2 remaining adsorbed on the ASW on extended time scales. Using a high-level, reproducing kernel-based potential energy surface for CO2, the formation and stabilization of both CO2 and COO are observed.
Affiliation(s)
- Meenu Upadhyay
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Marco Pezzella
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
23
Vazquez-Salazar LI, Boittier ED, Unke OT, Meuwly M. Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies. J Chem Theory Comput 2021; 17:4769-4785. [PMID: 34288675 DOI: 10.1021/acs.jctc.1c00363] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
Abstract
An essential aspect of adequate predictions of chemical properties by machine learning models is the database used to train them. However, studies that analyze how the content and structure of the training databases impact prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (a neural network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies of molecules in Tautobase. For this, the influence of characteristics such as the number of heavy atoms in a molecule, the number of atoms of a given element, bond composition, and initial geometry on the quality of the predictions is considered. The results indicate that training on a chemically diverse database is crucial for obtaining good results and also that conformational sampling can partly compensate for limited coverage of chemical diversity. The overall best-performing reference database (ANI-1x) performs on average 1 kcal/mol better than PC9, which, however, contains about 2 orders of magnitude fewer reference structures. On the other hand, PC9 is chemically more diverse by a factor of ∼5, as quantified by the number of atom-in-molecule-based fragments (amons) it contains compared with the ANI family of databases. A quantitative measure for deficiencies is the Kullback-Leibler divergence between reference and target distributions. It is explicitly demonstrated that when certain types of bonds need to be covered in the target database (Tautobase) but are undersampled in the reference databases, the resulting predictions are poor. Examples include the poor performance of all analyzed databases in predicting C(sp2)-C(sp2) double bonds close to heteroatoms and azoles containing N-N and N-O bonds.
Analysis of the results with a Tree MAP algorithm provides a deeper understanding of specific deficiencies of the reference datasets in predicting tautomerization energies due to inadequate coverage of chemical space. This information can be used either to improve existing databases or to generate new databases of sufficient diversity for a range of machine learning (ML) applications in chemistry.
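The Kullback-Leibler divergence used above as a measure of database deficiencies can be computed from histograms over the same bins, for example counts of bond types. This sketch assumes D_KL(target || reference) as the direction and a small `eps` so that bins empty in the reference stay finite; the bond-type counts are invented for illustration.

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-10):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for histograms on identical bins.

    Bins populated in P but (nearly) empty in Q dominate the sum, which is
    how undersampled bond types in a reference database show up."""
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical counts for three bond types, e.g. C-C, C=C, N-N
target = [30, 50, 20]
reference_a = [300, 500, 200]   # same proportions as the target
reference_b = [500, 500, 0]     # N-N bonds never sampled

print(kl_divergence(target, reference_a))
print(kl_divergence(target, reference_b))
```

The divergence is asymmetric, so which distribution plays the role of P matters; placing the target first penalizes features the target needs but the reference lacks.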
Affiliation(s)
- Eric D Boittier
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Oliver T Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence "Unifying Systems in Catalysis" (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
24
Miksch AM, Morawietz T, Kästner J, Urban A, Artrith N. Strategies for the construction of machine-learning potentials for accurate and efficient atomic-scale simulations. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abfd96] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/12/2022]
Abstract
Recent advances in machine-learning interatomic potentials have enabled the efficient modeling of complex atomistic systems with an accuracy that is comparable to that of conventional quantum-mechanics based methods. At the same time, the construction of new machine-learning potentials can seem a daunting task, as it involves data-science techniques that are not yet common in chemistry and materials science. Here, we provide a tutorial-style overview of strategies and best practices for the construction of artificial neural network (ANN) potentials. We illustrate the most important aspects of (a) data collection, (b) model selection, (c) training and validation, and (d) testing and refinement of ANN potentials on the basis of practical examples. Current research in the areas of active learning and delta learning is also discussed in the context of ANN potentials. This tutorial review aims at equipping computational chemists and materials scientists with the required background knowledge for ANN potential construction and application, with the intention to accelerate the adoption of the method, so that it can facilitate exciting research that would otherwise be challenging with conventional strategies.
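Delta learning, mentioned above, fits only the difference between a cheap baseline and an accurate reference. In this sketch a least-squares polynomial stands in for the ANN (an assumption made to keep the example short), and the "low-level" and "high-level" energy curves are hypothetical functions of a single coordinate.

```python
import numpy as np

def fit_delta(x, e_low, e_high, degree=3):
    """Least-squares polynomial fit of the correction e_high - e_low."""
    return np.polyfit(x, np.asarray(e_high) - np.asarray(e_low), degree)

def predict_delta(coeffs, x, e_low):
    """Delta-learning prediction: cheap baseline plus learned correction."""
    return np.asarray(e_low) + np.polyval(coeffs, x)

# Hypothetical 1D energy curves (arbitrary units)
x = np.linspace(0.0, 1.0, 40)
e_low = np.sin(2.0 * np.pi * x)                # cheap method
e_high = e_low + 0.3 * x**2 - 0.1 * x          # accurate method

coeffs = fit_delta(x, e_low, e_high)
e_pred = predict_delta(coeffs, x, e_low)
print("baseline max error:", np.max(np.abs(e_low - e_high)))
print("delta-learned max error:", np.max(np.abs(e_pred - e_high)))
```

The correction is often smoother than the total energy, which is why it can usually be learned from fewer expensive reference calculations.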
25
Zeni C, Rossi K, Glielmo A, de Gironcoli S. Compact atomic descriptors enable accurate predictions via linear models. J Chem Phys 2021; 154:224112. [PMID: 34241204 DOI: 10.1063/5.0052961] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
We probe the accuracy of linear ridge regression employing a three-body local density representation derived from the atomic cluster expansion. We benchmark the accuracy of this framework in the prediction of formation energies and atomic forces in molecules and solids. We find that such a simple regression framework performs on par with state-of-the-art machine learning methods which are, in most cases, more complex and more computationally demanding. Subsequently, we look for ways to sparsify the descriptor and further improve the computational efficiency of the method. To this aim, we use both principal component analysis and least absolute shrinkage and selection operator (LASSO) regression for energy fitting on six single-element datasets. Both methods highlight the possibility of constructing a descriptor that is four times smaller than the original with a similar or even improved accuracy. Furthermore, we find that the reduced descriptors share a sizable fraction of their features across the six independent datasets, hinting at the possibility of designing material-agnostic, optimally compressed, and accurate descriptors.
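Ridge regression and PCA-based descriptor compression, both named in the abstract, reduce to a few lines of linear algebra. This sketch uses synthetic, deliberately low-rank "descriptors" (an assumption) so that keeping a quarter of the components loses nothing; LASSO is not shown, and the data do not come from the atomic cluster expansion.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-6):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pca_compress(X, k):
    """Project centered descriptors onto their top-k principal components."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:k].T
    return (X - mean) @ P, mean, P

# Synthetic 16-dimensional descriptors that really live in 4 dimensions
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))
X = Z @ rng.normal(size=(4, 16))
y = Z @ np.array([1.0, -2.0, 0.5, 3.0])

Xc, mean, P = pca_compress(X, 4)               # four times smaller descriptor
w = ridge_fit(Xc, y - y.mean())                # center y because Xc is centered
pred = Xc @ w + y.mean()
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(Xc.shape, rmse)
```

New structures are compressed with the stored `mean` and `P`, so the reduced model costs only a k-dimensional dot product per atom.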
Affiliation(s)
- Claudio Zeni
- Physics Area, International School for Advanced Studies, Trieste, Italy
- Kevin Rossi
- Laboratory of Nanochemistry, Institute of Chemistry and Chemical Engineering, Ecole Polytechnique Fédérale de Lausanne, Lausanne, CH, Switzerland
- Aldo Glielmo
- Physics Area, International School for Advanced Studies, Trieste, Italy
26
Abstract
Machine learning (ML) techniques applied to chemical reactions have a long history. The present contribution discusses applications ranging from small-molecule reaction dynamics to computational platforms for reaction planning. ML-based techniques can be particularly relevant for problems involving both computation and experiments. First, Bayesian inference is a powerful approach to develop models consistent with knowledge from experiments. Second, ML-based methods can also be used to handle problems that are formally intractable using conventional approaches, such as exhaustive characterization of state-to-state information in reactive collisions. Finally, the explicit simulation of reactive networks as they occur in combustion has become possible using machine-learned neural network potentials. This review provides an overview of the questions that can be and have been addressed using machine learning techniques, and an outlook discusses challenges in this diverse and stimulating field. It is concluded that ML applied to chemistry problems as practiced and conceived today has the potential to transform the way the field approaches problems involving chemical reactions, in both research and academic teaching.
Affiliation(s)
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
- Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
27
Lu J, Xia S, Lu J, Zhang Y. Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. J Chem Inf Model 2021; 61:1095-1104. [PMID: 33683885 DOI: 10.1021/acs.jcim.1c00007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
A dataset is the basis of deep learning model development, and the success of deep learning models heavily relies on the quality and size of the dataset. In this work, we present a new data preparation protocol and build a large fragment-based dataset, Frag20, which consists of optimized 3D geometries and calculated molecular properties from the Merck molecular force field (MMFF) and DFT at the B3LYP/6-31G* level of theory for more than half a million molecules composed of H, B, C, O, N, F, P, S, Cl, and Br with no more than 20 heavy atoms. Based on the new dataset, we develop robust molecular energy prediction models using a simplified PhysNet architecture for both DFT-optimized and MMFF-optimized geometries, which achieve better than or close to chemical accuracy (1 kcal/mol) on multiple test sets, including CSD20 and Plati20 based on experimental crystal structures.
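The element and size criteria stated for Frag20 can be expressed as a simple filter. This sketch covers only those two stated criteria; the rest of the data preparation protocol (geometry generation and MMFF and B3LYP/6-31G* optimization) is not reproduced, and `keep_molecule` is a hypothetical helper.

```python
ALLOWED_ELEMENTS = frozenset({"H", "B", "C", "O", "N", "F", "P", "S", "Cl", "Br"})

def keep_molecule(symbols, max_heavy_atoms=20):
    """Return True if every element is allowed and the molecule has at most
    `max_heavy_atoms` non-hydrogen atoms."""
    if not set(symbols) <= ALLOWED_ELEMENTS:
        return False
    heavy = sum(1 for s in symbols if s != "H")
    return heavy <= max_heavy_atoms

print(keep_molecule(["C", "H", "H", "H", "H"]))   # methane: kept
print(keep_molecule(["Si", "H", "H", "H", "H"]))  # silane: Si not allowed
```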
Affiliation(s)
- Jianing Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
- Jieyu Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
28
Abstract
We introduce new and robust decompositions of mean-field Hartree-Fock and Kohn-Sham density functional theory relying on the use of localized molecular orbitals and physically sound charge population protocols. The new lossless property decompositions, which allow for partitioning one-electron reduced density matrices into either bond-wise or atomic contributions, are compared to alternatives from the literature with regard to both molecular energies and dipole moments. Besides commenting on possible applications as an interpretative tool in the rationalization of certain electronic phenomena, we demonstrate how decomposed mean-field theory makes it possible to expose and amplify compositional features in the context of machine-learned quantum chemistry. This is made possible by improving upon the granularity of the underlying data. On the basis of our preliminary proof-of-concept results, we conjecture that many of the structure-property inferences in existence today may be further refined by efficiently leveraging an increase in dataset complexity and richness.
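Partitioning a one-electron reduced density matrix into atomic contributions can be illustrated with Mulliken populations, the simplest such scheme. This is a swapped-in example: the abstract describes localized orbitals and physically sound population protocols, which Mulliken analysis (known to be basis-set sensitive) is not, and the H2 minimal-basis overlap value is assumed.

```python
import numpy as np

def mulliken_populations(P, S, basis_atom, n_atoms):
    """Gross atomic populations N_A = sum over basis functions mu on atom A of (P S)_{mu mu}.

    P: closed-shell AO density matrix (occupation 2 included); S: AO overlap;
    basis_atom[mu]: index of the atom carrying basis function mu."""
    ps_diag = np.einsum("ij,ji->i", P, S)
    pops = np.zeros(n_atoms)
    np.add.at(pops, np.asarray(basis_atom), ps_diag)
    return pops

# H2 in a minimal basis: one 1s function per atom, assumed overlap s
s = 0.66
S = np.array([[1.0, s], [s, 1.0]])
c = 1.0 / np.sqrt(2.0 * (1.0 + s))          # normalized bonding-orbital coefficients
P = 2.0 * np.outer([c, c], [c, c])          # doubly occupied bonding orbital
pops = mulliken_populations(P, S, basis_atom=[0, 1], n_atoms=2)
charges = np.array([1.0, 1.0]) - pops       # nuclear charge minus electron population
print(pops, charges)
```

The populations always sum to the trace of PS, the total electron count, so the partition is lossless in the same sense as the decompositions described above.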
Affiliation(s)
- Janus J Eriksen
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, United Kingdom
29
Scivetti I, Sen K, Elena AM, Todorov I. Reactive Molecular Dynamics at Constant Pressure via Nonreactive Force Fields: Extending the Empirical Valence Bond Method to the Isothermal-Isobaric Ensemble. J Phys Chem A 2020; 124:7585-7597. [PMID: 32820921 DOI: 10.1021/acs.jpca.0c05461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The Empirical Valence Bond (EVB) method offers a suitable framework to obtain reactive potentials through the coupling of nonreactive force fields. In this formalism, most of the implemented coupling terms are built using functional forms that depend on spatial coordinates, while parameters are fitted against reference data to model the change of chemistry between the participating nonreactive states. In this work, we demonstrate that the use of such coupling terms precludes the computation of the stress tensor for condensed-phase systems and thus prevents EVB molecular dynamics in the isothermal-isobaric (NPT) ensemble. As an alternative, we use coupling terms that depend on the energy gaps, defined as the energy differences between the participating nonreactive force fields, and derive a general expression for the EVB stress tensor suitable for computation. The new methodology is tested on a model of a single reactive malonaldehyde molecule solvated in nonreactive water. Mass densities and probability distributions of the energy gaps computed in the NPT ensemble reveal a negligible role of the reactive potential in the limit of low-concentration solutions, thus corroborating for the first time the validity of approximations based on the canonical NVT ensemble, customarily adopted for EVB simulations. The presented formalism also aims to support future implementations and extensions of the EVB method for studying the limit of highly concentrated solutions.
Affiliation(s)
- Ivan Scivetti
- Daresbury Laboratory, Sc. Tech., Keckwick Lane, Daresbury, Warrington WA4 4AD, U.K.; Department of Chemistry, University of Liverpool, Liverpool L69 3BX, U.K.
- Kakali Sen
- Daresbury Laboratory, Sc. Tech., Keckwick Lane, Daresbury, Warrington WA4 4AD, U.K.
- Alin M Elena
- Daresbury Laboratory, Sc. Tech., Keckwick Lane, Daresbury, Warrington WA4 4AD, U.K.
- Ilian Todorov
- Daresbury Laboratory, Sc. Tech., Keckwick Lane, Daresbury, Warrington WA4 4AD, U.K.
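The energy-gap argument in the EVB abstract can be illustrated with the standard two-state EVB energy, the lowest eigenvalue of the 2×2 matrix with diagonal force-field energies V1, V2 and off-diagonal coupling V12: E = (V1 + V2)/2 - sqrt(((V1 - V2)/2)² + V12²). If V12 depends only on the gap Δ = V1 - V2, then E depends on atomic positions and cell strain only through V1 and V2, so by the chain rule the potential contribution to the EVB stress tensor is (∂E/∂V1)σ1 + (∂E/∂V2)σ2, where σ1 and σ2 are the stress tensors of the two nonreactive force fields. Shifting V1 and V2 by the same constant leaves Δ unchanged and shifts E by that constant, so the two weights sum to one. The Python sketch below assumes a hypothetical Gaussian coupling V12(Δ) = A·exp(-B·Δ²) with made-up parameters; it illustrates the chain-rule argument only and is not the functional form or implementation used by Scivetti et al.

```python
import math


def coupling(gap, a=10.0, b=0.01):
    """Hypothetical gap-dependent coupling V12(gap) = a * exp(-b * gap**2)."""
    return a * math.exp(-b * gap * gap)


def coupling_derivative(gap, a=10.0, b=0.01):
    """dV12/dgap for the hypothetical Gaussian coupling above."""
    return -2.0 * b * gap * coupling(gap, a, b)


def evb_energy_and_weights(v1, v2, a=10.0, b=0.01):
    """Lowest eigenvalue of the 2x2 EVB matrix and the weights dE/dV1, dE/dV2.

    Because V12 depends only on the gap v1 - v2, any derivative of E with
    respect to positions or strain equals w1 * dV1 + w2 * dV2.
    """
    gap = v1 - v2
    v12 = coupling(gap, a, b)
    root = math.sqrt(0.25 * gap * gap + v12 * v12)
    energy = 0.5 * (v1 + v2) - root
    # d(root)/d(gap) = (gap / 4 + V12 * dV12/dgap) / root
    droot = (0.25 * gap + v12 * coupling_derivative(gap, a, b)) / root
    w1 = 0.5 - droot  # dE/dV1, since dgap/dV1 = +1
    w2 = 0.5 + droot  # dE/dV2, since dgap/dV2 = -1
    return energy, w1, w2


def evb_stress(stress1, stress2, w1, w2):
    """Combine two 3x3 force-field stress tensors with the EVB weights."""
    return [[w1 * s1 + w2 * s2 for s1, s2 in zip(row1, row2)]
            for row1, row2 in zip(stress1, stress2)]


if __name__ == "__main__":
    # Hypothetical diabatic energies of the two nonreactive states.
    energy, w1, w2 = evb_energy_and_weights(-5.0, -3.0)
    print(energy, w1, w2, w1 + w2)
```

A central finite difference of `evb_energy_and_weights` with respect to `v1` reproduces `w1`, which is a quick way to check the chain-rule weights before using them to mix force-field stresses.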
30
Zaverkin V, Kästner J. Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials. J Chem Theory Comput 2020; 16:5410-5421. [DOI: 10.1021/acs.jctc.0c00347] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/29/2022]
Affiliation(s)
- V. Zaverkin
- Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, 70569 Stuttgart, Germany
- J. Kästner
- Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, 70569 Stuttgart, Germany
31
Dandu N, Ward L, Assary RS, Redfern PC, Narayanan B, Foster IT, Curtiss LA. Quantum-Chemically Informed Machine Learning: Prediction of Energies of Organic Molecules with 10 to 14 Non-hydrogen Atoms. J Phys Chem A 2020; 124:5804-5811. [PMID: 32539388 DOI: 10.1021/acs.jpca.0c01777] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/02/2023]
Abstract
High-fidelity quantum-chemical calculations can provide accurate predictions of molecular energies, but their high computational costs limit their utility, especially for larger molecules. We have shown in previous work that machine learning models trained on high-level quantum-chemical calculations (G4MP2) for organic molecules with one to nine non-hydrogen atoms can provide accurate predictions for other molecules of comparable size at much lower costs. Here we demonstrate that such models can also be used to effectively predict energies of molecules larger than those in the training set. To implement this strategy, we first established a set of 191 molecules with 10-14 non-hydrogen atoms having reliable experimental enthalpies of formation. We then assessed the accuracy of computed G4MP2 enthalpies of formation for these 191 molecules. The error in the G4MP2 results was somewhat larger than that for smaller molecules, and the reason for this increase is discussed. Two density functional methods, B3LYP and ωB97X-D, were also used on this set of molecules, with ωB97X-D found to perform better than B3LYP at predicting energies. The G4MP2 energies for the 191 molecules were then predicted using these two functionals with two machine learning methods, the FCHL-Δ and SchNet-Δ models, with the learning done on calculated energies of the one to nine non-hydrogen atom molecules. The better-performing model, FCHL-Δ, gave atomization energies of the 191 organic molecules with 10-14 non-hydrogen atoms within 0.4 kcal/mol of their G4MP2 energies. Thus, this work demonstrates that quantum-chemically informed machine learning can be used to successfully predict the energies of large organic molecules whose size is beyond that in the training set.
Affiliation(s)
- Naveen Dandu
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Logan Ward
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States; Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Rajeev S Assary
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
- Paul C Redfern
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Badri Narayanan
- Department of Mechanical Engineering, University of Louisville, Louisville, Kentucky 40292, United States
- Ian T Foster
- Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States; Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
- Larry A Curtiss
- Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States; Joint Center for Energy Storage Research (JCESR), Argonne National Laboratory, Lemont, Illinois 60439, United States
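The "-Δ" suffix in FCHL-Δ and SchNet-Δ refers to Δ-learning: the model is trained on the difference between the high-level energy (here G4MP2) and a cheaper baseline (here B3LYP or ωB97X-D), and a prediction adds the learned correction back to the baseline. The pure-Python sketch below illustrates only this idea, using Gaussian-kernel ridge regression on a one-dimensional toy descriptor; the descriptor, kernel width, regularization strength, and function names are hypothetical stand-ins for the FCHL and SchNet representations, which are not reproduced here.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two scalar toy descriptors."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_delta_model(descriptors, e_low, e_high, width=1.0, reg=1e-8):
    """Kernel ridge regression on the correction e_high - e_low (hypothetical helper)."""
    targets = [h - l for h, l in zip(e_high, e_low)]
    n = len(descriptors)
    k = [[gaussian_kernel(descriptors[i], descriptors[j], width) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, targets)
    return descriptors, alpha, width


def predict_high(model, descriptor, e_low):
    """Delta-learning prediction: cheap baseline energy plus learned correction."""
    train_x, alpha, width = model
    correction = sum(a * gaussian_kernel(descriptor, x, width)
                     for a, x in zip(alpha, train_x))
    return e_low + correction
```

Because only the correction is learned, the regression has to capture the high-minus-low difference rather than the full energy, which is typically the motivation for Δ-learning.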
32
Casier B, Carniato S, Miteva T, Capron N, Sisourat N. Using principal component analysis for neural network high-dimensional potential energy surface. J Chem Phys 2020; 152:234103. [DOI: 10.1063/5.0009264] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Affiliation(s)
- Bastien Casier
- Sorbonne Université, CNRS, Laboratoire de Chimie Physique Matière et Rayonnement, UMR 7614, F-75005 Paris, France
- Stéphane Carniato
- Sorbonne Université, CNRS, Laboratoire de Chimie Physique Matière et Rayonnement, UMR 7614, F-75005 Paris, France
- Tsveta Miteva
- Sorbonne Université, CNRS, Laboratoire de Chimie Physique Matière et Rayonnement, UMR 7614, F-75005 Paris, France
- Nathalie Capron
- Sorbonne Université, CNRS, Laboratoire de Chimie Physique Matière et Rayonnement, UMR 7614, F-75005 Paris, France
- Nicolas Sisourat
- Sorbonne Université, CNRS, Laboratoire de Chimie Physique Matière et Rayonnement, UMR 7614, F-75005 Paris, France
33
von Lilienfeld OA, Müller KR, Tkatchenko A. Exploring chemical compound space with quantum-based machine learning. Nat Rev Chem 2020; 4:347-358. [PMID: 37127950 DOI: 10.1038/s41570-020-0189-9] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Accepted: 04/23/2020] [Indexed: 12/16/2022]
Abstract
Rational design of compounds with specific properties requires understanding and fast evaluation of molecular properties throughout chemical compound space, the huge set of all potentially stable molecules. Recent advances in combining quantum-mechanical calculations with machine learning provide powerful tools for exploring wide swathes of chemical compound space. We present our perspective on this exciting and quickly developing field by discussing key advances in the development and applications of quantum-mechanics-based machine-learning methods to diverse compounds and properties, and outlining the challenges ahead. We argue that significant progress in the exploration and understanding of chemical compound space can be made through a systematic combination of rigorous physical theories, comprehensive synthetic data sets of microscopic and macroscopic properties, and modern machine-learning methods that account for physical and chemical knowledge.
34
Käser S, Unke OT, Meuwly M. Isomerization and decomposition reactions of acetaldehyde relevant to atmospheric processes from dynamics simulations on neural network-based potential energy surfaces. J Chem Phys 2020; 152:214304. [DOI: 10.1063/5.0008223] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022]
Affiliation(s)
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Oliver T. Unke
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
35
Jindal S, Bulusu SS. Structural evolution in gold nanoparticles using artificial neural network based interatomic potentials. J Chem Phys 2020; 152:154302. [PMID: 32321271 DOI: 10.1063/1.5142903] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Shweta Jindal
- Discipline of Chemistry, Indian Institute of Technology Indore, Simrol, Indore 453552, India
- Satya S. Bulusu
- Discipline of Chemistry, Indian Institute of Technology Indore, Simrol, Indore 453552, India
36
Fabrizio A, Meyer B, Corminboeuf C. Machine learning models of the energy curvature vs particle number for optimal tuning of long-range corrected functionals. J Chem Phys 2020; 152:154103. [DOI: 10.1063/5.0005039] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
Affiliation(s)
- Alberto Fabrizio
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Benjamin Meyer
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
37
Heinen S, Schwilk M, von Rudorff GF, von Lilienfeld OA. Machine learning the computational cost of quantum chemistry. Mach Learn: Sci Technol 2020. [DOI: 10.1088/2632-2153/ab6ac4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
38
Unke OT, Koner D, Patra S, Käser S, Meuwly M. High-dimensional potential energy surfaces for molecular simulations: from empiricism to machine learning. Mach Learn: Sci Technol 2020. [DOI: 10.1088/2632-2153/ab5922] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/26/2022]
39
Sweeny BC, Pan H, Kassem A, Sawyer JC, Ard SG, Shuman NS, Viggiano AA, Brickel S, Unke OT, Upadhyay M, Meuwly M. Thermal activation of methane by MgO+: temperature dependent kinetics, reactive molecular dynamics simulations and statistical modeling. Phys Chem Chem Phys 2020; 22:8913-8923. [DOI: 10.1039/d0cp00668h] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023]
Abstract
The kinetics of methane activation (MgO+ + CH4) was studied experimentally and computationally, the latter by running and analyzing reactive atomistic simulations.
Affiliation(s)
- Brendan C. Sweeny
- NRC Postdoc at Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Hanqing Pan
- USRA Space Scholar at Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Asmaa Kassem
- USRA Space Scholar at Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Jordan C. Sawyer
- NRC Postdoc at Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Shaun G. Ard
- Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Nicholas S. Shuman
- Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Albert A. Viggiano
- Air Force Research Laboratory, Space Vehicles Directorate, Kirtland Air Force Base, USA
- Oliver T. Unke
- Department of Chemistry, University of Basel, CH-4056 Basel, Switzerland
- Meenu Upadhyay
- Department of Chemistry, University of Basel, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, CH-4056 Basel, Switzerland
40
Rennekamp B, Kutzki F, Obarska-Kosinska A, Zapp C, Gräter F. Hybrid Kinetic Monte Carlo/Molecular Dynamics Simulations of Bond Scissions in Proteins. J Chem Theory Comput 2019; 16:553-563. [DOI: 10.1021/acs.jctc.9b00786] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/10/2023]
Affiliation(s)
- Benedikt Rennekamp
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Institute for Theoretical Physics, Heidelberg University, Philosophenweg 16, 69120 Heidelberg, Germany
- Fabian Kutzki
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Institute of Physical Chemistry, Karlsruhe Institute of Technology, Fritz-Haber-Weg 2, 76131 Karlsruhe, Germany
- Agnieszka Obarska-Kosinska
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Hamburg Unit c/o DESY, European Molecular Biology Laboratory, Notkestrasse 85, 22607 Hamburg, Germany
- Christopher Zapp
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Institute for Theoretical Physics, Heidelberg University, Philosophenweg 16, 69120 Heidelberg, Germany
- Frauke Gräter
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University, INF 205, 69120 Heidelberg, Germany
41
von Rudorff GF, von Lilienfeld OA. Atoms in Molecules from Alchemical Perturbation Density Functional Theory. J Phys Chem B 2019; 123:10073-10082. [PMID: 31647233 DOI: 10.1021/acs.jpcb.9b07799] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
Abstract
Based on thermodynamic integration, we introduce atoms in molecules (AIM) using the orbital-free framework of alchemical perturbation density functional theory (APDFT). Within APDFT, atomic energies and electron densities in molecules are arbitrary because any reference system and integration path can be selected as long as it meets the boundary conditions. We choose the uniform electron gas (jellium) as a reference and linearly scale up all nuclear charges, situated at any query molecule's atomic coordinates. Within the approximations made when calculating one-particle electron densities, this universal choice affords unambiguous and exact definitions of energies and electron densities of AIMs. Numerical results are presented for neutral small molecules (CO, N2, BF, CO2), various small molecules with different electronic hybridization states of carbon (CH4, C2H6, C2H4, C2H2, HCN), and all of the possible BN-doped mutants connecting benzene to borazine (C₂ₙB₃₋ₙN₃₋ₙH₆, 0 ≤ n ≤ 3). Our results, as well as comparison to atomic energy estimates resulting from either DFT-trained neural network models or atomic basis set overlap within CCSD, suggest that APDFT-based AIMs enable meaningful, interesting, and counterintuitive interpretations of chemical bonding and molecular electron densities.
Affiliation(s)
- Guido Falk von Rudorff
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- O Anatole von Lilienfeld
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
42
Glavatskikh M, Leguy J, Hunault G, Cauchy T, Da Mota B. Dataset's chemical diversity limits the generalizability of machine learning predictions. J Cheminform 2019; 11:69. [PMID: 33430991 PMCID: PMC6852905 DOI: 10.1186/s13321-019-0391-2] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 07/26/2019] [Accepted: 10/28/2019] [Indexed: 01/18/2023]
Abstract
The QM9 dataset has become the gold standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have recently been published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9-equivalent dataset (only H, C, N, O and F and up to 9 “heavy” atoms) of the PubChemQC project, is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
Affiliation(s)
- Marta Glavatskikh
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France; Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France
- Jules Leguy
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
- Gilles Hunault
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France; HIFIH, EA 3859, Institut de Biologie en Santé PBH-IRIS, CHU, University of Angers, 4, Rue Larrey, 49933, Angers, France
- Thomas Cauchy
- Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France
- Benoit Da Mota
- LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France
43
Lu J, Wang C, Zhang Y. Predicting Molecular Energy Using Force-Field Optimized Geometries and Atomic Vector Representations Learned from an Improved Deep Tensor Neural Network. J Chem Theory Comput 2019; 15:4113-4121. [PMID: 31142110 PMCID: PMC6615995 DOI: 10.1021/acs.jctc.9b00001] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/20/2022]
Abstract
The use of neural networks to predict molecular properties calculated from high-level quantum mechanical calculations has made significant advances in recent years, but most models need input geometries from DFT optimizations, which limits their applicability in practice. In this work, we explored how machine learning can be used to predict molecular atomization energies and conformation stability using optimized geometries from Merck Molecular Force Field (MMFF). On the basis of the recently introduced deep tensor neural network (DTNN) approach, we first improved its training efficiency and performed an extensive search of its hyperparameters, and developed a DTNN_7ib model which has a test accuracy of 0.34 kcal/mol mean absolute error (MAE) on the QM9 data set. Then, using atomic vector representations in the DTNN_7ib model, we employed a transfer learning (TL) strategy to train readout layers on the QM9M data set, in which QM properties are the same as in QM9 [calculated at the B3LYP/6-31G(2df,p) level] while molecular geometries are the corresponding local minima optimized with the MMFF94 force field. The developed TL_QM9M model can achieve an MAE of 0.79 kcal/mol using MMFF optimized geometries. Furthermore, we demonstrated that the same transfer learning strategy with the same atomic vector representation can be used to develop a machine learning model that can achieve an MAE of 0.51 kcal/mol in molecular energy prediction using MMFF geometries for an eMol9_CM conformation data set, which consists of 9959 molecules and 88 234 conformations with energies calculated at the B3LYP/6-31G* level. Our results indicate that DFT-level accuracy of molecular energy prediction can be achieved using force-field optimized geometries and atomic vector representations learned from deep tensor neural network, and integrated molecular modeling and machine learning would be a promising approach to develop more powerful computational tools for molecular conformation analysis.
Affiliation(s)
- Jianing Lu
- Department of Chemistry, New York University, New York, New York 10003
- Cheng Wang
- Department of Chemistry, New York University, New York, New York 10003
- Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
44
Koner D, Unke OT, Boe K, Bemish RJ, Meuwly M. Exhaustive state-to-state cross sections for reactive molecular collisions from importance sampling simulation and a neural network representation. J Chem Phys 2019; 150:211101. [DOI: 10.1063/1.5097385] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]
Affiliation(s)
- Debasish Koner
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Oliver T. Unke
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kyle Boe
- Boston College, Institute for Scientific Research, Chestnut Hill, Massachusetts 02467, USA
- Raymond J. Bemish
- Air Force Research Laboratory, Space Vehicles Directorate, Kirtland AFB, Albuquerque, New Mexico 87117, USA
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
45
Stuke A, Todorović M, Rupp M, Kunkel C, Ghosh K, Himanen L, Rinke P. Chemical diversity in molecular orbital energy predictions with kernel ridge regression. J Chem Phys 2019; 150:204121. [DOI: 10.1063/1.5086105] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 02/06/2023]
Affiliation(s)
- Annika Stuke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Milica Todorović
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Matthias Rupp
- Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
- Christian Kunkel
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
- Kunal Ghosh
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Department of Computer Science, Aalto University, P.O. Box 15400, Aalto FI-00076, Finland
- Lauri Himanen
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11100, Aalto FI-00076, Finland
- Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universität München, Lichtenbergstr. 4, 85747 Garching, Germany
46
Unke OT, Meuwly M. PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges. J Chem Theory Comput 2019; 15:3678-3693. [DOI: 10.1021/acs.jctc.9b00181] [Citation(s) in RCA: 285] [Impact Index Per Article: 57.0] [Indexed: 01/30/2023]
Affiliation(s)
- Oliver T. Unke
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
47
Jindal S, Bulusu SS. A transferable artificial neural network model for atomic forces in nanoparticles. J Chem Phys 2018; 149:194101. [DOI: 10.1063/1.5043247] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Affiliation(s)
- Shweta Jindal
- Discipline of Chemistry, Indian Institute of Technology Indore, Simrol, Indore 453552, India
- Satya S. Bulusu
- Discipline of Chemistry, Indian Institute of Technology Indore, Simrol, Indore 453552, India
48
Meldgaard SA, Kolsbjerg EL, Hammer B. Machine learning enhanced global optimization by clustering local environments to enable bundled atomic energies. J Chem Phys 2018; 149:134104. [DOI: 10.1063/1.5048290] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
Affiliation(s)
- Søren A. Meldgaard
- Department of Physics and Astronomy and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000 Aarhus, Denmark
- Esben L. Kolsbjerg
- Department of Physics and Astronomy and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000 Aarhus, Denmark
- Bjørk Hammer
- Department of Physics and Astronomy and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000 Aarhus, Denmark
49
Meuwly M. Reactive molecular dynamics: From small molecules to proteins. Wiley Interdiscip Rev Comput Mol Sci 2018. [DOI: 10.1002/wcms.1386] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/12/2023]
Affiliation(s)
- Markus Meuwly
- Department of Chemistry, University of Basel, Basel, Switzerland
- Department of Chemistry, Brown University, Providence, Rhode Island
50
Rupp M, von Lilienfeld OA, Burke K. Guest Editorial: Special Topic on Data-Enabled Theoretical Chemistry. J Chem Phys 2018; 148:241401. [DOI: 10.1063/1.5043213] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Indexed: 01/17/2023]
Affiliation(s)
- Matthias Rupp
- Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
- O. Anatole von Lilienfeld
- Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel, 4056 Basel, Switzerland
- Kieron Burke
- Departments of Chemistry and Physics, University of California, Irvine, California 92697, USA