1
Nandi A, Pandey P, Houston PL, Qu C, Yu Q, Conte R, Tkatchenko A, Bowman JM. Δ-Machine Learning to Elevate DFT-Based Potentials and a Force Field to the CCSD(T) Level Illustrated for Ethanol. J Chem Theory Comput 2024; 20:8807-8819. [PMID: 39361051 PMCID: PMC11500277 DOI: 10.1021/acs.jctc.4c00977] [Received: 07/27/2024] [Revised: 09/17/2024] [Accepted: 09/18/2024] [Indexed: 10/23/2024]
Abstract
Progress in machine learning has facilitated the development of potentials that offer both the accuracy of first-principles techniques and vast increases in the speed of evaluation. Recently, Δ-machine learning has been used to elevate the quality of a potential energy surface (PES) based on low-level, e.g., density functional theory (DFT) energies and gradients to close to the gold-standard coupled cluster level of accuracy. We have demonstrated the success of this approach for molecules, ranging in size from H3O+ to 15-atom acetyl-acetone and tropolone. These were all done using the B3LYP functional. Here, we investigate the generality of this approach for the PBE, M06, M06-2X, and PBE0 + MBD functionals, using ethanol as the example molecule. Linear regression with permutationally invariant polynomials is used to fit both low-level and correction PESs. These PESs are employed for standard RMSE analysis for training and test data sets, and then general fidelity tests such as energetics of stationary points, normal-mode frequencies, and torsional potentials are examined. We achieve similar improvements in all cases. Interestingly, we obtained significant improvement over DFT gradients where coupled cluster gradients were not used to correct the low-level PES. Finally, we present some results for correcting a recent molecular mechanics force field for ethanol and comment on the possible generality of this approach.
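The correction scheme summarized in this abstract can be illustrated with a minimal sketch, assuming a one-dimensional toy problem rather than ethanol. Two Morse curves stand in for the low-level (DFT-like) and high-level (coupled-cluster-like) energies, and powers of a Morse variable stand in for the permutationally invariant polynomial basis. All function names, parameters, and data sizes below are hypothetical and chosen only for illustration; the paper's actual fits are full-dimensional.

```python
import numpy as np

def morse(r, de, a, re):
    """Toy Morse potential used as a stand-in for ab initio energies."""
    return de * (1.0 - np.exp(-a * (r - re))) ** 2

def design_matrix(r, order, lam=2.0):
    """Columns are powers 0..order of the Morse variable y = exp(-r / lam)."""
    y = np.exp(-np.asarray(r) / lam)
    return np.vander(y, order + 1, increasing=True)

def fit_linear(r, v, order):
    """Linear least-squares fit of energies v to the polynomial basis."""
    coef, *_ = np.linalg.lstsq(design_matrix(r, order), v, rcond=None)
    return coef

def evaluate(coef, r):
    """Evaluate a fitted polynomial at geometries r."""
    return design_matrix(r, len(coef) - 1) @ coef

rng = np.random.default_rng(0)

# Many cheap low-level points (assumed toy parameters).
r_low = rng.uniform(0.8, 3.0, 400)
v_low = morse(r_low, de=0.15, a=1.1, re=1.25)

# Few expensive high-level points (assumed toy parameters).
r_high = rng.uniform(0.8, 3.0, 30)
v_high = morse(r_high, de=0.17, a=1.0, re=1.20)

# Step 1: fit the low-level surface.
c_low = fit_linear(r_low, v_low, order=8)

# Step 2: fit the difference (high-level minus fitted low-level) with a
# lower-order basis, since the correction is assumed to be smoother.
delta = v_high - evaluate(c_low, r_high)
c_delta = fit_linear(r_high, delta, order=4)

def v_corrected(r):
    """Corrected surface: low-level fit plus fitted correction."""
    return evaluate(c_low, r) + evaluate(c_delta, r)

# Compare both surfaces against the high-level reference on a test grid.
r_test = np.linspace(0.85, 2.95, 200)
v_ref = morse(r_test, de=0.17, a=1.0, re=1.20)
rmse_low = np.sqrt(np.mean((evaluate(c_low, r_test) - v_ref) ** 2))
rmse_corrected = np.sqrt(np.mean((v_corrected(r_test) - v_ref) ** 2))
print(f"RMSE low-level fit: {rmse_low:.2e}")
print(f"RMSE corrected fit: {rmse_corrected:.2e}")
```

The point of the design is that the correction is fitted on far fewer high-level points than the low-level surface, which is where the cost saving comes from.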
Affiliation(s)
- Apurba Nandi
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Priyanka Pandey
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Paul L. Houston
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
  - Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Chen Qu
  - Independent Researcher, Toronto, Ontario M9B0E3, Canada
- Qi Yu
  - Department of Chemistry, Fudan University, Shanghai 200438, P. R. China
- Riccardo Conte
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Joel M. Bowman
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
2
Sit MK, Das S, Samanta K. Machine Learning-Assisted Mixed Quantum-Classical Dynamics without Explicit Nonadiabatic Coupling: Application to the Photodissociation of Peroxynitric Acid. J Phys Chem A 2024; 128:8244-8253. [PMID: 39283987 DOI: 10.1021/acs.jpca.4c02876] [Indexed: 09/27/2024]
Abstract
We have devised a hybrid quantum-classical scheme utilizing machine-learned potential energy surfaces (PES), which circumvents the need for explicit computation of nonadiabatic coupling elements. The quantities necessary to account for the nonadiabatic effects are directly obtained from the PESs. The simulation of dynamics is based on the fewest-switches surface-hopping method. We applied this scheme to model the photodissociation of both N-O and O-O bonds in a conformer of peroxynitric acid (HO2NO2). Adiabatic PES data for the six lowest states of this molecule were computed at the CASSCF level for various nuclear configurations. These served as the training data for the machine-learning models for the PESs. The dynamics simulation was initiated on the lowest optically bright singlet excited state (S4) and propagated along the two Jacobi coordinate vectors J1 and J2 while accounting for the nonadiabatic effects through transitions between PESs. Our analysis revealed that there is a very high chance of dissociation of the N-O bond leading to the HO2 and NO2 fragments.
Affiliation(s)
- Mahesh K Sit
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
- Subhasish Das
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
- Kousik Samanta
  - School of Basic Sciences, Indian Institute of Technology Bhubaneswar, Argul, Odisha 752050, India
3
Gutierrez-Cardenas J, Gibbas BD, Whitaker K, Kaledin M, Kaledin AL. A Low-Order Permutationally Invariant Polynomial Approach to Learning Potential Energy Surfaces Using the Bond-Order Charge-Density Matrix: Application to Cn Clusters for n = 3-10, 20. J Phys Chem A 2024; 128:7703-7713. [PMID: 39205486 PMCID: PMC11407436 DOI: 10.1021/acs.jpca.4c04281] [Indexed: 09/04/2024]
Abstract
A representation for learning potential energy surfaces (PESs) in terms of permutationally invariant polynomials (PIPs) using the Hartree-Fock expression for electronic energy is proposed. Our approach is based on the one-electron core Hamiltonian weighted by the configuration-dependent elements of the bond-order charge density matrix (CDM). While the previously reported model used an s-function Gaussian basis for the CDM, the present formulation is expanded with p-functions, which are crucial for describing chemical bonding. Detailed results are demonstrated on linear and cyclic Cn clusters (n = 3-10) trained on extensive B3LYP/aug-cc-pVTZ data. The described method facilitates PES learning by reducing the root mean squared error (RMSE) by a factor of 5 relative to the s-function formulation and by a factor of 20 relative to the conventional PIP approach. This is equivalent to using CDM and an sp basis with a PIP of order M to achieve the same RMSE as with the conventional method with a PIP of order M + 2. Implications for large-scale problems are discussed using the case of the PES of the C20 fullerene in full permutational symmetry.
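For orientation, the conventional PIP approach that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration, assuming all atoms are identical (as in a Cn cluster) and using brute-force symmetrization of monomials in Morse variables; it is not the charge-density-matrix formulation proposed in the paper. The function names, the range parameter `lam`, and the example geometry are hypothetical.

```python
import itertools
import numpy as np

def morse_variables(xyz, lam=2.0):
    """Morse variables y_ij = exp(-r_ij / lam) for every atom pair i < j."""
    pairs = list(itertools.combinations(range(len(xyz)), 2))
    dists = np.array([np.linalg.norm(xyz[i] - xyz[j]) for i, j in pairs])
    return np.exp(-dists / lam), pairs

def pip_basis(xyz, max_order=2, lam=2.0):
    """Symmetrized monomials up to max_order, assuming all atoms identical."""
    n_atoms = len(xyz)
    y, pairs = morse_variables(xyz, lam)
    pair_index = {p: k for k, p in enumerate(pairs)}
    n_pairs = len(pairs)

    # All exponent tuples with total degree between 1 and max_order.
    exponents = [e for e in itertools.product(range(max_order + 1), repeat=n_pairs)
                 if 0 < sum(e) <= max_order]

    seen = set()
    values = []
    for e in exponents:
        # Orbit of this monomial under every permutation of the atoms.
        orbit = set()
        for perm in itertools.permutations(range(n_atoms)):
            new = [0] * n_pairs
            for k, (i, j) in enumerate(pairs):
                a, b = sorted((perm[i], perm[j]))
                new[pair_index[(a, b)]] = e[k]
            orbit.add(tuple(new))
        key = min(orbit)
        if key in seen:
            continue  # this invariant was already generated from another monomial
        seen.add(key)
        # Sum the distinct monomials in the orbit to obtain one invariant.
        values.append(sum(np.prod(y ** np.array(m)) for m in sorted(orbit)))
    return np.array(values)

# Hypothetical three-atom geometry (arbitrary units).
xyz = np.array([[0.0, 0.0, 0.0],
                [1.3, 0.0, 0.0],
                [0.2, 1.1, 0.4]])
basis = pip_basis(xyz)
basis_permuted = pip_basis(xyz[[2, 0, 1]])  # same geometry, atoms relabeled
print(len(basis), np.allclose(basis, basis_permuted))
```

A PES fit in this representation is then a linear regression of energies on these basis values. The brute-force loop over all atom permutations scales factorially with the number of identical atoms, so it is only practical here for very small clusters.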
Affiliation(s)
- Jose Gutierrez-Cardenas
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box #1203, Kennesaw 30144, Georgia
- Benjamin D Gibbas
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box #1203, Kennesaw 30144, Georgia
- Kyle Whitaker
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box #1203, Kennesaw 30144, Georgia
- Martina Kaledin
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box #1203, Kennesaw 30144, Georgia
- Alexey L Kaledin
  - Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta 30322, Georgia
4
Fisher KE, Herbst MF, Marzouk YM. Multitask methods for predicting molecular properties from heterogeneous data. J Chem Phys 2024; 161:014114. [PMID: 38958501 DOI: 10.1063/5.0201681] [Received: 01/31/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost by over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange-correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures-including the full disparity between the different levels of fidelity-than existing kernel approaches based on Δ-learning although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
Affiliation(s)
- K E Fisher
  - Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- M F Herbst
  - Mathematics for Materials Modelling, Institute of Mathematics and Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Y M Marzouk
  - Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
5
Aldossary A, Campos-Gonzalez-Angulo JA, Pablo-García S, Leong SX, Rajaonson EM, Thiede L, Tom G, Wang A, Avagliano D, Aspuru-Guzik A. In Silico Chemical Experiments in the Age of AI: From Quantum Chemistry to Machine Learning and Back. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2402369. [PMID: 38794859 DOI: 10.1002/adma.202402369] [Received: 02/15/2024] [Revised: 04/28/2024] [Indexed: 05/26/2024]
Abstract
Computational chemistry is an indispensable tool for understanding molecules and predicting chemical properties. However, traditional computational methods face significant challenges due to the difficulty of solving the Schrödinger equation and the increasing computational cost with the size of the molecular system. In response, there has been a surge of interest in applying artificial intelligence (AI) and machine learning (ML) techniques to in silico experiments. Integrating AI and ML into computational chemistry increases the scalability and speed of the exploration of chemical space. However, challenges remain, particularly regarding the reproducibility and transferability of ML models. This review highlights the evolution of ML in learning from, complementing, or replacing traditional computational chemistry for energy and property predictions. Starting from models trained entirely on numerical data, a journey is set forth toward the ideal model incorporating or learning the physical laws of quantum mechanics. This paper also reviews existing computational methods and ML models and their intertwining, outlines a roadmap for future research, and identifies areas for improvement and innovation. Ultimately, the goal is to develop AI architectures capable of predicting accurate and transferable solutions to the Schrödinger equation, thereby revolutionizing in silico experiments within chemistry and materials science.
Affiliation(s)
- Abdulrahman Aldossary
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Sergio Pablo-García
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
- Shi Xuan Leong
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Ella Miray Rajaonson
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Luca Thiede
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Gary Tom
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
- Andrew Wang
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
- Davide Avagliano
  - Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences (iCLeHS UMR 8060), Paris, F-75005, France
- Alán Aspuru-Guzik
  - Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, ON, M5S 3H6, Canada
  - Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, 661 University Ave. Suite 710, Toronto, ON, M5G 1M1, Canada
  - Department of Materials Science & Engineering, University of Toronto, 184 College St., Toronto, ON, M5S 3E4, Canada
  - Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St., Toronto, ON, M5S 3E5, Canada
  - Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 66118 University Ave., Toronto, M5G 1M1, Canada
  - Acceleration Consortium, 80 St George St, Toronto, M5S 3H6, Canada
6
Ge F, Wang R, Qu C, Zheng P, Nandi A, Conte R, Houston PL, Bowman JM, Dral PO. Tell Machine Learning Potentials What They Are Needed For: Simulation-Oriented Training Exemplified for Glycine. J Phys Chem Lett 2024; 15:4451-4460. [PMID: 38626460 DOI: 10.1021/acs.jpclett.4c00746] [Indexed: 04/18/2024]
Abstract
Machine learning potentials (MLPs) are widely applied as an efficient alternative way to represent potential energy surfaces (PESs) in many chemical simulations. The MLPs are often evaluated with the root-mean-square errors on the test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the simulation accuracy with MLPs on an example of a full-dimensional, global PES for the glycine amino acid. Our results show that the errors in the test set do not unambiguously reflect the MLP performance in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving the MLP quality in a simulation-oriented manner, yielding the most precise relative conformer energies and barriers. This solution also passed the stringent test by diffusion Monte Carlo simulations.
Affiliation(s)
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Ran Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Chen Qu
  - Independent Researcher, Toronto, Ontario M9B0E3, Canada
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Apurba Nandi
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
  - Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Riccardo Conte
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Paul L Houston
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Joel M Bowman
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
7
Schröder B, Rauhut G. From the Automated Calculation of Potential Energy Surfaces to Accurate Infrared Spectra. J Phys Chem Lett 2024; 15:3159-3169. [PMID: 38478898 PMCID: PMC10961845 DOI: 10.1021/acs.jpclett.4c00186] [Received: 01/19/2024] [Revised: 02/20/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
Advances in the development of quantum chemical methods and progress in multicore architectures in computer science made the simulation of infrared spectra of isolated molecules competitive with respect to established experimental methods. Although it is mainly the multidimensional potential energy surface that controls the accuracy of these calculations, the subsequent vibrational structure calculations need to be carefully converged in order to yield accurate results. As both aspects need to be considered in a balanced way, we focus on approaches for molecules of up to 12-15 atoms with respect to both parts, which have been automated to some extent so that they can be employed in routine applications. Alternatives to machine learning will be discussed, which appear to be attractive, as long as local regions of the potential energy surface are sufficient. The automatization of these methods is still in its infancy, and the generalization to molecules with large amplitude motions or molecular clusters is far from trivial, but many systems relevant for astrophysical studies are already in reach.
Affiliation(s)
- Benjamin Schröder
  - Institute of Physical Chemistry, University of Goettingen, Tammannstrasse 6, Göttingen 37077, Germany
- Guntram Rauhut
  - Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, Stuttgart 70569, Germany
8
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
9
Izvekov S, Kroonblawd MP, Larentzos JP, Brennan JK, Rice BM. Maximum Entropy Theory of Multiscale Coarse-Graining via Matching Thermodynamic Forces: Application to a Molecular Crystal (TATB). J Phys Chem B 2024. [PMID: 38489758 DOI: 10.1021/acs.jpcb.3c07078] [Indexed: 03/17/2024]
Abstract
The MSCG/FM (multiscale coarse-graining via force-matching) approach is an efficient supervised machine learning method to develop microscopically informed coarse-grained (CG) models. We present a theory based on the principle of maximum entropy (PME) enveloping the existing MSCG/FM approaches. This theory views the MSCG/FM method as a special case of matching the thermodynamic forces from the extended ensemble described by the set of thermodynamic (relevant) system coordinates. This set may include CG coordinates, the stress tensor, applied external fields, and so forth, and may be characterized by nonequilibrium conditions. Following the presentation of the theory, we discuss the consistent matching of both bonded and nonbonded interactions. The proposed PME formulation is used as a starting point to extend the MSCG/FM method to the constant strain ensemble, which together with the explicit matching of the bonded forces is better suited for coarse-graining anisotropic media at a submolecular resolution. The theory is demonstrated by performing the fine coarse-graining of crystalline 1,3,5-triamino-2,4,6-trinitrobenzene (TATB), a well-known insensitive molecular energetic material, which exhibits highly anisotropic mechanical properties.
Affiliation(s)
- Sergei Izvekov
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Matthew P Kroonblawd
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- James P Larentzos
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- John K Brennan
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Betsy M Rice
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
10
Célerse F, Wodrich MD, Vela S, Gallarati S, Fabregat R, Juraskova V, Corminboeuf C. From Organic Fragments to Photoswitchable Catalysts: The OFF-ON Structural Repository for Transferable Kernel-Based Potentials. J Chem Inf Model 2024; 64:1201-1212. [PMID: 38319296 PMCID: PMC10900300 DOI: 10.1021/acs.jcim.3c01953] [Received: 12/07/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
Structurally and conformationally diverse databases are needed to train accurate neural networks or kernel-based potentials capable of exploring the complex free energy landscape of flexible functional organic molecules. Curating such databases for species beyond "simple" drug-like compounds or molecules composed of well-defined building blocks (e.g., peptides) is challenging as it requires thorough chemical space mapping and evaluation of both chemical and conformational diversities. Here, we introduce the OFF-ON (organic fragments from organocatalysts that are non-modular) database, a repository of 7869 equilibrium and 67,457 nonequilibrium geometries of organic compounds and dimers aimed at describing conformationally flexible functional organic molecules, with an emphasis on photoswitchable organocatalysts. The relevance of this database is then demonstrated by training a local kernel regression model on a low-cost semiempirical baseline and comparing it with a PBE0-D3 reference for several known catalysts, notably the free energy surfaces of exemplary photoswitchable organocatalysts. Our results demonstrate that the OFF-ON data set offers reliable predictions for simulating the conformational behavior of virtually any (photoswitchable) organocatalyst or organic compound composed of H, C, N, O, F, and S atoms, thereby opening a computationally feasible route to explore complex free energy surfaces in order to rationalize and predict catalytic behavior.
Affiliation(s)
- Frédéric Célerse
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Matthew D. Wodrich
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Sergi Vela
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Simone Gallarati
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Raimon Fabregat
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Veronika Juraskova
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
- Clémence Corminboeuf
  - Laboratory for Computational Molecular Design (LCMD), Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
11
Dral PO, Ge F, Hou YF, Zheng P, Chen Y, Barbatti M, Isayev O, Wang C, Xue BX, Pinheiro Jr M, Su Y, Dai Y, Chen Y, Zhang L, Zhang S, Ullah A, Zhang Q, Ou Y. MLatom 3: A Platform for Machine Learning-Enhanced Computational Chemistry Simulations and Workflows. J Chem Theory Comput 2024; 20:1193-1213. [PMID: 38270978 PMCID: PMC10867807 DOI: 10.1021/acs.jctc.3c01203] [Received: 10/30/2023] [Revised: 12/29/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024]
Abstract
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command-line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pretrained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.
Collapse
Affiliation(s)
- Pavlo O. Dral
- State
Key Laboratory of Physical Chemistry of Solid Surfaces, College of
Chemistry and Chemical Engineering, and Innovation Laboratory for
Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Fujian
Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Mario Barbatti
  - Aix Marseille University, CNRS, ICR, Marseille 13013, France
  - Institut Universitaire de France, Paris 75231, France
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Cheng Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Bao-Xin Xue
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Max Pinheiro Jr
  - Aix Marseille University, CNRS, ICR, Marseille 13013, France
- Yuming Su
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Yiheng Dai
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Yangtao Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Shuang Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Arif Ullah
  - School of Physics and Optoelectronic Engineering, Anhui University, Hefei 230601, China
- Quanhao Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
12. Vinod V, Maity S, Zaspel P, Kleinekathöfer U. Multifidelity Machine Learning for Molecular Excitation Energies. J Chem Theory Comput 2023; 19:7658-7670. [PMID: 37862054 DOI: 10.1021/acs.jctc.3c00882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The accurate but fast calculation of molecular excited states is still a very challenging topic. For many applications, detailed knowledge of the energy funnel in larger molecular aggregates is of key importance, requiring highly accurate excitation energies. To this end, machine learning techniques can be a very useful tool, though the cost of generating highly accurate training data sets still remains a severe challenge. To overcome this hurdle, this work proposes the use of multifidelity machine learning where very little training data from high accuracies is combined with cheaper and less accurate data to achieve the accuracy of the costlier level. In the present study, the approach is employed to predict vertical excitation energies to the first excited state for three molecules of increasing size, namely, benzene, naphthalene, and anthracene. The energies are trained and tested for conformations stemming from classical molecular dynamics and density functional based tight-binding simulations. It can be shown that the multifidelity machine learning model can achieve the same accuracy as a machine learning model built only on high-cost training data while expending a much lower computational effort to generate the data. The numerical gain observed in these benchmark test calculations was over a factor of 30 but certainly can be much higher for high-accuracy data.
Affiliation(s)
- Vivin Vinod
  - School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119, Germany
  - School of Computer Science and Engineering, Constructor University, Campus Ring 1, Bremen 28759, Germany
- Sayan Maity
  - School of Science, Constructor University, Campus Ring 1, Bremen 28759, Germany
- Peter Zaspel
  - School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119, Germany
  - School of Computer Science and Engineering, Constructor University, Campus Ring 1, Bremen 28759, Germany
13. Liu Y, Guo H. A Gaussian Process Based Δ-Machine Learning Approach to Reactive Potential Energy Surfaces. J Phys Chem A 2023; 127:8765-8772. [PMID: 37815868 DOI: 10.1021/acs.jpca.3c05318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2023]
Abstract
The Gaussian process (GP) is an efficient nonparametric machine learning (ML) method. A distinct advantage of the GP is its ability to provide an estimate of statistical uncertainties. This is particularly useful in constructing high-dimensional potential energy surfaces (PESs) from ab initio data as it offers an optimal way to add new geometries to reduce the overall error. In this work, GP is employed in the context of Δ-machine learning (Δ-ML), in which a correction PES to an inaccurate low-level PES is constructed using a small number of high-level ab initio calculations. This new method is tested in three prototypical reactive systems, namely, the H + H2 → H + H2, OH + H2 → H2O + H, and H + CH4 → H2 + CH3 reactions. The results show that the GP-based Δ-ML approach is more efficient than its direct application in constructing high-level PESs. We also compare the new method to a previously proposed neural-network-based Δ-ML approach [Liu and Li J. Phys. Chem. Lett. 2022, 13, 4729-4738]. The results indicate that the two Δ-ML methods have comparable efficiencies in constructing accurate PESs.
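The Δ-ML idea summarized in this abstract (a correction surface added to a cheap low-level surface, learned from a few high-level points) can be sketched on a one-dimensional toy problem. Everything below is an illustrative assumption rather than the authors' setup: a Morse curve stands in for the high-level PES, a harmonic curve for the low-level PES, and only the posterior mean of a zero-mean GP with a squared-exponential kernel is used, with a hand-picked length scale and a small hand-written linear solver.

```python
import math

# Hypothetical toy parameters; not taken from the cited paper.
D, A, R0 = 1.0, 1.5, 1.0
K_HARM = 2.0 * D * A * A  # harmonic force constant matching the Morse curvature at R0


def v_high(r):
    """Stand-in for the expensive high-level PES (Morse curve)."""
    return D * (1.0 - math.exp(-A * (r - R0))) ** 2


def v_low(r):
    """Stand-in for the cheap low-level PES (harmonic approximation)."""
    return 0.5 * K_HARM * (r - R0) ** 2


def rbf(x, y, length=0.3):
    """Squared-exponential kernel with an assumed length scale."""
    return math.exp(-0.5 * ((x - y) / length) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_delta(train_r, jitter=1e-10):
    """Fit the GP posterior mean of the correction v_high - v_low."""
    kmat = [[rbf(a, b) + (jitter if i == j else 0.0)
             for j, b in enumerate(train_r)]
            for i, a in enumerate(train_r)]
    delta = [v_high(r) - v_low(r) for r in train_r]
    alpha = solve(kmat, delta)
    return lambda r: sum(w * rbf(r, t) for w, t in zip(alpha, train_r))


# Eight "high-level" training geometries between r = 0.8 and r = 2.0.
train_r = [0.8 + 1.2 * i / 7 for i in range(8)]
delta_ml = fit_delta(train_r)


def v_corrected(r):
    """Delta-ML surface: low-level PES plus learned correction."""
    return v_low(r) + delta_ml(r)


# Compare errors at midpoints that were not used for training.
test_r = [0.5 * (a + b) for a, b in zip(train_r, train_r[1:])]
err_low = max(abs(v_low(r) - v_high(r)) for r in test_r)
err_corr = max(abs(v_corrected(r) - v_high(r)) for r in test_r)
```

In this toy setting the corrected curve tracks the Morse reference far more closely than the harmonic curve alone. A real application would also use the GP variance, which the abstract highlights as a guide for adding new geometries; that part is omitted here.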
Affiliation(s)
- Yang Liu
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Hua Guo
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
14. Hu F, He F, Yaron DJ. Treating Semiempirical Hamiltonians as Flexible Machine Learning Models Yields Accurate and Interpretable Results. J Chem Theory Comput 2023; 19:6185-6196. [PMID: 37705220 PMCID: PMC10536991 DOI: 10.1021/acs.jctc.3c00491] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/11/2023] [Indexed: 09/15/2023]
Abstract
Quantum chemistry provides chemists with invaluable information, but the high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components such as the artificial neural networks of deep learning that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB) with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained to ab initio data in a manner that is analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve a low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of ab initio data needed to train the model compared to that required for deep learning models.
Affiliation(s)
- Frank Hu
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Francis He
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- David J. Yaron
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
15. Yu Q, Qu C, Houston PL, Nandi A, Pandey P, Conte R, Bowman JM. A Status Report on "Gold Standard" Machine-Learned Potentials for Water. J Phys Chem Lett 2023; 14:8077-8087. [PMID: 37656898 PMCID: PMC10510435 DOI: 10.1021/acs.jpclett.3c01791] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/30/2023] [Accepted: 08/28/2023] [Indexed: 09/03/2023]
Abstract
Owing to the central importance of water to life as well as its unusual properties, potentials for water have been the subject of extensive research over the past 50 years. Recently, five potentials based on different machine learning approaches have been reported that are at or near the "gold standard" CCSD(T) level of theory. The development of such high-level potentials enables efficient and accurate simulations of water systems using classical and quantum dynamical approaches. This Perspective serves as a status report of these potentials, focusing on their methodology and applications to water systems across different phases. Their performances on the energies of gas phase water clusters, as well as condensed phase structural and dynamical properties, are discussed.
Affiliation(s)
- Qi Yu
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Chen Qu
  - Independent Researcher, Toronto, Ontario M9B 0E3, Canada
- Paul L. Houston
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
  - Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Apurba Nandi
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
  - Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Priyanka Pandey
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Riccardo Conte
  - Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Joel M. Bowman
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
16. Hashem Y, Foust K, Kaledin M, Kaledin AL. Fitting Potential Energy Surfaces by Learning the Charge Density Matrix with Permutationally Invariant Polynomials. J Chem Theory Comput 2023; 19:5690-5700. [PMID: 37561135 PMCID: PMC10501011 DOI: 10.1021/acs.jctc.3c00586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Indexed: 08/11/2023]
Abstract
The electronic energy in the Hartree-Fock (HF) theory is the trace of the product of the charge density matrix (CDM) with the one-electron and two-electron matrices represented in an atomic orbital basis, where the two-electron matrix is also a function of the same CDM. In this work, we examine a formalism of analytic representation of a generic molecular potential energy surface (PES) as a sum of a linearly parameterized HF and a correction term, the latter formally representing the electron correlation energy, also linearly parameterized, by expressing the elements of CDM using permutationally invariant polynomials (PIPs). We show on a variety of numerical examples, ranging from exemplary two-electron systems HeH+ and H3+ to the more challenging cases of methanium (CH5+) fragmentation and high-energy tautomerization of formamide to formimidic acid that such a formulation requires significantly fewer, 10-20% of PIPs, to accomplish the same accuracy of the fit as the conventional representation at practically the same computational cost.
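For orientation, the verbal statement in the first sentence of this abstract corresponds to the standard closed-shell Hartree-Fock expression for the electronic energy (nuclear repulsion excluded). The symbols are notation assumed here, not taken from the paper: P is the charge density matrix, h the one-electron matrix, G(P) the two-electron matrix, and F(P) the Fock matrix, all in the atomic orbital basis.

```latex
E_{\mathrm{HF}}
  = \operatorname{Tr}\!\left[\mathbf{P}\,\mathbf{h}\right]
  + \tfrac{1}{2}\operatorname{Tr}\!\left[\mathbf{P}\,\mathbf{G}(\mathbf{P})\right]
  = \tfrac{1}{2}\operatorname{Tr}\!\left[\mathbf{P}\left(\mathbf{h} + \mathbf{F}(\mathbf{P})\right)\right],
\qquad
\mathbf{F}(\mathbf{P}) = \mathbf{h} + \mathbf{G}(\mathbf{P})
```

Because G depends on P, the energy is not linear in the density matrix. This is why the abstract stresses that the two-electron matrix is a function of the same CDM that the PIPs are used to represent.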
Affiliation(s)
- Younos Hashem
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box # 1203, Kennesaw 30144, Georgia
- Katheryn Foust
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box # 1203, Kennesaw 30144, Georgia
- Martina Kaledin
  - Department of Chemistry & Biochemistry, Kennesaw State University, 370 Paulding Ave NW, Box # 1203, Kennesaw 30144, Georgia
- Alexey L. Kaledin
  - Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory University, 1515 Dickey Drive, Atlanta 30322, Georgia
17. Izvekov S, Rice BM. Hierarchical Machine Learning of Low-Resolution Coarse-Grained Free Energy Potentials. J Chem Theory Comput 2023. [PMID: 37256918 DOI: 10.1021/acs.jctc.3c00128] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/02/2023]
Abstract
A force-matching-based method for supervised machine learning (ML) of coarse-grained (CG) free energy (FE) potentials, known as multiscale coarse-graining via force-matching (MSCG/FM), is an efficient method to develop microscopically informed CG models that are thermodynamically and statistically equivalent to the reference microscopic models. For low-resolution models, when the coarse-graining is at supramolecular scales, objective-oriented clustering of nonbonded particles is required and the reduced description becomes a function of the clustering algorithm. In the present work, we explore the dependence of the ML of the CG Helmholtz FE potential on the clustering algorithm. We consider coarse-graining based on partitional (k-means, leading to Voronoi diagram) and hierarchical agglomerative (bottom-up) clustering algorithms common in unsupervised ML and develop theory connecting the MSCG/FM learned CG Helmholtz potential and the clustering statistics. By combining the agglomerative clustering and the MSCG/FM learning in a recursive manner, we propose an efficient ML methodology to develop the fine-to-low resolution hierarchies of the CG models. The methodology does not suffer from degrading accuracy or increased computational cost to construct larger hierarchies and as such does not impose an upper size limitation of the CG particles resulting from the extended hierarchies. The utility of the methodology is demonstrated by obtaining the bottom-up agglomerative hierarchy for liquid nitromethane from all-atom molecular dynamics (MD) simulations. For agglomerative hierarchies, we prove the existence of renormalization group transformations that indicate self-similarity and allow for learning the low-resolution MSCG/FM potentials at low computational cost by rescaling and renormalizing the certain finer-resolution members of the hierarchy. The hierarchies of the CG models can be used to carry out simulations under constant-pressure conditions.
Affiliation(s)
- Sergei Izvekov
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
- Betsy M Rice
  - U.S. Army DEVCOM Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
18. Murakami T, Ibuki S, Hashimoto Y, Kikuma Y, Takayanagi T. Dynamics study of the post-transition-state-bifurcation process of the (HCOOH)H+ → CO + H3O+/HCO+ + H2O dissociation: application of machine-learning techniques. Phys Chem Chem Phys 2023; 25:14016-14027. [PMID: 37161528 DOI: 10.1039/d3cp00252g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
The process of protonated formic acid dissociating from the transition state was studied using ring-polymer molecular dynamics (RPMD), classical MD, and quasi-classical trajectory (QCT) simulations. Temperature had a strong influence on the branching fractions for the HCO+ + H2O and CO + H3O+ dissociation channels. The RPMD and classical MD simulations showed similar behavior, but the QCT dynamics were significantly different owing to the excess energies in the quasi-classical trajectories. Machine-learning analysis identified several key features in the phase information of the vibrational motions at the transition state. We found that the initial configuration and momentum of a hydrogen atom connected to a carbon atom and the shrinking coordinate of the CO bond at the transition state play a role in the dynamics of HCO+ + H2O production.
Affiliation(s)
- Tatsuhiro Murakami
  - Department of Chemistry, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama City, Saitama, 338-8570, Japan
  - Department of Materials & Life Sciences, Faculty of Science & Technology, Sophia University, 7-1 Kioicho, Chiyoda-ku, Tokyo, 102-8554, Japan
- Shunichi Ibuki
  - Department of Chemistry, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama City, Saitama, 338-8570, Japan
- Yu Hashimoto
  - Department of Chemistry, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama City, Saitama, 338-8570, Japan
- Yuya Kikuma
  - Department of Chemistry, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama City, Saitama, 338-8570, Japan
- Toshiyuki Takayanagi
  - Department of Chemistry, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama City, Saitama, 338-8570, Japan
19. Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
- Marie Bricage
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Ylène Aboulfath
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Dominique Barth
  - Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
- Marie-Pierre Gaigeot
  - Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
20. Zaverkin V, Holzmüller D, Bonfirraro L, Kästner J. Transfer learning for chemically accurate interatomic neural network potentials. Phys Chem Chem Phys 2023; 25:5383-5396. [PMID: 36748821 DOI: 10.1039/d2cp05793j] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 02/01/2023]
Abstract
Developing machine learning-based interatomic potentials from ab initio electronic structure methods remains a challenging task for computational chemistry and materials science. This work studies the capability of transfer learning, in particular discriminative fine-tuning, for efficiently generating chemically accurate interatomic neural network potentials on organic molecules from the MD17 and ANI data sets. We show that pre-training the network parameters on data obtained from density functional calculations considerably improves the sample efficiency of models trained on more accurate ab initio data. Additionally, we show that fine-tuning with energy labels alone can suffice to obtain accurate atomic forces and run large-scale atomistic simulations, provided a well-designed fine-tuning data set. We also investigate possible limitations of transfer learning, especially regarding the design and size of the pre-training and fine-tuning data sets. Finally, we provide GM-NN potentials pre-trained and fine-tuned on the ANI-1x and ANI-1ccx data sets, which can easily be fine-tuned on and applied to organic molecules.
Affiliation(s)
- Viktor Zaverkin
  - Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany
- David Holzmüller
  - Faculty of Mathematics and Physics, Institute for Stochastics and Applications, University of Stuttgart, Germany
- Luca Bonfirraro
  - Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany
- Johannes Kästner
  - Faculty of Chemistry, Institute for Theoretical Chemistry, University of Stuttgart, Germany
21. Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K. Neural network potentials for chemistry: concepts, applications and prospects. Digital Discovery 2023; 2:28-58. [PMID: 36798879 PMCID: PMC9923808 DOI: 10.1039/d2dd00102k] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 09/23/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
Artificial Neural Networks (NN) are already heavily involved in methods and applications for frequent tasks in the field of computational chemistry such as representation of potential energy surfaces (PES) and spectroscopic predictions. This perspective provides an overview of the foundations of neural network-based full-dimensional potential energy surfaces, their architectures, underlying concepts, their representation and applications to chemical systems. Methods for data generation and training procedures for PES construction are discussed and means for error assessment and refinement through transfer learning are presented. A selection of recent results illustrates the latest improvements regarding accuracy of PES representations and system size limitations in dynamics simulations, but also NN application enabling direct prediction of physical results without dynamics simulations. The aim is to provide an overview for the current state-of-the-art NN approaches in computational chemistry and also to point out the current challenges in enhancing reliability and applicability of NN methods on a larger scale.
Affiliation(s)
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
22. Bougueroua S, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic graph theory for post-processing molecular dynamics trajectories. Mol Phys 2023. [DOI: 10.1080/00268976.2022.2162456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Affiliation(s)
- Sana Bougueroua
  - Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
- Ylène Aboulfath
  - Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
- Dominique Barth
  - Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
- Marie-Pierre Gaigeot
  - Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
23. Teng C, Huang D, Bao JL. A spur to molecular geometry optimization: Gradient-enhanced universal kriging with on-the-fly adaptive ab initio prior mean functions in curvilinear coordinates. J Chem Phys 2023; 158:024112. [PMID: 36641392 DOI: 10.1063/5.0133675] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 12/27/2022] Open
Abstract
We present a molecular geometry optimization algorithm based on the gradient-enhanced universal kriging (GEUK) formalism with ab initio prior mean functions, which incorporates prior physical knowledge to surrogate-based optimization. In this formalism, we have demonstrated the advantage of allowing the prior mean functions to be adaptive during geometry optimization over a pre-fixed choice of prior functions. Our implementation is general and flexible in two senses. First, the optimizations on the surrogate surface can be in both Cartesian coordinates and curvilinear coordinates. We explore four representative curvilinear coordinates in this work, including the redundant Coulombic coordinates, the redundant internal coordinates, the non-redundant delocalized internal coordinates, and the non-redundant hybrid delocalized internal Z-matrix coordinates. We show that our GEUK optimizer accelerates geometry optimization as compared to conventional non-surrogate-based optimizers in internal coordinates. We further showcase the power of the GEUK with on-the-fly adaptive priors for efficient optimizations of challenging molecules (Criegee intermediates) with a high-accuracy electronic structure method (the coupled-cluster method). Second, we present the usage of internal coordinates under the complete curvilinear scheme. A complete curvilinear scheme performs both surrogate potential-energy surface (PES) fitting and structure optimization entirely in the curvilinear coordinates. Our benchmark indicates that the complete curvilinear scheme significantly reduces the cost of structure minimization on the surrogate compared to the incomplete curvilinear scheme, which fits the surrogate PES in curvilinear coordinates partially and optimizes a structure in Cartesian coordinates through curvilinear coordinates via the chain rule.
Affiliation(s)
- Chong Teng
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, USA
- Daniel Huang
  - Department of Computer Science, San Francisco State University, San Francisco, California 94132, USA
- Junwei Lucas Bao
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, USA
24. Bowman JM, Qu C, Conte R, Nandi A, Houston PL, Yu Q. Δ-Machine Learned Potential Energy Surfaces and Force Fields. J Chem Theory Comput 2023; 19:1-17. [PMID: 36527383 DOI: 10.1021/acs.jctc.2c01034] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 12/23/2022]
Abstract
There has been great progress in developing machine-learned potential energy surfaces (PESs) for molecules and clusters with more than 10 atoms. Unfortunately, this number of atoms generally limits the level of electronic structure theory to less than the "gold standard" CCSD(T) level. Indeed, for the well-known MD17 dataset for molecules with 9-20 atoms, all of the energies and forces were obtained with DFT calculations (PBE). This Perspective is focused on a Δ-machine learning method that we recently proposed and applied to bring DFT-based PESs to close to CCSD(T) accuracy. This is demonstrated for hydronium, N-methylacetamide, acetyl acetone, and ethanol. For 15-atom tropolone, it appears that special approaches (e.g., molecular tailoring, local CCSD(T)) are needed to obtain the CCSD(T) energies. A new aspect of this approach is the extension of Δ-machine learning to force fields. The approach is based on many-body corrections to polarizable force field potentials. This is examined in detail using the TTM2.1 water potential. The corrections make use of our recent CCSD(T) datasets for 2-b, 3-b, and 4-b interactions for water. These datasets were used to develop a new fully ab initio potential for water, termed q-AQUA.
Affiliation(s)
- Joel M Bowman
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Chen Qu
  - Independent Researcher, Toronto, Canada 66777
- Riccardo Conte
  - Dipartimento di Chimica, Università Degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Apurba Nandi
  - Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Paul L Houston
  - Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
  - Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Qi Yu
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
25. Giese TJ, Zeng J, York DM. Multireference Generalization of the Weighted Thermodynamic Perturbation Method. J Phys Chem A 2022; 126:8519-8533. [PMID: 36301936 PMCID: PMC9771595 DOI: 10.1021/acs.jpca.2c06201] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
We describe the generalized weighted thermodynamic perturbation (gwTP) method for estimating the free energy surface of an expensive "high-level" potential energy function from the umbrella sampling performed with multiple inexpensive "low-level" reference potentials. The gwTP method is a generalization of the weighted thermodynamic perturbation (wTP) method developed by Li and co-workers [J. Chem. Theory Comput. 2018, 14, 5583-5596] that uses a single "low-level" reference potential. The gwTP method offers new possibilities in model design whereby the sampling generated from several low-level potentials may be combined (e.g., specific reaction parameter models that might have variable accuracy at different stages of a multistep reaction). The gwTP method is especially well suited for use with machine learning potentials (MLPs) that are trained against computationally expensive ab initio quantum mechanical/molecular mechanical (QM/MM) energies and forces using active learning procedures that naturally produce multiple distinct neural network potentials. Simulations can be performed with greater sampling using the fast MLPs and then corrected to the ab initio level using gwTP. The capabilities of the gwTP method are demonstrated by creating reference potentials based on the MNDO/d and DFTB2/MIO semiempirical models supplemented with the "range-corrected deep potential" (DPRc). The DPRc parameters are trained to ab initio QM/MM data, and the potentials are used to calculate the free energy surface of stepwise mechanisms for nonenzymatic RNA 2'-O-transesterification model reactions. The extended sampling made possible by the reference potentials allows one to identify unequilibrated portions of the simulations that are not always evident from the short time scale commonly used with ab initio QM/MM potentials. 
We show that the reference potential approach can yield more accurate ab initio free energy predictions than the wTP method or what can be reasonably afforded from explicit ab initio QM/MM sampling.
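As background for the reference-potential idea, the sketch below shows plain single-reference thermodynamic perturbation (the Zwanzig relation), in which samples drawn on a cheap low-level potential are reweighted to estimate the free energy change on going to the high-level potential. It is not the wTP or gwTP method itself. The harmonic potentials, the force constants, and the function name `tp_free_energy` are assumptions chosen so that the exact answer is known.

```python
import numpy as np

def tp_free_energy(u_high, u_low, kT):
    """Zwanzig estimate of A_high - A_low from samples drawn on the low level."""
    du = (u_high - u_low) / kT
    # Shift by the minimum before exponentiating for numerical stability.
    shift = du.min()
    return kT * (shift - np.log(np.mean(np.exp(-(du - shift)))))

# Toy system: two harmonic wells with different force constants (assumed values).
rng = np.random.default_rng(0)
kT, k_low, k_high = 1.0, 1.0, 1.5

# Exact Boltzmann sampling of the low-level harmonic potential.
x = rng.normal(0.0, np.sqrt(kT / k_low), size=200_000)

estimate = tp_free_energy(0.5 * k_high * x**2, 0.5 * k_low * x**2, kT)
exact = 0.5 * kT * np.log(k_high / k_low)  # analytic result for 1-D harmonic wells
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The estimator works well here because the two potentials overlap strongly; when they do not, a few samples dominate the exponential average, which is part of the motivation for weighted and multi-reference variants.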
Affiliation(s)
- Timothy J. Giese
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Jinzhe Zeng
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Darrin M. York
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
|
26
|
Kuntz D, Wilson AK. Machine learning, artificial intelligence, and chemistry: how smart algorithms are reshaping simulation and the laboratory. Pure Appl Chem 2022. [DOI: 10.1515/pac-2022-0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Machine learning and artificial intelligence are increasingly gaining in prominence through image analysis, language processing, and automation, to name a few applications. Machine learning is also making profound changes in chemistry. From revisiting decades-old analytical techniques for the purpose of creating better calibration curves, to assisting and accelerating traditional in silico simulations, to automating entire scientific workflows, to being used as an approach to deduce underlying physics of unexplained chemical phenomena, machine learning and artificial intelligence are reshaping chemistry, accelerating scientific discovery, and yielding new insights. This review provides an overview of machine learning and artificial intelligence from a chemist’s perspective and focuses on a number of examples of the use of these approaches in computational chemistry and in the laboratory.
Affiliation(s)
- David Kuntz
- Department of Chemistry, University of North Texas, Denton, TX 76201, USA
- Angela K. Wilson
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
|
27
|
Young TA, Johnston-Wood T, Zhang H, Duarte F. Reaction dynamics of Diels-Alder reactions from machine learned potentials. Phys Chem Chem Phys 2022; 24:20820-20827. [PMID: 36004770 DOI: 10.1039/d2cp02978b] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Recent advances in the development of reactive machine-learned potentials (MLPs) promise to transform reaction modelling. However, such methods have remained computationally expensive and limited to experts. Here, we employ different MLP methods (ACE, NequIP, GAP), combined with automated fitting and active learning, to study the reaction dynamics of representative Diels-Alder reactions. We demonstrate that the ACE and NequIP MLPs can consistently achieve chemical accuracy (±1 kcal mol⁻¹) to the ground-truth surface with only a few hundred reference calculations. These strategies are shown to enable routine ab initio-quality classical and quantum dynamics, and obtain dynamical quantities such as product ratios and free energies from non-static methods. For ambimodal reactions, product distributions were found to be strongly dependent on the QM method and less so on the type of dynamics propagated.
Affiliation(s)
- Tom A Young
- Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Hanwen Zhang
- Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
- Fernanda Duarte
- Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK
|
28
|
Teng C, Wang Y, Huang D, Martin K, Tristan JB, Bao JL. Dual-Level Training of Gaussian Processes with Physically Inspired Priors for Geometry Optimizations. J Chem Theory Comput 2022; 18:5739-5754. [PMID: 35939760 DOI: 10.1021/acs.jctc.2c00546] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Gaussian process (GP) regression has been recently developed as an effective method in molecular geometry optimization. The prior mean function is one of the crucial parts of the GP. We design and validate two types of physically inspired prior mean functions: force-field-based priors and posterior-type priors. In this work, we implement a dual-level training (DLT) optimizer for the posterior-type priors. The DLT optimizers can be considered as a class of optimization algorithms that belong to the delta-machine learning paradigm but with several major differences compared to the previously proposed algorithms in the same paradigm. In the first level of the DLT, we incorporate the classical mechanical descriptions of the equilibrium geometries into the prior function, which enhances the performance of the GP optimizer as compared to the one using a constant (or zero) prior. In the second level, we utilize the surrogate potential energy surfaces (PESs), which incorporate the physics learned in the first-level training, as the prior function to refine the model performance further. We find that the force-field-based priors and posterior-type priors reduce the overall optimization steps by a factor of 2-3 when compared to the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer as well as the constant-prior GP optimizer proposed in previous works. We also demonstrate the potential of recovering the real PESs with GP with a force-field prior. This work shows the importance of including domain knowledge as an ingredient in the GP, which offers a potentially robust learning model for molecular geometry optimization and for exploring molecular PESs.
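The role of the prior mean function can be made concrete with a short sketch of GP regression in which the model is trained on residuals from a physically motivated prior and the prior is added back at prediction time. This is a generic textbook construction, not the authors' DLT optimizer. The squared-exponential kernel, its length scale, the toy Morse "true" surface, and the harmonic "force-field" prior are all assumptions.

```python
import numpy as np

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two 1-D coordinate arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_new, prior_mean, noise=1e-8):
    """Posterior mean of a GP with prior mean function prior_mean(x)."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    # The GP only has to learn what the prior mean gets wrong.
    resid = y_train - prior_mean(x_train)
    alpha = np.linalg.solve(k, resid)
    return prior_mean(x_new) + rbf(x_new, x_train) @ alpha

def true_pes(r):
    """Hypothetical reference surface (toy Morse curve)."""
    return (1.0 - np.exp(-(r - 1.0))) ** 2

def harmonic(r):
    """Hypothetical force-field prior: harmonic well with matching curvature."""
    return (r - 1.0) ** 2

def zero(r):
    """Constant (zero) prior for comparison."""
    return np.zeros_like(r)

x_tr = np.array([0.8, 1.2, 1.6, 2.0])
y_tr = true_pes(x_tr)
x_far = np.array([10.0])  # far from all training data

print("far-field, zero prior:    ", gp_predict(x_tr, y_tr, x_far, zero))
print("far-field, harmonic prior:", gp_predict(x_tr, y_tr, x_far, harmonic))
```

Both models reproduce the training energies, but away from the data the prediction falls back to the prior mean, so the choice of prior controls the behavior of the surrogate in unexplored regions.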
Affiliation(s)
- Chong Teng
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Yang Wang
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Daniel Huang
- Department of Computer Science, San Francisco State University, San Francisco, California 94132, United States
- Katherine Martin
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Jean-Baptiste Tristan
- Department of Computer Science, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Junwei Lucas Bao
- Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
|
29
|
Song Z, Trozzi F, Tian H, Yin C, Tao P. Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways. ACS Physical Chemistry Au 2022; 2:316-330. [PMID: 35936506 PMCID: PMC9344433 DOI: 10.1021/acsphyschemau.2c00005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
With the increasing popularity of machine learning (ML) applications, the demand for explainable artificial intelligence techniques to explain ML models developed for computational chemistry has also emerged. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for effective explanation of mechanistic insights into ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase and two antibiotics (ampicillin and cefalexin) as the model systems, we show that the BCIG approach could quantitatively attribute the energetic contribution in one system and the relative reactivity of individual steps across different systems to specific chemical processes such as the bond making/breaking and proton transfers. The proposed BCIG contribution attribution method quantifies chemistry-interpretable insights in terms of contributions from each elementary chemical process, which is in agreement with the validating QM/MM calculations and our intuitive mechanistic understandings of the model reactions.
|
30
|
Habershon S. Program Synthesis of Sparse Algorithms for Wave Function and Energy Prediction in Grid-Based Quantum Simulations. J Chem Theory Comput 2022; 18:2462-2478. [PMID: 35293216 PMCID: PMC9009083 DOI: 10.1021/acs.jctc.2c00035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We have recently shown how program synthesis (PS), or the concept of "self-writing code", can generate novel algorithms that solve the vibrational Schrödinger equation, providing approximations to the allowed wave functions for bound, one-dimensional (1-D) potential energy surfaces (PESs). The resulting algorithms use a grid-based representation of the underlying wave function ψ(x) and PES V(x), providing codes which represent approximations to standard discrete variable representation (DVR) methods. In this Article, we show how this inductive PS strategy can be improved and modified to enable prediction of both vibrational wave functions and energy eigenvalues of representative model PESs (both 1-D and multidimensional). We show that PS can generate algorithms that offer some improvements in energy eigenvalue accuracy over standard DVR schemes; however, we also demonstrate that PS can identify accurate numerical methods that exhibit desirable computational features, such as employing very sparse (tridiagonal) matrices. The resulting PS-generated algorithms are initially developed and tested for 1-D vibrational eigenproblems, before solution of multidimensional problems is demonstrated; we find that our new PS-generated algorithms can reduce calculation times for grid-based eigenvector computation by an order of magnitude or more. More generally, with further development and optimization, we anticipate that PS-generated algorithms based on effective Hamiltonian approximations, such as those proposed here, could be useful in direct simulations of quantum dynamics via wave function propagation and evaluation of molecular electronic structure.
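To make the phrase "very sparse (tridiagonal) matrices" concrete, the sketch below solves a 1-D vibrational eigenproblem on a grid with a hand-written second-order finite-difference Hamiltonian, which is tridiagonal. It is a standard textbook scheme offered only as an illustration; it is neither a DVR nor one of the PS-generated algorithms. The function name `grid_eigenvalues`, the harmonic test potential, the grid limits, and the units (ħ = m = 1) are assumptions.

```python
import numpy as np

def grid_eigenvalues(potential, x_min, x_max, n_points, n_states=3):
    """Lowest eigenvalues of -1/2 d2/dx2 + V(x) on a uniform grid (hbar = m = 1)."""
    x = np.linspace(x_min, x_max, n_points)
    h = x[1] - x[0]
    # Central-difference kinetic energy gives a tridiagonal matrix:
    # 1/h^2 on the diagonal and -1/(2 h^2) on the first off-diagonals.
    diag = 1.0 / h**2 + potential(x)
    off = -0.5 / h**2 * np.ones(n_points - 1)
    ham = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(ham)[:n_states]

# Harmonic oscillator, whose exact levels are n + 1/2 in these units.
levels = grid_eigenvalues(lambda x: 0.5 * x**2, -8.0, 8.0, 801)
print(levels)
```

The matrix is stored densely here for brevity; a banded or sparse eigensolver would exploit the tridiagonal structure that the abstract highlights as computationally desirable.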
Affiliation(s)
- Scott Habershon
- Department of Chemistry, University of Warwick, Coventry, CV4 7AL, United Kingdom
|
31
|
Zhang L, Zhang S, Owens A, Yurchenko SN, Dral PO. VIB5 database with accurate ab initio quantum chemical molecular potential energy surfaces. Sci Data 2022; 9:84. [PMID: 35277513 PMCID: PMC8917215 DOI: 10.1038/s41597-022-01185-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 01/19/2022] [Indexed: 11/09/2022]
Abstract
High-level ab initio quantum chemical (QC) molecular potential energy surfaces (PESs) are crucial for accurately simulating molecular rotation-vibration spectra. Machine learning (ML) can help alleviate the cost of constructing such PESs, but requires access to the original ab initio PES data, namely potential energies computed on high-density grids of nuclear geometries. In this work, we present a new structured PES database called VIB5, which contains high-quality ab initio data on 5 small polyatomic molecules of astrophysical significance (CH3Cl, CH4, SiH4, CH3F, and NaOH). The VIB5 database is based on previously used PESs, which, however, are either publicly unavailable or lacking key information to make them suitable for ML applications. The VIB5 database provides tens of thousands of grid points for each molecule with theoretical best estimates of potential energies along with their constituent energy correction terms and a data-extraction script. In addition, new complementary QC calculations of energies and energy gradients have been performed to provide a consistent database, which, e.g., can be used for gradient-based ML methods. Measurement(s): potential energy surfaces. Technology type(s): quantum chemistry computational methods.
Affiliation(s)
- Lina Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Shuang Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Alec Owens
- Department of Physics and Astronomy, University College London, Gower Street, WC1E 6BT, London, United Kingdom
- Sergei N Yurchenko
- Department of Physics and Astronomy, University College London, Gower Street, WC1E 6BT, London, United Kingdom
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
|
32
|
Fan J, Lan H, Ning W, Zhong R, Chen F, Yan G, Cai K. Modeling amide-I vibrations of alanine dipeptide in solution by using neural network protocol. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2022; 268:120675. [PMID: 34890871 DOI: 10.1016/j.saa.2021.120675] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/05/2021] [Revised: 10/27/2021] [Accepted: 11/26/2021] [Indexed: 06/13/2023]
Abstract
Infrared spectroscopy is a powerful tool for understanding the molecular structure and function of polypeptides. Theoretical interpretation of IR spectra that relies on ab initio calculations can be very costly in computational resources. Herein, we developed a neural network (NN) modeling protocol to evaluate a model dipeptide's backbone amide-I spectra. DFT calculations were performed for the amide-I vibrational motions and structural parameters of alanine dipeptide (ALAD) conformers in different micro-environments ranging from polar to non-polar ones. The obtained backbone dihedrals, C=O bond lengths, and amide-I frequencies of ALAD were gathered together to build the NN architecture. The built NN protocols predict the amide-I frequencies of ALAD in other solvation conditions quite satisfactorily, at much lower computational cost than electronic structure calculations. The results show that this cost-effective approach enables us to decipher the polypeptide's dynamic secondary structures and biological functions with their backbone vibrational probes.
Affiliation(s)
- Jianping Fan
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China; Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
- Huaying Lan
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
- Wenfeng Ning
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
- Rongzhen Zhong
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China
- Feng Chen
- Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
- Guiyang Yan
- Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
- Kaicong Cai
- College of Chemistry and Materials Science, Fujian Provincial Key Laboratory of Advanced Materials Oriented Chemical Engineering, Fujian Normal University, Fuzhou 350007, PR China; Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen 361005, PR China; Fujian Provincial Key Laboratory of Featured Biochemical and Chemical Materials, Ningde Normal University, Ningde 352100, PR China
|
33
|
Toward accurate and efficient dynamic computational strategy for heterogeneous catalysis: Temperature-dependent thermodynamics and kinetics for the chemisorbed on-surface CO. Chinese Chem Lett 2022. [DOI: 10.1016/j.cclet.2022.03.080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
34
|
Baiardi A, Grimmel SA, Steiner M, Türtscher PL, Unsleber JP, Weymuth T, Reiher M. Expansive Quantum Mechanical Exploration of Chemical Reaction Paths. Acc Chem Res 2022; 55:35-43. [PMID: 34918903 DOI: 10.1021/acs.accounts.1c00472] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
Quantum mechanical methods have been well-established for the elucidation of reaction paths of chemical processes and for the explicit dynamics of molecular systems. While they are usually deployed in routine manual calculations on reactions for which some insights are already available (typically from experiment), new algorithms and continuously increasing capabilities of modern computer hardware allow for exploratory open-ended computational campaigns that are unbiased and therefore enable unexpected discoveries. Highly efficient and even automated procedures facilitate systematic approaches toward the exploration of uncharted territory in molecular transformations and dynamics. In this work, we elaborate on such explorative approaches that range from reaction network explorations with (stationary) quantum chemical methods to explorative molecular dynamics and migrant wave packet dynamics. The focus is on recent developments that cover the following strategies. (i) Pruning search options for elementary reaction steps by heuristic rules based on the first-principles of quantum mechanics: Rules are required for reducing the combinatorial explosion of potentially reactive atom pairings, and rooting them in concepts derived from the electronic wave function makes them applicable to any molecular system. (ii) Enforcing reactive events by external biases: Inducing a reaction requires constraints that steer and direct elementary-step searches, which can be formulated in terms of forces, velocities, or supplementary potentials. (iii) Manual steering facilitated by interactive quantum mechanics: As ultrafast quantum chemical methods allow for real-time manual interactions with molecular systems, human-intuition-guided paths can be easily explored with suitable human-machine interfaces. 
(iv) New approaches for transition-state optimization with continuous curve representations can provide stable schemes to be driven in an automated way by allowing for an efficient tuning of the curve's parameters (instead of a manipulation of a collection of structures along the path), and (v) reactive molecular dynamics and direct wave packet propagation exploit the equations of motion of an underlying mechanical theory (usually, classical Newtonian mechanics or Schrödinger quantum mechanics). Explorative approaches are likely to replace the current state of the art in computational chemistry, because they reduce the human effort to be invested in reaction path elucidations, they are less prone to errors and bias-free, and they cover more extensive regions of the relevant configuration space. As a result, computational investigations that rely on these techniques are more likely to deliver surprising discoveries.
Affiliation(s)
- Alberto Baiardi
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Stephanie A. Grimmel
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Miguel Steiner
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Paul L. Türtscher
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jan P. Unsleber
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Thomas Weymuth
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Markus Reiher
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
|
35
|
Abstract
In the past two decades, machine learning potentials (MLPs) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics, and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences among MLPs, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like nonlocal charge transfer, and the type of descriptor used to represent the atomic structure, which can be either predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Emir Kocer
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
- Tsz Wai Ko
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
- Jörg Behler
- Institut für Physikalische Chemie, Theoretische Chemie, Universität Göttingen, Göttingen, Germany
|
36
|
Gaigeot MP. Some opinions on MD-based vibrational spectroscopy of gas phase molecules and their assembly: An overview of what has been achieved and where to go. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2021; 260:119864. [PMID: 34052762 DOI: 10.1016/j.saa.2021.119864] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/14/2021] [Revised: 04/13/2021] [Accepted: 04/18/2021] [Indexed: 06/12/2023]
Abstract
We review molecular dynamics simulations for anharmonic gas phase spectroscopy and provide some of our opinions on where the field is heading. With these new directions, the theoretical IR/Raman spectroscopy of large (bio)-molecular systems will be more easily achievable over longer time-scale MD trajectories, increasing the accuracy of the MD-IR and MD-Raman calculated spectra. With the new directions presented here, the high-throughput 'decoding' of experimental IR/Raman spectra into 3D structures should thus be possible, hence advancing, e.g., the field of MS-IR for structural characterization by spectroscopy. We also review the assignment of vibrational spectra in terms of anharmonic molecular modes from the MD trajectories, and in particular introduce our recent developments based on graph theory algorithms. Graph theory algorithms are also introduced in this review for the identification of the molecular 3D structures sampled over MD trajectories.
Affiliation(s)
- Marie-Pierre Gaigeot
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
|
37
|
Abstract
We demonstrate that a program synthesis approach based on a linear code representation can be used to generate algorithms that approximate the ground-state solutions of one-dimensional time-independent Schrödinger equations constructed with bound polynomial potential energy surfaces (PESs). Here, an algorithm is constructed as a linear series of instructions operating on a set of input vectors, matrices, and constants that define the problem characteristics, such as the PES. Discrete optimization is performed using simulated annealing in order to identify sequences of code-lines, operating on the program inputs that can reproduce the expected ground-state wavefunctions ψ(x) for a set of target PESs. The outcome of this optimization is not simply a mathematical function approximating ψ(x) but is, instead, a complete algorithm that converts the input vectors describing the system into a ground-state solution of the Schrödinger equation. These initial results point the way toward an alternative route for developing novel algorithms for quantum chemistry applications.
Affiliation(s)
- Scott Habershon
- Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
|
38
|
Saleh Y, Sanjay V, Iske A, Yachmenev A, Küpper J. Active learning of potential-energy surfaces of weakly bound complexes with regression-tree ensembles. J Chem Phys 2021; 155:144109. [PMID: 34654290 DOI: 10.1063/5.0057051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Several pool-based active learning (AL) algorithms were employed to model potential-energy surfaces (PESs) with a minimum number of electronic structure calculations. Theoretical and empirical results suggest that superior strategies can be obtained by sampling molecular structures corresponding to large uncertainties in their predictions while at the same time not deviating much from the true distribution of the data. To model PESs in an AL framework, we propose to use a regression version of stochastic query by forest, a hybrid method that samples points corresponding to large uncertainties while avoiding collecting too many points from sparse regions of space. The algorithm is implemented with decision trees that come with relatively small computational costs. We empirically show that this algorithm requires around half the data to converge to the same accuracy in comparison to the uncertainty-based query-by-committee algorithm. Moreover, the algorithm is fully automatic and does not require any prior knowledge of the PES. Simulations on a 6D PES of pyrrole(H2O) show that <15 000 configurations are enough to build a PES with a generalization error of 16 cm⁻¹, whereas the final model with around 50 000 configurations has a generalization error of 11 cm⁻¹.
Affiliation(s)
- Yahya Saleh
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Vishnu Sanjay
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Armin Iske
- Department of Mathematics, Universität Hamburg, Bundesstraße 55, 20146 Hamburg, Germany
- Andrey Yachmenev
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Jochen Küpper
- Center for Free-Electron Laser Science CFEL, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
|
39
|
Deringer VL, Bartók AP, Bernstein N, Wilkins DM, Ceriotti M, Csányi G. Gaussian Process Regression for Materials and Molecules. Chem Rev 2021; 121:10073-10141. [PMID: 34398616 PMCID: PMC8391963 DOI: 10.1021/acs.chemrev.1c00022] [Citation(s) in RCA: 249] [Impact Index Per Article: 83.0] [Received: 01/08/2021] [Indexed: 12/18/2022]
Abstract
We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
Affiliation(s)
- Volker L. Deringer
- Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, United Kingdom
| | - Albert P. Bartók
- Department of Physics and Warwick Centre for Predictive Modelling, School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
| | - Noam Bernstein
- Center for Computational Materials Science, U.S. Naval Research Laboratory, Washington D.C. 20375, United States
| | - David M. Wilkins
- Atomistic Simulation Centre, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, United Kingdom
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
| |
|
40
|
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 173] [Impact Index Per Article: 57.7] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
| | - Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
| |
|
42
|
Westermayr J, Maurer RJ. Physically inspired deep learning of molecular excitations and photoemission spectra. Chem Sci 2021; 12:10755-10764. [PMID: 34447563 PMCID: PMC8372319 DOI: 10.1039/d1sc01542g] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/17/2021] [Accepted: 06/29/2021] [Indexed: 12/29/2022] Open
Abstract
Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick Gibbet Hill Road Coventry CV4 7AL UK
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick Gibbet Hill Road Coventry CV4 7AL UK
| |
|
43
|
Young TA, Johnston-Wood T, Deringer VL, Duarte F. A transferable active-learning strategy for reactive molecular force fields. Chem Sci 2021; 12:10944-10955. [PMID: 34476072 PMCID: PMC8372546 DOI: 10.1039/d1sc01825f] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/31/2021] [Accepted: 07/04/2021] [Indexed: 11/25/2022] Open
Abstract
Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and data volume. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed for diverse chemical systems in an autonomous manner, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and inter-molecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems with relevance to computational organic chemistry: ranging from bulk solvents, a solvated metal ion and a metallocage onwards to chemical reactivity, including a bifurcating Diels-Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.
Affiliation(s)
- Tom A Young
- Chemistry Research Laboratory, University of Oxford Mansfield Road Oxford OX1 3TA UK
| | - Tristan Johnston-Wood
- Chemistry Research Laboratory, University of Oxford Mansfield Road Oxford OX1 3TA UK
| | - Volker L Deringer
- Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford Oxford OX1 3QR UK
| | - Fernanda Duarte
- Chemistry Research Laboratory, University of Oxford Mansfield Road Oxford OX1 3TA UK
| |
|
44
|
Westermayr J, Gastegger M, Schütt KT, Maurer RJ. Perspective on integrating machine learning into computational chemistry and materials science. J Chem Phys 2021; 154:230903. [PMID: 34241249 DOI: 10.1063/5.0047760] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Indexed: 01/12/2023] Open
Abstract
Machine learning (ML) methods are being used in almost every conceivable area of electronic structure theory and molecular simulation. In particular, ML has become firmly established in the construction of high-dimensional interatomic potentials. Not a day goes by without another proof of principle being published on how ML methods can represent and predict quantum mechanical properties, be they observable, such as molecular polarizabilities, or not, such as atomic charges. As ML is becoming pervasive in electronic structure theory and molecular simulation, we provide an overview of how atomistic computational modeling is being transformed by the incorporation of ML approaches. From the perspective of the practitioner in the field, we assess how common workflows to predict structure, dynamics, and spectroscopy are affected by ML. Finally, we discuss how a tighter and lasting integration of ML methods with computational chemistry and materials science can be achieved and what it will mean for research practice, software development, and postgraduate training.
Affiliation(s)
- Julia Westermayr
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Kristof T Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Reinhard J Maurer
- Department of Chemistry, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, United Kingdom
| |
|
45
|
Dral PO, Ge F, Xue BX, Hou YF, Pinheiro M, Huang J, Barbatti M. MLatom 2: An Integrative Platform for Atomistic Machine Learning. Top Curr Chem (Cham) 2021; 379:27. [PMID: 34101036 PMCID: PMC8187220 DOI: 10.1007/s41061-021-00339-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/22/2021] [Accepted: 05/07/2021] [Indexed: 11/24/2022]
Abstract
Atomistic machine learning (AML) simulations are used in chemistry at an ever-increasing pace. A large number of AML models has been developed, but their implementations are scattered among different packages, each with its own conventions for input and output. Thus, here we give an overview of our MLatom 2 software package, which provides an integrative platform for a wide variety of AML simulations by implementing from scratch and interfacing existing software for a range of state-of-the-art models. These include kernel method-based model types such as KREG (native implementation), sGDML, and GAP-SOAP as well as neural-network-based model types such as ANI, DeepPot-SE, and PhysNet. The theoretical foundations behind these methods are overviewed too. The modular structure of MLatom allows for easy extension to more AML model types. MLatom 2 also has many other capabilities useful for AML simulations, such as the support of custom descriptors, farthest-point and structure-based sampling, hyperparameter optimization, model evaluation, and automatic learning curve generation. It can also be used for such multi-step tasks as Δ-learning, self-correction approaches, and absorption spectrum simulation within the machine-learning nuclear-ensemble approach. Several of these MLatom 2 capabilities are showcased in application examples.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, 361005, China.
- Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Fuchun Ge
- Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Bao-Xin Xue
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, 361005, China
- Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Yi-Fan Hou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, 361005, China
- Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Max Pinheiro
- Aix Marseille University, CNRS, ICR, Marseille, France
| | - Jianxing Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, 361005, China
- Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | | |
|
46
|
Laurens G, Rabary M, Lam J, Peláez D, Allouche AR. Infrared spectra of neutral polycyclic aromatic hydrocarbons based on machine learning potential energy surface and dipole mapping. Theor Chem Acc 2021. [DOI: 10.1007/s00214-021-02773-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]
|
47
|
Gupta A, Chakraborty S, Ramakrishnan R. Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abe347] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022] Open
Abstract
The requirement for accelerated and quantitatively accurate screening of nuclear magnetic resonance spectra across the small molecules chemical compound space is two-fold: (1) a robust ‘local’ machine learning (ML) strategy capturing the effect of the neighborhood on an atom’s ‘near-sighted’ property—chemical shielding; (2) an accurate reference dataset generated with a state-of-the-art first-principles method for training. Herein we report the QM9-NMR dataset comprising isotropic shielding of over 0.8 million C atoms in 134k molecules of the QM9 dataset in gas and five common solvent phases. Using these data for training, we present benchmark results for the prediction transferability of kernel-ridge regression models with popular local descriptors. Our best model, trained on 100k samples, accurately predicts isotropic shielding of 50k ‘hold-out’ atoms with a mean error of less than 1.9 ppm. For the rapid prediction of new query molecules, the models were trained on geometries from an inexpensive theory. Furthermore, by using a Δ-ML strategy, we quench the error below 1.4 ppm. Finally, we test the transferability on non-trivial benchmark sets that include benchmark molecules comprising 10–17 heavy atoms and drugs.
|
48
|
Koutsoukos S, Philippi F, Malaret F, Welton T. A review on machine learning algorithms for the ionic liquid chemical space. Chem Sci 2021; 12:6820-6843. [PMID: 34123314 PMCID: PMC8153233 DOI: 10.1039/d1sc01000j] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 02/19/2021] [Accepted: 04/28/2021] [Indexed: 01/05/2023] Open
Abstract
There are thousands of papers published every year investigating the properties and possible applications of ionic liquids. Industrial use of these exceptional fluids requires adequate understanding of their physical properties, in order to create the ionic liquid that will optimally suit the application. Computational property prediction arose from the urgent need to minimise the time and cost that would be required to experimentally test different combinations of ions. This review discusses the use of machine learning algorithms as property prediction tools for ionic liquids (either as standalone methods or in conjunction with molecular dynamics simulations), presents common problems of training datasets and proposes ways that could lead to more accurate and efficient models.
Affiliation(s)
- Spyridon Koutsoukos
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
| | - Frederik Philippi
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
| | - Francisco Malaret
- Department of Chemical Engineering, Imperial College London South Kensington Campus London SW7 2AZ UK
| | - Tom Welton
- Department of Chemistry, Molecular Sciences Research Hub, Imperial College London White City Campus London W12 0BZ UK
| |
|
49
|
Ceriotti M, Clementi C, Anatole von Lilienfeld O. Machine learning meets chemical physics. J Chem Phys 2021; 154:160401. [PMID: 33940847 DOI: 10.1063/5.0051418] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/13/2022] Open
Abstract
Over recent years, the use of statistical learning techniques applied to chemical problems has gained substantial momentum. This is particularly apparent in the realm of physical chemistry, where the balance between empiricism and physics-based theory has traditionally been rather in favor of the latter. In this guest Editorial for the special topic issue on "Machine Learning Meets Chemical Physics," a brief rationale is provided, followed by an overview of the topics covered. We conclude by making some general remarks.
Affiliation(s)
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
| | - Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
| | | |
|
50
|
Nandi A, Qu C, Houston PL, Conte R, Bowman JM. Δ-machine learning for potential energy surfaces: A PIP approach to bring a DFT-based PES to CCSD(T) level of theory. J Chem Phys 2021; 154:051102. [DOI: 10.1063/5.0038301] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 01/09/2023] Open
Affiliation(s)
- Apurba Nandi
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, USA
| | - Chen Qu
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, USA
| | - Paul L. Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, USA and Department of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
| | - Riccardo Conte
- Dipartimento di Chimica, Università Degli Studi di Milano, Via Golgi 19, 20133 Milano, Italy
| | - Joel M. Bowman
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, USA
| |
|