101
|
Pinheiro M, Ge F, Ferré N, Dral PO, Barbatti M. Choosing the right molecular machine learning potential. Chem Sci 2021; 12:14396-14413. [PMID: 34880991 PMCID: PMC8580106 DOI: 10.1039/d1sc03564a] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 06/29/2021] [Accepted: 09/14/2021] [Indexed: 11/21/2022]
Abstract
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.
Affiliation(s)
- Max Pinheiro
- Aix Marseille University, CNRS, ICR Marseille France
| | - Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
| | - Nicolas Ferré
- Aix Marseille University, CNRS, ICR Marseille France
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
| | - Mario Barbatti
- Aix Marseille University, CNRS, ICR Marseille France
- Institut Universitaire de France 75231 Paris France
| |
|
102
|
Cheng Z, Du J, Zhang L, Ma J, Li W, Li S. Building quantum mechanics quality force fields of proteins with the generalized energy-based fragmentation approach and machine learning. Phys Chem Chem Phys 2021; 24:1326-1337. [PMID: 34718360 DOI: 10.1039/d1cp03934b] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/27/2022]
Abstract
We combined our generalized energy-based fragmentation (GEBF) approach and a machine learning (ML) technique to construct quantum mechanics (QM) quality force fields for proteins. In our scheme, the training sets for a protein are only constructed from its small subsystems, which capture all short-range interactions in the target system. The energy of a given protein is expressed as the summation of atomic contributions from QM calculations of various subsystems, corrected by long-range Coulomb and van der Waals interactions. With the Gaussian approximation potential (GAP) method, our protocol can automatically generate training sets with high efficiency. To facilitate the construction of training sets for proteins, we store all trained subsystem data in a library. If subsystems in the library are detected in a new protein, corresponding datasets can be directly reused as a part of the training set on this new protein. With two polypeptides, 4ZNN and 1XQ8 segment, as examples, the energies and forces predicted by GEBF-GAP are in good agreement with those from conventional QM calculations, and dihedral angle distributions from GEBF-GAP molecular dynamics (MD) simulations also reproduce those from ab initio MD simulations well. In addition, with the training set generated from GEBF-GAP, we demonstrate that GEBF-ML force fields constructed by neural network (NN) methods can also reach QM quality. Therefore, the present work provides an efficient and systematic way to build QM quality force fields for biological systems.
Affiliation(s)
- Zheng Cheng
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| | - Jiahui Du
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| | - Lei Zhang
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| | - Jing Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| | - Wei Li
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| | - Shuhua Li
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, Institute of Theoretical and Computational Chemistry, School of Theoretical and Computational Chemistry, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing, 210023, P. R. China.
| |
|
103
|
Niblett SP, Galib M, Limmer DT. Learning intermolecular forces at liquid-vapor interfaces. J Chem Phys 2021; 155:164101. [PMID: 34717371 DOI: 10.1063/5.0067565] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022]
Abstract
By adopting a perspective informed by contemporary liquid-state theory, we consider how to train an artificial neural network potential to describe inhomogeneous, disordered systems. We find that neural network potentials based on local representations of atomic environments are capable of describing some properties of liquid-vapor interfaces but typically fail for properties that depend on unbalanced long-ranged interactions that build up in the presence of broken translation symmetry. These same interactions cancel in the translationally invariant bulk, allowing local neural network potentials to describe bulk properties correctly. By incorporating explicit models of the slowly varying long-ranged interactions and training neural networks only on the short-ranged components, we can arrive at potentials that robustly recover interfacial properties. We find that local neural network models can sometimes approximate a local molecular field potential to correct for the truncated interactions, but this behavior is variable and hard to learn. Generally, we find that models with explicit electrostatics are easier to train and have higher accuracy. We demonstrate this perspective in a simple model of an asymmetric dipolar fluid, where the exact long-ranged interaction is known, and in an ab initio water model, where it is approximated.
Affiliation(s)
- Samuel P Niblett
- Department of Chemistry, University of California, Berkeley California 94609, USA
| | - Mirza Galib
- Department of Chemistry, University of California, Berkeley California 94609, USA
| | - David T Limmer
- Department of Chemistry, University of California, Berkeley California 94609, USA
| |
|
104
|
Seo B, Lin ZY, Zhao Q, Webb MA, Savoie BM. Topology Automated Force-Field Interactions (TAFFI): A Framework for Developing Transferable Force Fields. J Chem Inf Model 2021; 61:5013-5027. [PMID: 34533949 DOI: 10.1021/acs.jcim.1c00491] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
Force-field development has undergone a revolution in the past decade with the proliferation of quantum chemistry based parametrizations and the introduction of machine learning approximations of the atomistic potential energy surface. Nevertheless, transferable force fields with broad coverage of organic chemical space remain necessary for applications in materials and chemical discovery where throughput, consistency, and computational cost are paramount. Here, we introduce a force-field development framework called Topology Automated Force-Field Interactions (TAFFI) for developing transferable force fields of varying complexity against an extensible database of quantum chemistry calculations. TAFFI formalizes the concept of atom typing and makes it the basis for generating systematic training data that maintains a one-to-one correspondence with force-field terms. This feature makes TAFFI arbitrarily extensible to new chemistries while maintaining internal consistency and transferability. As a demonstration of TAFFI, we have developed a fixed-charge force-field, TAFFI-gen, from scratch that includes coverage for common organic functional groups that is comparable to established transferable force fields. The performance of TAFFI-gen was benchmarked against OPLS and GAFF for reproducing several experimental properties of 87 organic liquids. The consistent performance of these force fields, despite their distinct origins, validates the TAFFI framework while also providing evidence of the representability limitations of fixed-charge force fields.
Affiliation(s)
- Bumjoon Seo
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| | - Zih-Yu Lin
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| | - Qiyuan Zhao
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| | - Michael A Webb
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08540, United States
| | - Brett M Savoie
- Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
| |
|
105
|
Desai SA, Mattheakis M, Roberts SJ. Variational integrator graph networks for learning energy-conserving dynamical systems. Phys Rev E 2021; 104:035310. [PMID: 34654151 DOI: 10.1103/physreve.104.035310] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Accepted: 09/15/2021] [Indexed: 11/07/2022]
Abstract
Recent advances show that neural networks embedded with physics-informed priors significantly outperform vanilla neural networks in learning and predicting the long-term dynamics of complex physical systems from noisy data. Despite this success, there has only been a limited study on how to optimally combine physics priors to improve predictive performance. To tackle this problem we unpack and generalize recent innovations into individual inductive bias segments. As such, we are able to systematically investigate all possible combinations of inductive biases of which existing methods are a natural subset. Using this framework we introduce variational integrator graph networks, a novel method that unifies the strengths of existing approaches by combining an energy constraint, high-order symplectic variational integrators, and graph neural networks. We demonstrate, across an extensive ablation, that the proposed unifying framework outperforms existing methods, for data-efficient learning and in predictive accuracy, across both single- and many-body problems studied in the recent literature. We empirically show that the improvements arise because high-order variational integrators combined with a potential energy constraint induce coupled learning of generalized position and momentum updates which can be formalized via the partitioned Runge-Kutta method.
Affiliation(s)
- Shaan A Desai
- Machine Learning Research Group, University of Oxford Eagle House, Oxford OX2 6ED, United Kingdom and John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Marios Mattheakis
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Stephen J Roberts
- Machine Learning Research Group, University of Oxford Eagle House, Oxford OX2 6ED, United Kingdom
| |
|
106
|
Desai SA, Mattheakis M, Sondak D, Protopapas P, Roberts SJ. Port-Hamiltonian neural networks for learning explicit time-dependent dynamical systems. Phys Rev E 2021; 104:034312. [PMID: 34654178 DOI: 10.1103/physreve.104.034312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Accepted: 09/14/2021] [Indexed: 11/07/2022]
Abstract
Accurately learning the temporal behavior of dynamical systems requires models with well-chosen learning biases. Recent innovations embed the Hamiltonian and Lagrangian formalisms into neural networks and demonstrate a significant improvement over other approaches in predicting trajectories of physical systems. These methods generally tackle autonomous systems that depend implicitly on time or systems for which a control signal is known a priori. Despite this success, many real world dynamical systems are nonautonomous, driven by time-dependent forces and experience energy dissipation. In this study, we address the challenge of learning from such nonautonomous systems by embedding the port-Hamiltonian formalism into neural networks, a versatile framework that can capture energy dissipation and time-dependent control forces. We show that the proposed port-Hamiltonian neural network can efficiently learn the dynamics of nonlinear physical systems of practical interest and accurately recover the underlying stationary Hamiltonian, time-dependent force, and dissipative coefficient. A promising outcome of our network is its ability to learn and predict chaotic systems such as the Duffing equation, for which the trajectories are typically hard to learn.
Affiliation(s)
- Shaan A Desai
- Machine Learning Research Group, University of Oxford Eagle House, Oxford OX2 6ED, United Kingdom
| | - Marios Mattheakis
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
| | - David Sondak
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Pavlos Protopapas
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Stephen J Roberts
- Machine Learning Research Group, University of Oxford Eagle House, Oxford OX2 6ED, United Kingdom
| |
|
107
|
Zaverkin V, Holzmüller D, Steinwart I, Kästner J. Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments. J Chem Theory Comput 2021; 17:6658-6670. [PMID: 34585927 DOI: 10.1021/acs.jctc.1c00527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Artificial neural networks (NNs) are one of the most frequently used machine learning approaches to construct interatomic potentials and enable efficient large-scale atomistic simulations with almost ab initio accuracy. However, the simultaneous training of NNs on energies and forces, which are a prerequisite for, e.g., molecular dynamics simulations, can be demanding. In this work, we present an improved NN architecture based on the previous GM-NN model [Zaverkin V.; Kästner, J. J. Chem. Theory Comput. 2020, 16, 5410-5421], which shows an improved prediction accuracy and considerably reduced training times. Moreover, we extend the applicability of Gaussian moment-based interatomic potentials to periodic systems and demonstrate the overall excellent transferability and robustness of the respective models. The fast training by the improved methodology is a prerequisite for training-heavy workflows such as active learning or learning-on-the-fly.
Affiliation(s)
- Viktor Zaverkin
- Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, 70569 Stuttgart, Germany
| | - David Holzmüller
- Institute for Stochastics and Applications, University of Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany
| | - Ingo Steinwart
- Institute for Stochastics and Applications, University of Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany
| | - Johannes Kästner
- Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, 70569 Stuttgart, Germany
| |
|
108
|
Lambros E, Dasgupta S, Palos E, Swee S, Hu J, Paesani F. General Many-Body Framework for Data-Driven Potentials with Arbitrary Quantum Mechanical Accuracy: Water as a Case Study. J Chem Theory Comput 2021; 17:5635-5650. [PMID: 34370954 DOI: 10.1021/acs.jctc.1c00541] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/28/2022]
Abstract
We present a general framework for the development of data-driven many-body (MB) potential energy functions (MB-QM PEFs) that represent the interactions between small molecules at an arbitrary quantum-mechanical (QM) level of theory. As a demonstration, a family of MB-QM PEFs for water is rigorously derived from density functionals belonging to different rungs across Jacob's ladder of approximations within density functional theory (MB-DFT) and from Møller-Plesset perturbation theory (MB-MP2). Through a systematic analysis of individual MB contributions to the interaction energies of water clusters, we demonstrate that all MB-QM PEFs preserve the same accuracy as the corresponding ab initio calculations, with the exception of those derived from density functionals within the generalized gradient approximation (GGA). The differences between the DFT and MB-DFT results are traced back to density-driven errors that prevent GGA functionals from accurately representing the underlying molecular interactions for different cluster sizes and hydrogen-bonding arrangements. We show that this shortcoming may be overcome, within the MB formalism, by using density-corrected functionals (DC-DFT) that provide a more consistent representation of each individual MB contribution. This is demonstrated through the development of a MB-DFT PEF derived from DC-PBE-D3 data, which more accurately reproduce the corresponding ab initio results.
Affiliation(s)
- Eleftherios Lambros
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Saswata Dasgupta
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Etienne Palos
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Steven Swee
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Jie Hu
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States
| | - Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, United States; Materials Science and Engineering, University of California San Diego, La Jolla, California 92093, United States; San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, United States
| |
|
109
|
Shimamura K, Takeshita Y, Fukushima S, Koura A, Shimojo F. Estimating thermal conductivity of α-Ag2Se using ANN potential with Chebyshev descriptor. Chem Phys Lett 2021. [DOI: 10.1016/j.cplett.2021.138748] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
|
110
|
Wang X, Xu Y, Zheng H, Yu K. A Scalable Graph Neural Network Method for Developing an Accurate Force Field of Large Flexible Organic Molecules. J Phys Chem Lett 2021; 12:7982-7987. [PMID: 34433274 DOI: 10.1021/acs.jpclett.1c02214] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/13/2023]
Abstract
An accurate force field is the key to the success of all molecular mechanics simulations on organic polymers and biomolecules. Accurate correlated wave function (CW) methods scale poorly with system size, so this poses a great challenge to the development of an extendible ab initio force field for large flexible organic molecules at the CW level of accuracy. In this work, we combine the physics-driven nonbonding potential with a data-driven subgraph neural network bonding model (named sGNN). Tests on polyethylene glycol, polyethene, and their block polymers show that our strategy is highly accurate and robust for molecules of different sizes and chemical compositions. Therefore, one can develop a parameter library of small molecular fragments (with sizes easily accessible to CW methods) and assemble them to predict the energy of large polymers, thus opening a new path to next-generation organic force fields.
Affiliation(s)
- Xufei Wang
- Two Sigma Investments, New York, New York 10013, United States
| | - Yuanda Xu
- The Program in Applied & Computational Mathematics, Princeton University, Princeton, New Jersey 08544-1000, United States
| | - Han Zheng
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Institute of Materials Research (iMR), Tsinghua Shenzhen International Graduate School (TSIGS), Tsinghua University, Shenzhen 518055, P. R. China
| | - Kuang Yu
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Institute of Materials Research (iMR), Tsinghua Shenzhen International Graduate School (TSIGS), Tsinghua University, Shenzhen 518055, P. R. China
| |
|
111
|
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 162] [Impact Index Per Article: 54.0] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
| | - Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
| |
|
112
|
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
| | - O. Anatole von Lilienfeld
- Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
| |
|
113
|
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 360] [Impact Index Per Article: 120.0] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
| | - Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
| | - Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD−Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
| |
|
115
|
Yang M, Karmakar T, Parrinello M. Liquid-Liquid Critical Point in Phosphorus. Phys Rev Lett 2021; 127:080603. [PMID: 34477397 DOI: 10.1103/physrevlett.127.080603] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/04/2021] [Revised: 07/07/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
The study of liquid-liquid phase transitions has attracted considerable attention. One interesting example of this phenomenon is phosphorus, for which the existence of a first-order phase transition between a low density insulating molecular phase and a conducting polymeric phase has been experimentally established. In this Letter, we model this transition by an ab initio quality molecular dynamics simulation and explore a large portion of the liquid section of the phase diagram. We draw the liquid-liquid coexistence curve and determine that it terminates into a second-order critical point. Close to the critical point, large coupled structure and electronic structure fluctuations are observed.
Affiliation(s)
- Manyi Yang
- Italian Institute of Technology, Via Melen 83, 16152 Genova, Italy
| | - Tarak Karmakar
- Italian Institute of Technology, Via Melen 83, 16152 Genova, Italy
| | | |
|
116
|
Zubatyuk R, Smith JS, Nebgen BT, Tretiak S, Isayev O. Teaching a neural network to attach and detach electrons from molecules. Nat Commun 2021; 12:4870. [PMID: 34381051 PMCID: PMC8357920 DOI: 10.1038/s41467-021-24904-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 07/01/2021] [Indexed: 02/07/2023] Open
Abstract
Interatomic potentials derived with Machine Learning algorithms such as Deep-Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow performing massive simulations. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2-3 kcal/mol and spin-charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model allows one to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
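The quantities named in this abstract follow from standard finite-difference definitions once energies and atomic charges are available for the cation, neutral, and anion of a molecule. The sketch below is only an illustration of those textbook definitions, not the AIMNet-NSE code: the function names and input layout are hypothetical, and the hardness is taken as IP - EA (some texts use half of that value).

```python
def conceptual_dft(e_cation, e_neutral, e_anion):
    """Finite-difference descriptors from three total energies in the same units.

    Assumed convention: hardness = IP - EA (other texts use (IP - EA) / 2).
    """
    ip = e_cation - e_neutral   # ionization potential
    ea = e_neutral - e_anion    # electron affinity
    chi = 0.5 * (ip + ea)       # Mulliken electronegativity
    eta = ip - ea               # chemical hardness
    return {"IP": ip, "EA": ea, "electronegativity": chi, "hardness": eta}


def condensed_fukui(q_cation, q_neutral, q_anion):
    """Condensed Fukui functions from per-atom charges of the N-1, N, N+1 electron states."""
    # f+ (site reactivity toward nucleophiles): q_k(N) - q_k(N+1)
    f_plus = [qn - qa for qn, qa in zip(q_neutral, q_anion)]
    # f- (site reactivity toward electrophiles): q_k(N-1) - q_k(N)
    f_minus = [qc - qn for qc, qn in zip(q_cation, q_neutral)]
    return f_plus, f_minus
```

For regioselectivity in electrophilic aromatic substitution, the atom with the largest f- value would be the natural candidate site under this picture.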
Affiliation(s)
- Roman Zubatyuk
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Justin S. Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Benjamin T. Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
|
117
|
Konrad M, Wenzel W. CONI-Net: Machine Learning of Separable Intermolecular Force Fields. J Chem Theory Comput 2021; 17:4996-5006. [PMID: 34247485 DOI: 10.1021/acs.jctc.1c00328] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
Noncovalent interactions (NCIs) play an essential role in soft matter and biomolecular simulations. The ab initio method symmetry-adapted perturbation theory allows a precise quantitative analysis of NCIs and offers an inherent energy decomposition, enabling a deeper understanding of the nature of intermolecular interactions. However, this method is limited to small systems, for instance, dimers of molecules. Here, we present a scale-bridging approach to systematically derive an intermolecular force field from ab initio data while preserving the energy decomposition of the underlying method. We apply the model in molecular dynamics simulations of several solvents and compare two predicted thermodynamic observables (mass density and enthalpy of vaporization) to experiments and established force fields. For a data set limited to hydrocarbons, we investigate the extrapolation capabilities to molecules absent from the training set. Overall, despite the affordable moderate quality of the reference ab initio data, we find promising results. With the straightforward data set generation procedure and the lack of target data in the fitting process, we have developed a method that enables the rapid development of predictive force fields with an extra dimension of insights into the balance of NCIs.
Affiliation(s)
- Manuel Konrad
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen 76344, Germany
| | - Wolfgang Wenzel
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen 76344, Germany
|
118
|
Miksch AM, Morawietz T, Kästner J, Urban A, Artrith N. Strategies for the construction of machine-learning potentials for accurate and efficient atomic-scale simulations. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abfd96] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Recent advances in machine-learning interatomic potentials have enabled the efficient modeling of complex atomistic systems with an accuracy that is comparable to that of conventional quantum-mechanics based methods. At the same time, the construction of new machine-learning potentials can seem a daunting task, as it involves data-science techniques that are not yet common in chemistry and materials science. Here, we provide a tutorial-style overview of strategies and best practices for the construction of artificial neural network (ANN) potentials. We illustrate the most important aspects of (a) data collection, (b) model selection, (c) training and validation, and (d) testing and refinement of ANN potentials on the basis of practical examples. Current research in the areas of active learning and delta learning are also discussed in the context of ANN potentials. This tutorial review aims at equipping computational chemists and materials scientists with the required background knowledge for ANN potential construction and application, with the intention to accelerate the adoption of the method, so that it can facilitate exciting research that would otherwise be challenging with conventional strategies.
|
119
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electron structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
|
120
|
Ren H, Li H, Zhang Q, Liang L, Guo W, Huang F, Luo Y, Jiang J. A machine learning vibrational spectroscopy protocol for spectrum prediction and spectrum-based structure recognition. FUNDAMENTAL RESEARCH 2021. [DOI: 10.1016/j.fmre.2021.05.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
|
121
|
Rajak P, Baradwaj N, Nomura KI, Krishnamoorthy A, Rino JP, Shimamura K, Fukushima S, Shimojo F, Kalia R, Nakano A, Vashishta P. Neural Network Quantum Molecular Dynamics, Intermediate Range Order in GeSe2, and Neutron Scattering Experiments. J Phys Chem Lett 2021; 12:6020-6028. [PMID: 34165308 DOI: 10.1021/acs.jpclett.1c01272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
A remarkable property of certain covalent glasses and their melts is intermediate range order, manifested as the first sharp diffraction peak (FSDP) in neutron-scattering experiments, as was exhaustively investigated by Price, Saboungi, and collaborators. Atomistic simulations thus far have relied on either quantum molecular dynamics (QMD), with systems too small to resolve FSDP, or classical molecular dynamics, without quantum-mechanical accuracy. We investigate prototypical FSDP in GeSe2 glass and melt using neural-network quantum molecular dynamics (NNQMD) based on machine learning, which allows large simulation sizes with validated quantum mechanical accuracy to make quantitative comparisons with neutron data. The system-size dependence of the FSDP height is determined by comparing QMD and NNQMD simulations with experimental data. Partial pair distribution functions, bond-angle distributions, partial and neutron structure factors, and ring-size distributions are presented. Calculated FSDP heights agree quantitatively with neutron scattering data for GeSe2 glass at 10 K and melt at 1100 K.
Affiliation(s)
- Pankaj Rajak
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
- Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Nitish Baradwaj
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
| | - Ken-Ichi Nomura
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
| | - Aravind Krishnamoorthy
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
| | - Jose P Rino
- Departamento de Fisica, Universidade Federal de São Carlos, São Carlos, São Paulo 13565-905, Brazil
| | - Kohei Shimamura
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Shogo Fukushima
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Fuyuki Shimojo
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Rajiv Kalia
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
| | - Aiichiro Nakano
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
| | - Priya Vashishta
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles 90089, United States
|
122
|
Xu M, Zhu T, Zhang JZH. Automatically Constructed Neural Network Potentials for Molecular Dynamics Simulation of Zinc Proteins. Front Chem 2021; 9:692200. [PMID: 34222200 PMCID: PMC8249736 DOI: 10.3389/fchem.2021.692200] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2021] [Accepted: 05/10/2021] [Indexed: 11/13/2022] Open
Abstract
The development of accurate and efficient potential energy functions for the molecular dynamics simulation of metalloproteins has long been a great challenge for the theoretical chemistry community. An artificial neural network provides the possibility to develop potential energy functions with both the efficiency of the classical force fields and the accuracy of the quantum chemical methods. In this work, neural network potentials were automatically constructed by using the ESOINN-DP method for typical zinc proteins. For the four most common zinc coordination modes in proteins, the potential energy, atomic forces, and atomic charges predicted by neural network models show great agreement with quantum mechanics calculations and the neural network potential can maintain the coordination geometry correctly. In addition, MD simulation and energy optimization with the neural network potential can be readily used for structural refinement. The neural network potential is not limited by the function form and complex parameterization process, and important quantum effects such as polarization and charge transfer can be accurately considered. The algorithm proposed in this work can also be directly applied to proteins containing other metal ions.
Affiliation(s)
- Mingyuan Xu
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry and Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, China
| | - Tong Zhu
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry and Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
| | - John Z. H. Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, Shanghai Key Laboratory of Green Chemistry and Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China
- Department of Chemistry, New York University, New York, NY, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, China
|
123
|
Friederich P, Häse F, Proppe J, Aspuru-Guzik A. Machine-learned potentials for next-generation matter simulations. NATURE MATERIALS 2021; 20:750-761. [PMID: 34045696 DOI: 10.1038/s41563-020-0777-6] [Citation(s) in RCA: 108] [Impact Index Per Article: 36.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/21/2019] [Accepted: 07/17/2020] [Indexed: 05/18/2023]
Abstract
The choice of simulation methods in computational materials science is driven by a fundamental trade-off: bridging large time- and length-scales with highly accurate simulations at an affordable computational cost. Venturing the investigation of complex phenomena on large scales requires fast yet accurate computational methods. We review the emerging field of machine-learned potentials, which promises to reach the accuracy of quantum mechanical computations at a substantially reduced computational cost. This Review will summarize the basic principles of the underlying machine learning methods, the data acquisition process and active learning procedures. We highlight multiple recent applications of machine-learned potentials in various fields, ranging from organic chemistry and biomolecules to inorganic crystal structure predictions and surface science. We furthermore discuss the developments required to promote a broader use of ML potentials, and the possibility of using them to help solve open questions in materials science and facilitate fully computational materials design.
Affiliation(s)
- Pascal Friederich
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Florian Häse
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Jonny Proppe
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Institute of Physical Chemistry, Georg-August University, Göttingen, Germany
| | - Alán Aspuru-Guzik
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada.
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario, Canada.
|
124
|
Krishnamoorthy A, Nomura KI, Baradwaj N, Shimamura K, Rajak P, Mishra A, Fukushima S, Shimojo F, Kalia R, Nakano A, Vashishta P. Dielectric Constant of Liquid Water Determined with Neural Network Quantum Molecular Dynamics. PHYSICAL REVIEW LETTERS 2021; 126:216403. [PMID: 34114857 DOI: 10.1103/physrevlett.126.216403] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Accepted: 03/30/2021] [Indexed: 06/12/2023]
Abstract
The static dielectric constant ε₀ and its temperature dependence for liquid water are investigated using neural network quantum molecular dynamics (NNQMD). We compute the exact dielectric constant in the canonical ensemble from NNQMD trajectories using fluctuations in macroscopic polarization computed from maximally localized Wannier functions (MLWF). Two deep neural networks are constructed. The first, NNQMD, is trained on QMD configurations for liquid water under a variety of temperature and density conditions to learn the potential energy surface and forces and then perform molecular dynamics simulations. The second network, NNMLWF, is trained to predict locations of MLWF of individual molecules using the atomic configurations from NNQMD. Training data for both neural networks are produced using a highly accurate quantum-mechanical method, DFT-SCAN, which yields an excellent description of liquid water. We produce 280×10⁶ configurations of water at 7 temperatures using NNQMD and predict MLWF centers using NNMLWF to compute the polarization fluctuations. The length of trajectories needed for a converged value of the dielectric constant at 0°C is found to be 20 ns (40×10⁶ configurations with 0.5 fs time step). The computed dielectric constants for 0, 15, 30, 45, 60, 75, and 90°C are in good agreement with experiments. Our scalable scheme to compute dielectric constants with quantum accuracy is also applicable to other polar molecular liquids.
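The numbers in this abstract can be checked with a short calculation: 40 million configurations at a 0.5 fs time step correspond to 20 ns. The sketch below also evaluates the standard textbook fluctuation formula for the static dielectric constant under conducting (tin-foil) boundary conditions in SI units. This is an illustrative assumption, since the abstract does not state which expression the authors used, and the function names, input layout, and the `eps_inf` parameter are hypothetical.

```python
K_B = 1.380649e-23        # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def trajectory_length_ns(n_steps, dt_fs):
    """Total simulated time in nanoseconds for n_steps of dt_fs femtoseconds."""
    return n_steps * dt_fs * 1e-6  # 1 fs = 1e-6 ns


def dielectric_constant(dipoles, volume_m3, temperature_k, eps_inf=1.0):
    """Fluctuation estimate: eps = eps_inf + (<M^2> - <M>^2) / (3 eps0 V kB T).

    dipoles: sequence of (Mx, My, Mz) total cell dipole moments in C*m,
    one per sampled configuration (assumed input format).
    """
    n = len(dipoles)
    mean = [sum(m[i] for m in dipoles) / n for i in range(3)]
    mean_sq = sum(m[0] ** 2 + m[1] ** 2 + m[2] ** 2 for m in dipoles) / n
    variance = mean_sq - sum(c * c for c in mean)
    return eps_inf + variance / (3.0 * EPS_0 * volume_m3 * K_B * temperature_k)
```

Because the estimate depends on the variance of the total dipole moment, it converges slowly, which is in line with the long trajectories the abstract reports as necessary.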
Affiliation(s)
- Aravind Krishnamoorthy
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Ken-Ichi Nomura
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Nitish Baradwaj
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Kohei Shimamura
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Pankaj Rajak
- Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Ankit Mishra
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Shogo Fukushima
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Fuyuki Shimojo
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Rajiv Kalia
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Aiichiro Nakano
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
| | - Priya Vashishta
- Collaboratory for Advanced Computing and Simulations, Department of Chemical Engineering and Materials Science, Department of Physics & Astronomy, and Department of Computer Science, University of Southern California, Los Angeles, California 90089, USA
|
125
|
Han Y, Wang Z, Wei Z, Liu J, Li J. Machine learning builds full-QM precision protein force fields in seconds. Brief Bioinform 2021; 22:6279287. [PMID: 34017993 DOI: 10.1093/bib/bbab158] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Revised: 03/29/2021] [Accepted: 04/04/2021] [Indexed: 11/14/2022] Open
Abstract
Full-quantum mechanics (QM) calculations are extraordinarily precise but difficult to apply to large systems, such as biomolecules. Motivated by the massive demand for efficient calculations for large systems at the full-QM level and by the significant advances in machine learning, we have designed a neural network-based two-body molecular fractionation with conjugate caps (NN-TMFCC) approach to accelerate the energy and atomic force calculations of proteins. The results show very high precision for the proposed NN potential energy surface models of residue-based fragments, with energy root-mean-squared errors (RMSEs) less than 1.0 kcal/mol and force RMSEs less than 1.3 kcal/mol/Å for both training and testing sets. The proposed NN-TMFCC method calculates the energies and atomic forces of 15 representative proteins with full-QM precision in 10-100 s, which is thousands of times faster than the full-QM calculations. The computational complexity of the NN-TMFCC method is independent of the protein size and only depends on the number of residue species, which makes this method particularly suitable for rapid prediction of large systems with tens of thousands or even hundreds of thousands of times acceleration. This highly precise and efficient NN-TMFCC approach exhibits considerable potential for performing energy and force calculations, structure predictions and molecular dynamics simulations of proteins with full-QM precision.
Affiliation(s)
| | | | - Zhiyun Wei
- Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, Shanghai, China
| | - Jinyun Liu
- Key Laboratory of Functional Molecular Solids of Ministry of Education, Anhui Normal University, China
| | - Jinjin Li
- Key Laboratory for Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, China
|
126
|
Schriber JB, Nascimento DR, Koutsoukas A, Spronk SA, Cheney DL, Sherrill CD. CLIFF: A component-based, machine-learned, intermolecular force field. J Chem Phys 2021; 154:184110. [PMID: 34241025 DOI: 10.1063/5.0042989] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
Computation of intermolecular interactions is a challenge in drug discovery because accurate ab initio techniques are too computationally expensive to be routinely applied to drug-protein models. Classical force fields are more computationally feasible, and force fields designed to match symmetry adapted perturbation theory (SAPT) interaction energies can remain accurate in this context. Unfortunately, the application of such force fields is complicated by the laborious parameterization required for computations on new molecules. Here, we introduce the component-based machine-learned intermolecular force field (CLIFF), which combines accurate, physics-based equations for intermolecular interaction energies with machine-learning models to enable automatic parameterization. The CLIFF uses functional forms corresponding to electrostatic, exchange-repulsion, induction/polarization, and London dispersion components in SAPT. Molecule-independent parameters are fit with respect to SAPT2+(3)δMP2/aug-cc-pVTZ, and molecule-dependent atomic parameters (atomic widths, atomic multipoles, and Hirshfeld ratios) are obtained from machine learning models developed for C, N, O, H, S, F, Cl, and Br. The CLIFF achieves mean absolute errors (MAEs) no worse than 0.70 kcal mol⁻¹ in both total and component energies across a diverse dimer test set. For the side chain-side chain interaction database derived from protein fragments, the CLIFF produces total interaction energies with an MAE of 0.27 kcal mol⁻¹ with respect to reference data, outperforming similar and even more expensive methods. In applications to a set of model drug-protein interactions, the CLIFF is able to accurately rank-order ligand binding strengths and achieves less than 10% error with respect to SAPT reference values for most complexes.
Affiliation(s)
- Jeffrey B Schriber
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
| | - Daniel R Nascimento
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
| | - Alexios Koutsoukas
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
| | - Steven A Spronk
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
| | - Daniel L Cheney
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
| | - C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
127
|
Paleico ML, Behler J. A bin and hash method for analyzing reference data and descriptors in machine learning potentials. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abe663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
In recent years the development of machine learning potentials (MLPs) has become a very active field of research. Numerous approaches have been proposed, which allow one to perform extended simulations of large systems at a small fraction of the computational costs of electronic structure calculations. The key to the success of modern MLPs is the close-to first principles quality description of the atomic interactions. This accuracy is reached by using very flexible functional forms in combination with high-level reference data from electronic structure calculations. These data sets can include up to hundreds of thousands of structures covering millions of atomic environments to ensure that all relevant features of the potential energy surface are well represented. The handling of such large data sets is nowadays becoming one of the main challenges in the construction of MLPs. In this paper we present a method, the bin-and-hash (BAH) algorithm, to overcome this problem by enabling the efficient identification and comparison of large numbers of multidimensional vectors. Such vectors emerge in multiple contexts in the construction of MLPs. Examples are the comparison of local atomic environments to identify and avoid unnecessary redundant information in the reference data sets that is costly in terms of both the electronic structure calculations as well as the training process, the assessment of the quality of the descriptors used as structural fingerprints in many types of MLPs, and the detection of possibly unreliable data points. The BAH algorithm is illustrated for the example of high-dimensional neural network potentials using atom-centered symmetry functions for the geometrical description of the atomic environments, but the method is general and can be combined with any current type of MLP.
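The general idea of binning and hashing descriptor vectors to find redundant atomic environments can be sketched in a few lines. This is a simplified illustration of the concept described in the abstract, not the published BAH algorithm: the function names, the uniform `bin_width`, and the use of a Python dict as the hash table are all assumptions made for the example.

```python
from collections import defaultdict


def bin_and_hash(vectors, bin_width):
    """Group vectors whose components all fall into the same bins.

    Each component is mapped to an integer bin index; the tuple of indices
    is used as a dictionary key, so the dict's hashing does the lookup.
    """
    buckets = defaultdict(list)
    for index, vec in enumerate(vectors):
        key = tuple(int(x // bin_width) for x in vec)
        buckets[key].append(index)
    return buckets


def redundant_groups(vectors, bin_width):
    """Return lists of indices that share a bucket (candidate duplicates)."""
    return [ids for ids in bin_and_hash(vectors, bin_width).values() if len(ids) > 1]
```

Grouping costs roughly linear time in the number of vectors instead of the quadratic cost of all pairwise comparisons. A limitation of this naive version is that two nearly identical vectors lying on opposite sides of a bin boundary land in different buckets.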
|
128
|
Doerr S, Majewski M, Pérez A, Krämer A, Clementi C, Noe F, Giorgino T, De Fabritiis G. TorchMD: A Deep Learning Framework for Molecular Simulations. J Chem Theory Comput 2021; 17:2355-2363. [PMID: 33729795 PMCID: PMC8486166 DOI: 10.1021/acs.jctc.0c01343] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Indexed: 11/28/2022]
Abstract
Molecular dynamics simulations provide a mechanistic description of molecules by relying on empirical potentials. The quality and transferability of such potentials can be improved leveraging data-driven models derived with machine learning approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and machine learning potentials. All force computations including bond, angle, dihedral, Lennard-Jones, and Coulomb interactions are expressed as PyTorch arrays and operations. Moreover, TorchMD enables learning and simulating neural network potentials. We validate it using standard Amber all-atom simulations, learning an ab initio potential, performing an end-to-end training, and finally learning and simulating a coarse-grained model for protein folding. We believe that TorchMD provides a useful tool set to support molecular simulations of machine learning potentials. Code and data are freely available at github.com/torchmd.
Affiliation(s)
| | - Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Andreas Krämer
- Department of Mathematics and Computer Science, Freie Universität, 14195 Berlin, Germany
| | - Cecilia Clementi
- Department of Physics, Freie Universität, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, 77005 Texas, United States
| | - Frank Noe
- Department of Mathematics and Computer Science, Freie Universität, 14195 Berlin, Germany
- Department of Physics, Freie Universität, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, 77005 Texas, United States
| | - Toni Giorgino
- Biophysics Institute, National Research Council (CNR-IBF), 20133 Milano, Italy
- Department of Biosciences, Università degli Studi di Milano, 20133 Milano, Italy
| | - Gianni De Fabritiis
- Acellera, 08005 Barcelona, Spain
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
| |
|
129
|
Loeffler JR, Fernández-Quintero ML, Waibl F, Quoika PK, Hofer F, Schauperl M, Liedl KR. Conformational Shifts of Stacked Heteroaromatics: Vacuum vs. Water Studied by Machine Learning. Front Chem 2021; 9:641610. [PMID: 33842433 PMCID: PMC8032969 DOI: 10.3389/fchem.2021.641610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2020] [Accepted: 03/08/2021] [Indexed: 11/13/2022] Open
Abstract
Stacking interactions play a crucial role in drug design, as we can find aromatic cores or scaffolds in almost any available small molecule drug. To predict optimal binding geometries and enhance stacking interactions, usually high-level quantum mechanical calculations are performed. These calculations have two major drawbacks: they are very time consuming, and solvation can only be considered using implicit solvation. Therefore, most calculations are performed in vacuum. However, recent studies have revealed a direct correlation between the desolvation penalty, vacuum stacking interactions and binding affinity, making predictions even more difficult. To overcome the drawbacks of quantum mechanical calculations, in this study we use neural networks to perform fast geometry optimizations and molecular dynamics simulations of heteroaromatics stacked with toluene in vacuum and in explicit solvation. We show that the resulting energies in vacuum are in good agreement with high-level quantum mechanical calculations. Furthermore, we show that using explicit solvation substantially influences the favored orientations of heteroaromatic rings thereby emphasizing the necessity to include solvation properties starting from the earliest phases of drug design.
Affiliation(s)
- Johannes R Loeffler
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Monica L Fernández-Quintero
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Franz Waibl
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Patrick K Quoika
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Florian Hofer
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Michael Schauperl
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| | - Klaus R Liedl
- Center of Molecular Biosciences Innsbruck, Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
| |
|
130
|
Gong S, Wang Y, Tian Y, Wang L, Liu G. Rapid enthalpy prediction of transition states using molecular graph convolutional network. AIChE J 2021. [DOI: 10.1002/aic.17269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Siyuan Gong
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
| | - Yutong Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
| | - Yajie Tian
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Henan Engineering Research Center of Resource and Energy Recovery from Waste, College of Chemistry and Chemical Engineering Henan University Kaifeng China
| | - Li Wang
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin China
| | - Guozhu Liu
- Key Laboratory for Green Chemical Technology of Ministry of Education, School of Chemical Engineering and Technology Tianjin University Tianjin China
- Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin China
| |
|
131
|
Rosenberger D, Smith JS, Garcia AE. Modeling of Peptides with Classical and Novel Machine Learning Force Fields: A Comparison. J Phys Chem B 2021; 125:3598-3612. [DOI: 10.1021/acs.jpcb.0c10401] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Affiliation(s)
- David Rosenberger
- Los Alamos National Laboratory, Theoretical Division, Chemistry and Physics of Materials Group, Los Alamos, 87545 New Mexico, United States
- Los Alamos National Laboratory, Theoretical Division, Center for Nonlinear Studies, Los Alamos, 87545 New Mexico, United States
| | - Justin S. Smith
- Los Alamos National Laboratory, Theoretical Division, Chemistry and Physics of Materials Group, Los Alamos, 87545 New Mexico, United States
| | - Angel E. Garcia
- Los Alamos National Laboratory, Theoretical Division, Center for Nonlinear Studies, Los Alamos, 87545 New Mexico, United States
| |
|
132
|
Morawietz T, Artrith N. Machine learning-accelerated quantum mechanics-based atomistic simulations for industrial applications. J Comput Aided Mol Des 2021; 35:557-586. [PMID: 33034008 PMCID: PMC8018928 DOI: 10.1007/s10822-020-00346-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 09/26/2020] [Indexed: 01/13/2023]
Abstract
Atomistic simulations have become an invaluable tool for industrial applications ranging from the optimization of protein-ligand interactions for drug discovery to the design of new materials for energy applications. Here we review recent advances in the use of machine learning (ML) methods for accelerated simulations based on a quantum mechanical (QM) description of the system. We show how recent progress in ML methods has dramatically extended the applicability range of conventional QM-based simulations, allowing to calculate industrially relevant properties with enhanced accuracy, at reduced computational cost, and for length and time scales that would have otherwise not been accessible. We illustrate the benefits of ML-accelerated atomistic simulations for industrial R&D processes by showcasing relevant applications from two very different areas, drug discovery (pharmaceuticals) and energy materials. Writing from the perspective of both a molecular and a materials modeling scientist, this review aims to provide a unified picture of the impact of ML-accelerated atomistic simulations on the pharmaceutical, chemical, and materials industries and gives an outlook on the exciting opportunities that could emerge in the future.
Affiliation(s)
- Tobias Morawietz
- Bayer AG, Pharmaceuticals, R&D, Digital Technologies, Computational Molecular Design, 42096 Wuppertal, Germany
| | - Nongnuch Artrith
- Department of Chemical Engineering, Columbia University, New York, NY 10027 USA
| |
|
133
|
Affiliation(s)
- Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany
| |
|
134
|
Lu J, Xia S, Lu J, Zhang Y. Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. J Chem Inf Model 2021; 61:1095-1104. [PMID: 33683885 DOI: 10.1021/acs.jcim.1c00007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
A dataset is the basis of deep learning model development, and the success of deep learning models heavily relies on the quality and size of the dataset. In this work, we present a new data preparation protocol and build a large fragment-based dataset Frag20, which consists of optimized 3D geometries and calculated molecular properties from Merck molecular force field (MMFF) and DFT at the B3LYP/6-31G* level of theory for more than half a million molecules composed of H, B, C, O, N, F, P, S, Cl, and Br with no larger than 20 heavy atoms. Based on the new dataset, we develop robust molecular energy prediction models using a simplified PhysNet architecture for both DFT-optimized and MMFF-optimized geometries, which achieve better than or close to chemical accuracy (1 kcal/mol) on multiple test sets, including CSD20 and Plati20 based on experimental crystal structures.
Affiliation(s)
- Jianing Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Song Xia
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Jieyu Lu
- Department of Chemistry, New York University, New York, New York 10003, United States
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York 10003, United States
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
| |
|
135
|
Vassilev-Galindo V, Fonseca G, Poltavsky I, Tkatchenko A. Challenges for machine learning force fields in reproducing potential energy surfaces of flexible molecules. J Chem Phys 2021; 154:094119. [DOI: 10.1063/5.0038516] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Affiliation(s)
- Valentin Vassilev-Galindo
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Gregory Fonseca
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| |
|
136
|
|
137
|
Allen AEA, Dusson G, Ortner C, Csányi G. Atomic permutationally invariant polynomials for fitting molecular force fields. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/abd51e] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
|
138
|
Liu Z, Lin L, Jia Q, Cheng Z, Jiang Y, Guo Y, Ma J. Transferable Multilevel Attention Neural Network for Accurate Prediction of Quantum Chemistry Properties via Multitask Learning. J Chem Inf Model 2021; 61:1066-1082. [PMID: 33629839 DOI: 10.1021/acs.jcim.0c01224] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
The development of efficient models for predicting specific properties through machine learning is of great importance for the innovation of chemistry and material science. However, predicting global electronic structure properties like Frontier molecular orbital highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy levels and their HOMO-LUMO gaps from the small-sized molecule data to larger molecules remains a challenge. Here, we develop a multilevel attention neural network, named DeepMoleNet, to enable chemical interpretable insights being fused into multitask learning through (1) weighting contributions from various atoms and (2) taking the atom-centered symmetry functions (ACSFs) as the teacher descriptor. The efficient prediction of 12 properties including dipole moment, HOMO, and Gibbs free energy within chemical accuracy is achieved by using multiple benchmarks, both at the equilibrium and nonequilibrium geometries, including up to 110,000 records of data in QM9, 400,000 records in MD17, and 280,000 records in ANI-1ccx for random split evaluation. The good transferability for predicting larger molecules outside the training set is demonstrated in both equilibrium QM9 and Alchemy data sets at the density functional theory (DFT) level. Additional tests on nonequilibrium molecular conformations from DFT-based MD17 data set and ANI-1ccx data set with coupled cluster accuracy as well as the public test sets of singlet fission molecules, biomolecules, long oligomers, and protein with up to 140 atoms show reasonable predictions for thermodynamics and electronic structure properties. The proposed multilevel attention neural network is applicable to high-throughput screening of numerous chemical species in both equilibrium and nonequilibrium molecular spaces to accelerate rational designs of drug-like molecules, material candidates, and chemical reactions.
Affiliation(s)
- Ziteng Liu
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
| | - Liqiang Lin
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, P. R. China
| | - Qingqing Jia
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
| | - Zheng Cheng
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
| | - Yanyan Jiang
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, P. R. China
| | - Yanwen Guo
- National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, P. R. China
| | - Jing Ma
- Key Laboratory of Mesoscopic Chemistry of Ministry of Education, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Jiangsu Key Laboratory of Advanced Organic Materials, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
| |
|
139
|
Automated discovery of a robust interatomic potential for aluminum. Nat Commun 2021; 12:1257. [PMID: 33623036 PMCID: PMC7902823 DOI: 10.1038/s41467-021-21376-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Accepted: 01/15/2021] [Indexed: 11/22/2022] Open
Abstract
Machine learning, trained on quantum mechanics (QM) calculations, is a powerful tool for modeling potential energy surfaces. A critical factor is the quality and diversity of the training dataset. Here we present a highly automated approach to dataset construction and demonstrate the method by building a potential for elemental aluminum (ANI-Al). In our active learning scheme, the ML potential under development is used to drive non-equilibrium molecular dynamics simulations with time-varying applied temperatures. Whenever a configuration is reached for which the ML uncertainty is large, new QM data is collected. The ML model is periodically retrained on all available QM data. The final ANI-Al potential makes very accurate predictions of radial distribution function in melt, liquid-solid coexistence curve, and crystal properties such as defect energies and barriers. We perform a 1.3M atom shock simulation and show that ANI-Al force predictions shine in their agreement with new reference DFT calculations. The accuracy of a machine-learned potential is limited by the quality and diversity of the training dataset. Here the authors propose an active learning approach to automatically construct general purpose machine-learning potentials here demonstrated for the aluminum case.
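The uncertainty-triggered data collection described in this abstract can be sketched as a simple loop. The Python example below is a toy under stated assumptions, not the ANI-Al workflow: `reference_energy` stands in for a QM calculation, the "uncertainty" is just the distance to the nearest training point (a placeholder for a real model-based estimate), and the one-dimensional `trajectory` replaces an actual molecular dynamics run. All names are hypothetical.

```python
import math


def reference_energy(x):
    """Stand-in for an expensive QM calculation (toy function, assumed)."""
    return math.sin(x)


def uncertainty(x, known_points):
    """Toy uncertainty: distance from x to the nearest training point."""
    return min(abs(x - t) for t in known_points)


def active_learning(trajectory, initial_points, threshold):
    """Collect new reference data wherever the uncertainty exceeds threshold."""
    data = {x: reference_energy(x) for x in initial_points}
    for x in trajectory:
        if uncertainty(x, data) > threshold:
            # Configuration is poorly covered: label it with the reference method.
            data[x] = reference_energy(x)
            # A real workflow would periodically retrain the ML model here.
    return data


sampled = active_learning([0.0, 0.05, 0.5, 0.55, 1.2], [0.0], threshold=0.3)
print(sorted(sampled))
```

Only configurations that are far from existing data trigger a new reference calculation, so the training set grows where the model is least reliable instead of uniformly along the trajectory.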
|
140
|
Fingerprint-Based Detection of Non-Local Effects in the Electronic Structure of a Simple Single Component Covalent System. Condens Matter 2021. [DOI: 10.3390/condmat6010009] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Using fingerprints used mainly in machine learning schemes of the potential energy surface, we detect in a fully algorithmic way long range effects on local physical properties in a simple covalent system of carbon atoms. The fact that these long range effects exist for many configurations implies that atomistic simulation methods, such as force fields or modern machine learning schemes, that are based on locality assumptions, are limited in accuracy. We show that the basic driving mechanism for the long range effects is charge transfer. If the charge transfer is known, locality can be recovered for certain quantities such as the band structure energy.
|
141
|
Ko TW, Finkler JA, Goedecker S, Behler J. General-Purpose Machine Learning Potentials Capturing Nonlocal Charge Transfer. Acc Chem Res 2021; 54:808-817. [PMID: 33513012 DOI: 10.1021/acs.accounts.0c00689] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The development of first-principles-quality machine learning potentials (MLP) has seen tremendous progress, now enabling computer simulations of complex systems for which sufficiently accurate interatomic potentials have not been available. These advances and the increasing use of MLPs for more and more diverse systems gave rise to new questions regarding their applicability and limitations, which has constantly driven new developments. The resulting MLPs can be classified into several generations depending on the types of systems they are able to describe. First-generation MLPs, as introduced 25 years ago, have been applicable to low-dimensional systems such as small molecules. MLPs became a practical tool for complex systems in chemistry and materials science with the introduction of high-dimensional neural network potentials (HDNNP) in 2007, which represented the first MLP of the second generation. Second-generation MLPs are based on the concept of locality and express the total energy as a sum of environment-dependent atomic energies, which allows applications to very large systems containing thousands of atoms with linearly scaling computational costs. Since second-generation MLPs do not consider interactions beyond the local chemical environments, a natural extension has been the inclusion of long-range interactions without truncation, mainly electrostatics, employing environment-dependent charges establishing the third MLP generation. A variety of second- and, to some extent, also third-generation MLPs are currently the standard methods in ML-based atomistic simulations. In spite of countless successful applications, in recent years it has been recognized that the accuracy of MLPs relying on local atomic energies and charges is still insufficient for systems with long-ranged dependencies in the electronic structure.
These can, for instance, result from nonlocal charge transfer or ionization and are omnipresent in many important types of systems and chemical processes such as the protonation and deprotonation of organic and biomolecules, redox reactions, and defects and doping in materials. In all of these situations, small local modifications can change the system globally, resulting in different equilibrium structures, charge distributions, and reactivity. These phenomena cannot be captured by second- and third-generation MLPs. Consequently, the inclusion of nonlocal phenomena has been identified as a next key step in the development of a new fourth generation of MLPs. While a first fourth-generation MLP, the charge equilibration neural network technique (CENT), was introduced in 2015, only very recently have a range of new general-purpose methods applicable to a broad range of physical scenarios emerged. In this Account, we show how fourth-generation HDNNPs can be obtained by combining the concepts of CENT and second-generation HDNNPs. These new MLPs allow for a highly accurate description of systems where nonlocal charge transfer is important.
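The energy expressions behind the second and third generations described in this abstract can be written compactly. The notation below is an assumption introduced for illustration (the abstract gives no formulas): $N$ atoms at positions $\mathbf{R}_i$, interatomic distances $r_{ij}$, a cutoff radius $r_c$ defining the local environment, and atomic charges $Q_i$.

$$
E_{\text{2G}} = \sum_{i=1}^{N} E_i\big(\{\mathbf{R}_j : r_{ij} < r_c\}\big)
$$

$$
E_{\text{3G}} = \sum_{i=1}^{N} E_i\big(\{\mathbf{R}_j : r_{ij} < r_c\}\big) + E_{\text{elec}}\big(\{Q_i\}, \{\mathbf{R}_i\}\big),
\qquad Q_i = Q_i\big(\{\mathbf{R}_j : r_{ij} < r_c\}\big)
$$

Because each $E_i$ depends only on the bounded number of neighbors inside $r_c$, the cost of the first sum grows linearly with $N$. In both expressions every quantity, including the charges $Q_i$, is a function of the local environment alone, which is why a change outside the cutoff (for example a distant protonation) cannot alter $E_i$ or $Q_i$.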
Affiliation(s)
- Tsz Wai Ko
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany
| | - Jonas A. Finkler
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
| | - Stefan Goedecker
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
| | - Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany
| |
|
142
|
Husch T, Sun J, Cheng L, Lee SJR, Miller TF. Improved accuracy and transferability of molecular-orbital-based machine learning: Organics, transition-metal complexes, non-covalent interactions, and transition states. J Chem Phys 2021; 154:064108. [DOI: 10.1063/5.0032362] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Affiliation(s)
- Tamara Husch
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Jiace Sun
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Lixue Cheng
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Sebastian J. R. Lee
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| | - Thomas F. Miller
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
| |
|
143
|
Druchok M, Yarish D, Gurbych O, Maksymenko M. Toward efficient generation, correction, and properties control of unique drug‐like structures. J Comput Chem 2021; 42:746-760. [DOI: 10.1002/jcc.26494] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2020] [Revised: 12/21/2020] [Accepted: 01/25/2021] [Indexed: 01/01/2023]
Affiliation(s)
- Maksym Druchok
- SoftServe, Inc Lviv Ukraine
- Institute for Condensed Matter Physics Lviv Ukraine
| |
|
144
|
Yue S, Muniz MC, Calegari Andrade MF, Zhang L, Car R, Panagiotopoulos AZ. When do short-range atomistic machine-learning models fall short? J Chem Phys 2021; 154:034111. [DOI: 10.1063/5.0031215] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Affiliation(s)
- Shuwen Yue
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
| | - Maria Carolina Muniz
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
| | - Linfeng Zhang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
| | - Roberto Car
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, USA
| |
|
145
|
Ha JK, Kim K, Min SK. Machine Learning-Assisted Excited State Molecular Dynamics with the State-Interaction State-Averaged Spin-Restricted Ensemble-Referenced Kohn-Sham Approach. J Chem Theory Comput 2021; 17:694-702. [PMID: 33470100 DOI: 10.1021/acs.jctc.0c01261] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
We present a machine learning-assisted excited state molecular dynamics (ML-ESMD) based on the ensemble density functional theory framework. Since we represent a diabatic Hamiltonian in terms of generalized valence bond ansatz within the state-interaction state-averaged spin-restricted ensemble-referenced Kohn-Sham (SI-SA-REKS) method, we can avoid singularities near conical intersections, which are crucial in excited state molecular dynamics simulations. We train the diabatic Hamiltonian elements and their analytical gradients with the SchNet architecture to construct machine learning models, while the phase freedom of off-diagonal elements of the Hamiltonian is cured by introducing the phase-less loss function. Our machine learning models show reasonable accuracy with mean absolute errors of ∼0.1 kcal/mol and ∼0.5 kcal/mol/Å for the diabatic Hamiltonian elements and their gradients, respectively, for penta-2,4-dieniminium cation. Moreover, by exploiting the diabatic representation, our models can predict correct conical intersection structures and their topologies. In addition, our ML-ESMD simulations give almost identical result with a direct dynamics at the same level of theory.
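The phase problem mentioned in this abstract arises because the overall sign of an off-diagonal Hamiltonian element is arbitrary in the reference data. One plausible form of a phase-less loss for a single coupling element $H_{12}$ is given below; it is an assumption for illustration and the loss used in the paper may differ. Here $H_{12}^{\text{ML}}$ and $H_{12}^{\text{ref}}$ denote the predicted and reference elements, $\nabla$ the gradient with respect to nuclear coordinates, and $w$ a hypothetical weighting factor.

$$
\mathcal{L}_{12} = \min_{s \in \{+1,\,-1\}} \Big[ \big(H_{12}^{\text{ML}} - s\,H_{12}^{\text{ref}}\big)^2 + w\,\big\lVert \nabla H_{12}^{\text{ML}} - s\,\nabla H_{12}^{\text{ref}} \big\rVert^2 \Big]
$$

The same sign $s$ is applied to the element and to its gradient, so a reference point whose phase happens to be flipped is not penalized as long as the prediction matches it up to that common sign.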
Affiliation(s)
- Jong-Kwon Ha
- Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
| | - Kicheol Kim
- Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
| | - Seung Kyu Min
- Department of Chemistry, School of Natural Science, Ulsan National Institute of Science and Technology (UNIST), 50 UNIST-gil, Ulju-gun, Ulsan 44919, South Korea
| |
|
146
|
Ko TW, Finkler JA, Goedecker S, Behler J. A fourth-generation high-dimensional neural network potential with accurate electrostatics including non-local charge transfer. Nat Commun 2021; 12:398. [PMID: 33452239 PMCID: PMC7811002 DOI: 10.1038/s41467-020-20427-2] [Citation(s) in RCA: 148] [Impact Index Per Article: 49.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Accepted: 11/18/2020] [Indexed: 11/16/2022] Open
Abstract
Machine learning potentials have become an important tool for atomistic simulations in many fields, from chemistry via molecular biology to materials science. Most of the established methods, however, rely on local properties and are thus unable to take global changes in the electronic structure into account, which result from long-range charge transfer or different charge states. In this work we overcome this limitation by introducing a fourth-generation high-dimensional neural network potential that combines a charge equilibration scheme employing environment-dependent atomic electronegativities with accurate atomic energies. The method, which is able to correctly describe global charge distributions in arbitrary systems, yields much improved energies and substantially extends the applicability of modern machine learning potentials. This is demonstrated for a series of systems representing typical scenarios in chemistry and materials science that are incorrectly described by current methods, while the fourth-generation neural network potential is in excellent agreement with electronic structure calculations. Machine learning potentials do not account for long-range charge transfer. Here the authors introduce a fourth-generation high-dimensional neural network potential including non-local information of charge populations that is able to provide forces, charges and energies in excellent agreement with DFT data.
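To make the charge equilibration step in this abstract more concrete, the Python sketch below minimizes a simplified charge-dependent energy, $E(Q) = \sum_i (\chi_i Q_i + \tfrac{1}{2} J_i Q_i^2) + \sum_{i<j} Q_i Q_j / r_{ij}$, under the constraint of a fixed total charge. Several things here are assumptions rather than the published method: point charges with a bare Coulomb interaction, arbitrary units, the hardness parameters `hardness`, and electronegativities `chi` passed in as plain numbers (in the paper they are predicted from the atomic environments). The function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def equilibrate_charges(chi, hardness, positions, total_charge=0.0):
    """Return charges minimizing E(Q) subject to sum(Q) == total_charge."""
    n = len(chi)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        a[i][i] = hardness[i]
        for j in range(n):
            if i != j:
                a[i][j] = 1.0 / math.dist(positions[i], positions[j])
        a[i][n] = 1.0  # Lagrange multiplier column
        a[n][i] = 1.0  # total-charge constraint row
    b = [-c for c in chi] + [total_charge]
    return solve_linear(a, b)[:n]


# Two atoms, the second one more electronegative (all values hypothetical).
charges = equilibrate_charges([0.0, 1.0], [2.0, 2.0], [(0, 0, 0), (0, 0, 2)])
print(charges)
```

Because the charges come from a single global linear solve, changing the electronegativity or the total charge anywhere in the system shifts the charges on all atoms, which is the non-local behavior that purely local models cannot reproduce.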
Affiliation(s)
- Tsz Wai Ko
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077, Göttingen, Germany
| | - Jonas A Finkler
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056, Basel, Switzerland
| | - Stefan Goedecker
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056, Basel, Switzerland
| | - Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077, Göttingen, Germany
| |
|
147
|
Han R, Rodríguez-Mayorga M, Luber S. A Machine Learning Approach for MP2 Correlation Energies and Its Application to Organic Compounds. J Chem Theory Comput 2021; 17:777-790. [DOI: 10.1021/acs.jctc.0c00898] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Ruocheng Han
- Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | | | - Sandra Luber
- Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| |
|
148
|
Shimamura K, Takeshita Y, Fukushima S, Koura A, Shimojo F. Computational and training requirements for interatomic potential based on artificial neural network for estimating low thermal conductivity of silver chalcogenides. J Chem Phys 2020; 153:234301. [PMID: 33353316 DOI: 10.1063/5.0027058] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
We examined the estimation of thermal conductivity through molecular dynamics simulations for a superionic conductor, α-Ag2Se, using the interatomic potential based on an artificial neural network (ANN potential). The training data were created using the existing empirical potential of Ag2Se to help find suitable computational and training requirements for the ANN potential, with the intent to apply them to first-principles calculations. The thermal conductivities calculated using different definitions of heat flux were compared, and the effect of explicit long-range Coulomb interaction on the conductivities was investigated. We clarified that using a rigorous heat flux formula for the ANN potential, even for highly ionic α-Ag2Se, the resulting thermal conductivity was reasonably consistent with the reference value without explicitly considering Coulomb interaction. It was found that ANN training including the virial term played an important role in reducing the dependency of thermal conductivity on the initial values of the weight parameters of the ANN.
Affiliation(s)
- Kohei Shimamura
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Yusuke Takeshita
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Shogo Fukushima
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Akihide Koura
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| | - Fuyuki Shimojo
- Department of Physics, Kumamoto University, Kumamoto 860-8555, Japan
| |
|
149
|
Nguyen KA, Pachter R, Day PN. Systematic Study of the Properties of CdS Clusters with Carboxylate Ligands Using a Deep Neural Network Potential Developed with Data from Density Functional Theory Calculations. J Phys Chem A 2020; 124:10472-10481. [PMID: 33271016 DOI: 10.1021/acs.jpca.0c06965] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Although structures of the inorganic core of CdS atomically precise quantum dots were reported, characterizing the nature of the metal-carboxylate coordination in these materials remains a challenge due to the large number of possible isomers. The computational cost imposed by first-principles methods is prohibitive for such a configurational search, and empirical potentials are not available. In this work, we applied deep neural network algorithms to train a potential for CdS clusters with carboxylate ligands using a database of energies and gradients obtained from density functional theory calculations. The derived potential provided energies and gradients based on a set of reference structures. Our trained potential was then used to accelerate genetic algorithm and molecular dynamics simulations searches of low-energy structures, which in turn, were used to compute the X-ray diffraction and electronic absorption spectra. Our results for CdS clusters with carboxylate ligands, analyzed and compared with experimental findings, demonstrated that the structure of a cluster whose properties agree better with experiment may deviate from the one previously assumed.
Affiliation(s)
- Kiet A Nguyen
- Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States; UES, Inc. Dayton, Ohio 45432, United States
| | - Ruth Pachter
- Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States
| | - Paul N Day
- Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433, United States; UES, Inc. Dayton, Ohio 45432, United States
| |
|
150
|
Grisafi A, Nigam J, Ceriotti M. Multi-scale approach for the prediction of atomic scale properties. Chem Sci 2020; 12:2078-2090. [PMID: 34163971 PMCID: PMC8179303 DOI: 10.1039/d0sc04934d] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/23/2022]
Abstract
Electronic nearsightedness is one of the fundamental principles that governs the behavior of condensed matter and supports its description in terms of local entities such as chemical bonds. Locality also underlies the tremendous success of machine-learning schemes that predict quantum mechanical observables - such as the cohesive energy, the electron density, or a variety of response properties - as a sum of atom-centred contributions, based on a short-range representation of atomic environments. One of the main shortcomings of these approaches is their inability to capture physical effects ranging from electrostatic interactions to quantum delocalization, which have a long-range nature. Here we show how to build a multi-scale scheme that combines in the same framework local and non-local information, overcoming such limitations. We show that the simplest version of such features can be put in formal correspondence with a multipole expansion of permanent electrostatics. The data-driven nature of the model construction, however, makes this simple form suitable to tackle also different types of delocalized and collective effects. We present several examples that range from molecular physics to surface science and biophysics, demonstrating the ability of this multi-scale approach to model interactions driven by electrostatics, polarization and dispersion, as well as the cooperative behavior of dielectric response functions.
Affiliation(s)
- Andrea Grisafi
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| | - Jigyasa Nigam
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland; National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland; Indian Institute of Space Science and Technology Thiruvananthapuram 695547 India
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland; National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne 1015 Lausanne Switzerland
| |
|