1. Hou YF, Zhang L, Zhang Q, Ge F, Dral PO. Physics-Informed Active Learning for Accelerating Quantum Chemical Simulations. J Chem Theory Comput 2024. [PMID: 39264419 DOI: 10.1021/acs.jctc.4c00821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2024]
Abstract
Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here, we introduce the end-to-end AL for constructing robust data-efficient potentials with affordable investment of time and resources and minimum human interference. Our AL protocol is based on the physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, conformer search of a key biochemical molecule, and time-resolved mechanism of the Diels-Alder reaction. These investigations took us days instead of weeks of pure quantum chemical calculations on a high-performance computing cluster.
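The abstract lists uncertainty quantification and convergence monitoring as parts of the active-learning loop but does not give formulas here. As a rough illustration only, the sketch below uses a common generic criterion, the standard deviation of energies predicted by a committee of models, to flag candidate geometries for new quantum chemical labels. The function names, threshold, and stopping rule are hypothetical assumptions and are not taken from the paper.

```python
import numpy as np

def select_uncertain(committee_predictions, threshold, max_new):
    """Return indices of candidates whose committee disagreement exceeds `threshold`.

    committee_predictions: array of shape (n_models, n_candidates), e.g. energies.
    Indices are sorted from most to least uncertain and capped at `max_new`.
    """
    preds = np.asarray(committee_predictions, dtype=float)
    uncertainty = preds.std(axis=0)            # spread across committee members
    flagged = np.flatnonzero(uncertainty > threshold)
    ranked = flagged[np.argsort(uncertainty[flagged])[::-1]]
    return ranked[:max_new], uncertainty

def converged(uncertainty, threshold, tolerance=0.05):
    """Hypothetical stopping rule: at most `tolerance` of candidates are still flagged."""
    return bool(np.mean(np.asarray(uncertainty) > threshold) <= tolerance)
```

In a full loop, the flagged geometries would be recomputed with the reference method, added to the training set, and the committee retrained until `converged` returns True.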
Affiliation(s)
- Yi-Fan Hou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland
2. Jin Y, Perez-Lemus GR, Zubieta Rico PF, de Pablo JJ. Improving Machine Learned Force Fields for Complex Fluids through Enhanced Sampling: A Liquid Crystal Case Study. J Phys Chem A 2024; 128:7257-7268. [PMID: 39150905 DOI: 10.1021/acs.jpca.4c01546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/18/2024]
Abstract
Machine learned force fields offer the potential for faster execution times while retaining the accuracy of traditional DFT calculations, making them promising candidates for molecular simulations in cases where reliable classical force fields are not available. Some of the challenges associated with machine learned force fields include simulation stability over extended periods of time and ensuring that the statistical and dynamical properties of the underlying simulated systems are correctly captured. In this work, we propose a systematic training pipeline for such force fields that leads to improved model quality, compared to that achieved by traditional data generation and training approaches. That pipeline relies on the use of enhanced sampling techniques, and it is demonstrated here in the context of a liquid crystal, which exemplifies many of the challenges that are encountered in fluids and materials with complex free energy landscapes. Our results indicate that, whereas the majority of traditional machine learned force field training approaches lead to molecular dynamics simulations that are only stable over hundred-picosecond trajectories, our approach allows for stable simulations over tens of nanoseconds for organic molecular systems comprising thousands of atoms.
Affiliation(s)
- Yezhi Jin
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Gustavo R Perez-Lemus
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Pablo F Zubieta Rico
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
3. Williams CD, Kalayan J, Burton NA, Bryce RA. Stable and accurate atomistic simulations of flexible molecules using conformationally generalisable machine learned potentials. Chem Sci 2024; 15:12780-12795. [PMID: 39148799 PMCID: PMC11323334 DOI: 10.1039/d4sc01109k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/07/2024] [Indexed: 08/17/2024]
Abstract
Computational simulation methods based on machine learned potentials (MLPs) promise to revolutionise shape prediction of flexible molecules in solution, but their widespread adoption has been limited by the way in which training data is generated. Here, we present an approach which allows the key conformational degrees of freedom to be properly represented in reference molecular datasets. MLPs trained on these datasets using a global descriptor scheme are generalisable in conformational space, providing quantum chemical accuracy for all conformers. These MLPs are capable of propagating long, stable molecular dynamics trajectories, an attribute that has remained a challenge. We deploy the MLPs in obtaining converged conformational free energy surfaces for flexible molecules via well-tempered metadynamics simulations; this approach provides a hitherto inaccessible route to accurately computing the structural, dynamical and thermodynamical properties of a wide variety of flexible molecular systems. It is further demonstrated that MLPs must be trained on reference datasets with complete coverage of conformational space, including in barrier regions, to achieve stable molecular dynamics trajectories.
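Well-tempered metadynamics, used in this work to obtain converged conformational free energy surfaces, deposits Gaussian hills whose height decays with the bias already accumulated at the current collective-variable value. The following is a minimal 1D sketch of that standard update and of the usual long-time free-energy estimate F(s) = -(γ/(γ-1))·V(s). The grid, hill height, width, bias factor, and kT (about 2.494 kJ/mol at 300 K) are illustrative assumptions; production simulations would normally rely on a dedicated enhanced-sampling library rather than this hypothetical class.

```python
import numpy as np

class WellTemperedBias1D:
    """Minimal 1D well-tempered metadynamics bias on a grid (illustrative sketch)."""

    def __init__(self, grid, height=1.0, sigma=0.1, bias_factor=10.0, kT=2.494):
        self.grid = np.asarray(grid, dtype=float)
        self.V = np.zeros_like(self.grid)      # accumulated bias on the grid
        self.w0 = height
        self.sigma = sigma
        self.gamma = bias_factor               # gamma = (T + dT) / T
        self.kdT = (bias_factor - 1.0) * kT    # k_B * dT

    def value(self, s):
        """Bias at collective-variable value s (linear interpolation on the grid)."""
        return np.interp(s, self.grid, self.V)

    def deposit(self, s):
        """Add one hill at s; its height shrinks where bias has already accumulated."""
        w = self.w0 * np.exp(-self.value(s) / self.kdT)
        self.V += w * np.exp(-0.5 * ((self.grid - s) / self.sigma) ** 2)
        return w

    def free_energy(self):
        """Long-time estimate F(s) = -(gamma / (gamma - 1)) V(s), shifted so min(F) = 0."""
        F = -self.gamma / (self.gamma - 1.0) * self.V
        return F - F.min()
```

Calling `deposit` at the collective-variable value of every few MD frames and reading `free_energy()` at the end gives the standard well-tempered estimate; the bias factor controls how quickly hill heights decay.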
Affiliation(s)
- Christopher D Williams
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
- Jas Kalayan
- Science and Technologies Facilities Council (STFC), Daresbury Laboratory Keckwick Lane, Daresbury Warrington WA4 4AD UK
- Neil A Burton
- Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, The University of Manchester Oxford Road Manchester M13 9PL UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
4. Jin H, Merz KM. Modeling Zinc Complexes Using Neural Networks. J Chem Inf Model 2024; 64:3140-3148. [PMID: 38587510 PMCID: PMC11040731 DOI: 10.1021/acs.jcim.4c00095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 03/04/2024] [Accepted: 03/28/2024] [Indexed: 04/09/2024]
Abstract
Understanding the energetic landscapes of large molecules is necessary for the study of chemical and biological systems. Recently, deep learning has greatly accelerated the development of models based on quantum chemistry, making it possible to build potential energy surfaces and explore chemical space. However, most of this work has focused on organic molecules due to the simplicity of their electronic structures as well as the availability of data sets. In this work, we build a deep learning architecture to model the energetics of zinc organometallic complexes. To achieve this, we have compiled a configurationally and conformationally diverse data set of zinc complexes using metadynamics to overcome the limitations of traditional sampling methods. In terms of the neural network potentials, our results indicate that for zinc complexes, partial charges play an important role in modeling the long-range interactions with a neural network. Our developed model outperforms semiempirical methods in predicting the relative energy of zinc conformers, yielding a mean absolute error (MAE) of 1.32 kcal/mol with reference to the double-hybrid PWPB95 method.
Affiliation(s)
- Hongni Jin
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
5. Jin H, Merz KM. Modeling Fe(II) Complexes Using Neural Networks. J Chem Theory Comput 2024; 20:2551-2558. [PMID: 38439716 PMCID: PMC10976644 DOI: 10.1021/acs.jctc.4c00063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/06/2024]
Abstract
We report a Fe(II) data set of more than 23000 conformers in both low-spin (LS) and high-spin (HS) states. This data set was generated to develop a neural network model that is capable of predicting the energy and the energy splitting as a function of the conformation of a Fe(II) organometallic complex. In order to achieve this, we propose a type of scaled electronic embedding to cover the long-range interactions implicitly in our neural network describing the Fe(II) organometallic complexes. For the total energy prediction, the lowest MAE is 0.037 eV, while the lowest MAE of the splitting energy is 0.030 eV. Compared to baseline models, which only incorporate short-range interactions, our scaled electronic embeddings improve the accuracy by over 70% for the prediction of the total energy and the splitting energy. With regard to semiempirical methods, our proposed models reduce the MAE, with respect to these methods, by 2 orders of magnitude.
Affiliation(s)
- Hongni Jin
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
6. Martí C, Devereux C, Najm HN, Zádor J. Evaluation of Rate Coefficients in the Gas Phase Using Machine-Learned Potentials. J Phys Chem A 2024. [PMID: 38427974 DOI: 10.1021/acs.jpca.3c07872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
Abstract
We assess the capability of machine-learned potentials to compute rate coefficients by training a neural network (NN) model and applying it to describe the chemical landscape on the C5H5 potential energy surface, which is relevant to molecular weight growth in combustion and interstellar media. We coupled the resulting NN with an automated kinetics workflow code, KinBot, to perform all necessary calculations to compute the rate coefficients. The NN is benchmarked exhaustively by evaluating its performance at the various stages of the kinetics calculations: from the electronic energy through the computation of zero point energy, barrier heights, entropic contributions, the portion of the PES explored, and finally the overall rate coefficients as formulated by transition state theory.
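Because this benchmark ends in transition state theory rate coefficients, it helps to see how errors in NN barrier heights propagate. A minimal canonical TST sketch, k(T) = (k_B T / h)(Q‡/Q_R) exp(-E0 / RT), assuming a unimolecular reaction, a zero-point-corrected barrier E0 in kcal/mol, and a precomputed partition-function ratio (all hypothetical inputs, not KinBot's interface), shows the exponential dependence on the barrier.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def tst_rate(barrier_kcal, temperature, q_ratio=1.0):
    """Canonical TST rate coefficient (s^-1) for a unimolecular step.

    barrier_kcal: zero-point-corrected barrier height E0 in kcal/mol
    q_ratio: Q_TS / Q_reactant (reaction coordinate removed), assumed given
    """
    prefactor = KB * temperature / H
    return prefactor * q_ratio * math.exp(-barrier_kcal / (R_KCAL * temperature))

def error_factor(barrier_error_kcal, temperature):
    """Factor by which k changes when the barrier is off by `barrier_error_kcal`."""
    return math.exp(barrier_error_kcal / (R_KCAL * temperature))
```

At 298.15 K, `error_factor(1.0, 298.15)` is about 5.4, so a 1 kcal/mol error in a barrier height changes the rate coefficient roughly fivefold, which illustrates why barrier heights matter so much for the final rates.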
Affiliation(s)
- Carles Martí
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Christian Devereux
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Habib N Najm
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Judit Zádor
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
7. Kříž K, Schmidt L, Andersson AT, Walz MM, van der Spoel D. An Imbalance in the Force: The Need for Standardized Benchmarks for Molecular Simulation. J Chem Inf Model 2023; 63:412-431. [PMID: 36630710 PMCID: PMC9875315 DOI: 10.1021/acs.jcim.2c01127] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/06/2022] [Indexed: 01/12/2023]
Abstract
Force fields (FFs) for molecular simulation have been under development for more than half a century. As with any predictive model, rigorous testing and comparisons of models critically depends on the availability of standardized data sets and benchmarks. While such benchmarks are rather common in the fields of quantum chemistry, this is not the case for empirical FFs. That is, few benchmarks are reused to evaluate FFs, and development teams rather use their own training and test sets. Here we present an overview of currently available tests and benchmarks for computational chemistry, focusing on organic compounds, including halogens and common ions, as FFs for these are the most common ones. We argue that many of the benchmark data sets from quantum chemistry can in fact be reused for evaluating FFs, but new gas phase data is still needed for compounds containing phosphorus and sulfur in different valence states. In addition, more nonequilibrium interaction energies and forces, as well as molecular properties such as electrostatic potentials around compounds, would be beneficial. For the condensed phases there is a large body of experimental data available, and tools to utilize these data in an automated fashion are under development. If FF developers, as well as researchers in artificial intelligence, would adopt a number of these data sets, it would become easier to compare the relative strengths and weaknesses of different models and to, eventually, restore the balance in the force.
Affiliation(s)
- Kristian Kříž
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Lisa Schmidt
- Faculty of Biosciences, University of Heidelberg, Heidelberg 69117, Germany
- Alfred T. Andersson
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Marie-Madeleine Walz
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- David van der Spoel
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
8. Thaler S, Stupp M, Zavadlav J. Deep coarse-grained potentials via relative entropy minimization. J Chem Phys 2022; 157:244103. [PMID: 36586977 DOI: 10.1063/5.0124538] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
Neural network (NN) potentials are a natural choice for coarse-grained (CG) models. Their many-body capacity allows highly accurate approximations of the potential of mean force, promising CG simulations of unprecedented accuracy. CG NN potentials trained bottom-up via force matching (FM), however, suffer from finite data effects: They rely on prior potentials for physically sound predictions outside the training data domain, and the corresponding free energy surface is sensitive to errors in the transition regions. The standard alternative to FM for classical potentials is relative entropy (RE) minimization, which has not yet been applied to NN potentials. In this work, we demonstrate, for benchmark problems of liquid water and alanine dipeptide, that RE training is more data efficient, due to accessing the CG distribution during training, resulting in improved free energy surfaces and reduced sensitivity to prior potentials. In addition, RE learns to correct time integration errors, allowing larger time steps in CG molecular dynamics simulation, while maintaining accuracy. Thus, our findings support the use of training objectives beyond FM, as a promising direction for improving CG NN potential's accuracy and reliability.
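The key difference from force matching is that the standard relative-entropy gradient needs averages under the current CG model: ∂S_rel/∂θ = β(⟨∂U/∂θ⟩_ref − ⟨∂U/∂θ⟩_θ). The toy sketch below applies this gradient to a one-parameter harmonic CG potential U(x) = k x²/2 at β = 1 with synthetic reference samples. It is not the authors' neural-network implementation; the potential, learning rate, and sample sizes are assumptions chosen so the example runs quickly.

```python
import numpy as np

def re_gradient(ref_samples, model_samples):
    """dS_rel/dk for U(x) = k x**2 / 2 at beta = 1.

    <dU/dk>_ref - <dU/dk>_model, with dU/dk = x**2 / 2.
    """
    return 0.5 * np.mean(ref_samples**2) - 0.5 * np.mean(model_samples**2)

def train_relative_entropy(ref_samples, k0=1.0, lr=4.0, n_steps=300, n_model=20000, seed=0):
    """Gradient descent on S_rel; each step resamples the current CG ensemble."""
    rng = np.random.default_rng(seed)
    k = k0
    for _ in range(n_steps):
        # The Boltzmann distribution of U(x) = k x**2 / 2 at beta = 1 is N(0, 1/k),
        # so the CG ensemble can be sampled exactly here; a real CG model needs MD.
        model_samples = rng.normal(0.0, 1.0 / np.sqrt(k), size=n_model)
        k -= lr * re_gradient(ref_samples, model_samples)
    return k
```

For zero-mean Gaussian reference data the optimum is k = 1/⟨x²⟩_ref, so the learned k should land close to that value up to sampling noise. Force matching, by contrast, would only use reference configurations and forces and never sample the model itself.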
Affiliation(s)
- Stephan Thaler
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
- Maximilian Stupp
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
- Julija Zavadlav
- Multiscale Modeling of Fluid Materials, Department of Engineering Physics and Computation, TUM School of Engineering and Design, Technical University of Munich, Munich, Germany
9. Towards fully ab initio simulation of atmospheric aerosol nucleation. Nat Commun 2022; 13:6067. [PMID: 36241616 PMCID: PMC9568664 DOI: 10.1038/s41467-022-33783-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/02/2022] [Accepted: 09/29/2022] [Indexed: 11/08/2022]
Abstract
Atmospheric aerosol nucleation contributes to approximately half of the worldwide cloud condensation nuclei. Despite the importance of climate, detailed nucleation mechanisms are still poorly understood. Understanding aerosol nucleation dynamics is hindered by the nonreactivity of force fields (FFs) and high computational costs due to the rare event nature of aerosol nucleation. Developing reactive FFs for nucleation systems is even more challenging than developing covalently bonded materials because of the wide size range and high dimensional characteristics of noncovalent hydrogen bonding bridging clusters. Here, we propose a general workflow that is also applicable to other systems to train an accurate reactive FF based on a deep neural network (DNN) and further bridge DNN-FF-based molecular dynamics (MD) with a cluster kinetics model based on Poisson distributions of reactive events to overcome the high computational costs of direct MD. We found that previously reported acid-base formation rates tend to be significantly underestimated, especially in polluted environments, emphasizing that acid-base nucleation observed in multiple environments should be revisited.
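The abstract couples DNN-FF molecular dynamics with a cluster kinetics model based on Poisson distributions of reactive events, without giving the model's details here. Under the generic assumption that events are independent and occur at a constant rate, the sketch below shows how a rate estimated from limited MD can be extrapolated to event probabilities over other time spans. The function names and numbers are hypothetical.

```python
import math

def poisson_rate_estimate(n_events, total_time):
    """Maximum-likelihood rate of a homogeneous Poisson process."""
    return n_events / total_time

def prob_at_least_one(rate, time):
    """Probability of one or more events within `time`."""
    return 1.0 - math.exp(-rate * time)

def prob_n_events(rate, time, n):
    """Poisson probability of exactly n events within `time`."""
    mu = rate * time
    return math.exp(-mu) * mu**n / math.factorial(n)
```

For instance, 4 hypothetical events observed in 2 ns of MD give a rate of 2 ns⁻¹, and the probability of at least one event in a 0.1 ns window is then 1 − exp(−0.2).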
10. Li J, Lopez SA. A Look Inside the Black Box of Machine Learning Photodynamics Simulations. Acc Chem Res 2022; 55:1972-1984. [PMID: 35796602 DOI: 10.1021/acs.accounts.2c00288] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/21/2022]
Abstract
Conspectus
Photochemical reactions are of great importance in chemistry, biology, and materials science because they take advantage of a renewable energy source, mild reaction conditions, and high atom economy. Light absorption can excite molecules to a higher energy electronic state of the same spin multiplicity. The following nonadiabatic processes induce molecular transformations that afford exotic molecular architectures and high-energy isomers that are inaccessible by thermal means. Computational simulations now complement time-resolved instrumentation to reveal ultrafast excited-state mechanistic information for photochemical reactions that is essential in disentangling elusive spectroscopic features, excited-state lifetimes, and excited-state mechanistic critical points. Nonadiabatic molecular dynamics (NAMD), powered by surface hopping techniques, is among the most widely applied techniques to model the photochemical reactions of medium-sized molecules. However, the computational efficiency is limited because of the requisite thousands of multiconfigurational quantum-chemical calculations multiplied by hundreds of trajectories. Machine learning (ML) has emerged as a revolutionary force in computational chemistry to predict the outcome of the resource-intensive multiconfigurational calculations on the fly. An ML potential trained with a substantial set of quantum-chemical calculations can predict the energies and forces with errors under chemical accuracy at a negligible cost. The integration of ML potentials in NAMD dramatically extends the maximum simulation time scale by ∼10 000-fold to the nanosecond regime.
In this Account, we present a comprehensive demonstration of ML photodynamics simulations and summarize our most recent applications in resolving complex photochemical reactions. First, we address three fundamental components of ML techniques for photodynamics simulations: the quantum-chemical data set, the ML potential, and NAMD. Second, we describe best practices in building training data and our procedure toward training the ML photodynamics model with our recent literature contributions. We introduce a convenient training data generation scheme combining Wigner sampling and geometrical interpolation. It trains reliable and effective ML potentials suitable for subsequent active learning to detect undersampled data. We demonstrate how active learning automatically discovers new mechanistic pathways and reproduces experimental results. We point out that atomic permutation is an essential data augmentation approach to improve the learnability of distance-based molecular descriptors for highly symmetric molecules. Third, we demonstrate the utility of ML-photodynamics by showing the results of ML photodynamics simulations of (1) photo-torquoselective 4π disrotatory electrocyclic ring closing of norbornyl cyclohexadiene, which reveals a thermal conversion from experimentally unobserved intermediates to the reactant in 1 ns; (2) [2 + 2] photocycloaddition of substituted [3]-syn-ladderdienes in competition with 4π and 6π electrocyclic ring-opening reactions, uncovering substituent effects to explain the reported increased quantum yield of substituted cubane precursors; and (3) photochemical 4π disrotatory electrocyclic reactions of fluorobenzenes in nanoseconds with XMS-CASPT2-level training data. We expect this Account to broaden understanding of ML photodynamics and inspire future developments and applications to increasingly large molecules within complex environments on long time scales.
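The Account points to atomic permutation as data augmentation for distance-based descriptors of symmetric molecules. A minimal sketch, assuming the groups of symmetry-equivalent atoms are known in advance: each relabeling within a group yields a new training sample with the same energy and correspondingly reordered forces, while an inverse-distance descriptor written in fixed atom order changes. The helper names and the example geometry are hypothetical; this is not the authors' code.

```python
import itertools
import numpy as np

def inverse_distance_descriptor(coords):
    """Upper-triangle 1/r_ij in a fixed atom order (not permutation invariant)."""
    i, j = np.triu_indices(len(coords), k=1)
    return 1.0 / np.linalg.norm(coords[i] - coords[j], axis=1)

def permutation_augment(coords, forces, energy, equivalent_groups):
    """Yield (coords, forces, energy) for every relabeling within groups of equivalent atoms.

    equivalent_groups: list of index lists, e.g. [[1, 2]] for the two H atoms of water.
    The energy is invariant; forces are reordered together with the atoms.
    The first sample yielded is the original (identity) ordering.
    """
    per_group = [list(itertools.permutations(g)) for g in equivalent_groups]
    for combo in itertools.product(*per_group):
        order = np.arange(len(coords))
        for group, perm in zip(equivalent_groups, combo):
            order[list(group)] = perm
        yield coords[order], forces[order], energy
```

The number of augmented samples grows as the product of the group factorials, so in practice one might subsample permutations for molecules with many equivalent atoms.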
Affiliation(s)
- Jingbai Li
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Steven A Lopez
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
11. Kamberaj H. Random walks in a free energy landscape combining augmented molecular dynamics simulations with a dynamic graph neural network model. J Mol Graph Model 2022; 114:108199. [DOI: 10.1016/j.jmgm.2022.108199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/09/2022] [Accepted: 04/11/2022] [Indexed: 10/18/2022]
12. Jacobson LD, Stevenson JM, Ramezanghorbani F, Ghoreishi D, Leswing K, Harder ED, Abel R. Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions. J Chem Theory Comput 2022; 18:2354-2366. [PMID: 35290063 DOI: 10.1021/acs.jctc.1c00821] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/30/2022]
Abstract
Transferable high dimensional neural network potentials (HDNNPs) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. We extend that work here to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research activities. We report a novel HDNNP architecture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model that delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semiempirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters, and relative tautomer errors.
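The abstract reports that delta learning on top of a semiempirical level roughly halves the errors. The idea is to learn only the correction from the cheap method to the reference and add it back at prediction time. A minimal sketch, assuming a 1D toy coordinate, hypothetical "low" and "high" energy functions, and a polynomial fit standing in for the neural network (this is not QRNN):

```python
import numpy as np

def fit_delta_model(x_train, e_low_train, e_high_train, degree=3):
    """Fit only the correction E_high - E_low, often smoother than E_high itself."""
    coeffs = np.polyfit(x_train, e_high_train - e_low_train, degree)
    return np.poly1d(coeffs)

def predict_delta(delta_model, x, e_low):
    """Delta-learning prediction: cheap baseline plus learned correction."""
    return e_low + delta_model(x)

# Hypothetical toy levels of theory along a 1D coordinate.
def e_low(x):
    return np.cos(x) + 0.5 * np.cos(3 * x)

def e_high(x):
    return e_low(x) + 0.1 * x**2 - 0.05 * x
```

Because the correction in this toy is a smooth low-order polynomial, a small model reproduces it almost exactly, whereas fitting `e_high` directly would require capturing the full oscillating baseline.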
Affiliation(s)
- Leif D Jacobson
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- James M Stevenson
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Delaram Ghoreishi
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Karl Leswing
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Edward D Harder
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
- Robert Abel
- Schrödinger Inc., 1540 Broadway, 24th floor, New York, New York 10036, United States
13. Pinheiro M, Ge F, Ferré N, Dral PO, Barbatti M. Choosing the right molecular machine learning potential. Chem Sci 2021; 12:14396-14413. [PMID: 34880991 PMCID: PMC8580106 DOI: 10.1039/d1sc03564a] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/29/2021] [Accepted: 09/14/2021] [Indexed: 11/21/2022]
Abstract
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.
Affiliation(s)
- Max Pinheiro
- Aix Marseille University, CNRS, ICR Marseille France
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
- Nicolas Ferré
- Aix Marseille University, CNRS, ICR Marseille France
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University China
- Mario Barbatti
- Aix Marseille University, CNRS, ICR Marseille France
- Institut Universitaire de France 75231 Paris France
14. Hoxha M, Kamberaj H. Automation of some macromolecular properties using a machine learning approach. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abe7b6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
In this study, we employed a newly developed method to predict macromolecular properties using a swarm artificial neural network (ANN) method as a machine learning approach. In this method, the molecular structures are represented by the feature description vectors used as training input data for a neural network. This study aims to develop an efficient approach for training an ANN using either experimental or quantum mechanics data. We aim to introduce an error model controlling the reliability of the prediction confidence interval using a bootstrapping swarm approach. We created different datasets of selected experimental or quantum mechanics results. Using this optimized ANN, we hope to predict properties and their statistical errors for new molecules. There are four datasets used in this study. That includes the dataset of 642 small organic molecules with known experimental hydration free energies, the dataset of 1475 experimental pKa values of ionizable groups in 192 proteins, the dataset of 2693 mutants in 14 proteins with given experimental values of changes in the Gibbs free energy, and a dataset of 7101 quantum mechanics heat of formation calculations. All the data are prepared and optimized using the AMBER force field in the CHARMM macromolecular computer simulation program. The bootstrapping swarm ANN code for performing the optimization and prediction is written in Python computer programming language. The descriptor vectors of the small molecules are based on the Coulomb matrix and sum over bond properties. For the macromolecular systems, they consider the chemical-physical fingerprints of the region in the vicinity of each amino acid.
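The abstract states that the small-molecule descriptors are based on the Coulomb matrix, whose standard form is M_ii = 0.5 Z_i^2.4 and M_ij = Z_i Z_j / |R_i − R_j|. Below is a minimal sketch of that matrix and of one common way to turn it into a fixed-order vector (sorting rows and columns by row norm). Which variant the authors combined with their bond-property sums is not stated here, and coordinates are assumed to be in bohr.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):       # diagonal distances are zero
        M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z**2.4)        # overwrite the infinite diagonal
    return M

def sorted_cm_vector(M):
    """Sort rows/columns by descending row norm, then flatten the upper triangle."""
    order = np.argsort(-np.linalg.norm(M, axis=1), kind="stable")
    M_sorted = M[order][:, order]
    return M_sorted[np.triu_indices(len(M))]
```

Sorting by row norm removes the dependence on atom ordering for most geometries; other options, such as using the matrix eigenvalues, trade some information for strict permutation invariance.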
15. Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 167] [Impact Index Per Article: 55.7] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
16
Unke O, Chmiela S, Sauceda HE, Gastegger M, Poltavsky I, Schütt KT, Tkatchenko A, Müller KR. Machine Learning Force Fields. Chem Rev 2021; 121:10142-10186. [PMID: 33705118 PMCID: PMC8391964 DOI: 10.1021/acs.chemrev.0c01111] [Citation(s) in RCA: 400] [Impact Index Per Article: 133.3] [Received: 10/12/2020] [Indexed: 12/27/2022]
Abstract
In recent years, the use of machine learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail, and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
Affiliation(s)
- Oliver T. Unke
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Huziel E. Sauceda
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Kristof T. Schütt
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
- Google Research, Brain Team, Berlin, Germany
17
18
Poltavsky I, Tkatchenko A. Machine Learning Force Fields: Recent Advances and Remaining Challenges. J Phys Chem Lett 2021; 12:6551-6564. [PMID: 34242032 DOI: 10.1021/acs.jpclett.1c01204] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 06/13/2023]
Abstract
In chemistry and physics, machine learning (ML) methods promise transformative impacts by advancing modeling and improving our understanding of complex molecules and materials. Each ML method comprises a mathematically well-defined procedure, and an increasingly larger number of easy-to-use ML packages for modeling atomistic systems are becoming available. In this Perspective, we discuss the general aspects of ML techniques in the context of creating ML force fields. We describe common features of ML modeling and quantum-mechanical approximations, so-called global and local ML models, and the physical differences behind these two classes of approaches. Finally, we describe the recent developments and emerging directions in the field of ML-driven molecular modeling. This Perspective aims to inspire interdisciplinary collaborations crossing the borders between physical chemistry, chemical physics, computer science, and data science.
Affiliation(s)
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
19
Hayashi A, Ato Y, Yamamoto A, Yoshida H, Yamanaka S, Kawakami T, Okumura M. Gibbs Energy of Hydrogen Adsorption on Pt Surface by Machine Learning Potential and Metadynamics. Chem Lett 2021. [DOI: 10.1246/cl.210137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Akihide Hayashi
- Department of Chemistry, Graduate School of Science, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 563-0043, Japan
- Yoshinori Ato
- Department of Chemistry, Graduate School of Science, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 563-0043, Japan
- Akira Yamamoto
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida Nihonmatsu-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Elements Strategy Initiative for Catalysts and Batteries (ESICB), Kyoto University, Nishikyo, Kyoto 615-8245, Japan
- Hisao Yoshida
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida Nihonmatsu-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Elements Strategy Initiative for Catalysts and Batteries (ESICB), Kyoto University, Nishikyo, Kyoto 615-8245, Japan
- Shusuke Yamanaka
- Department of Chemistry, Graduate School of Science, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 563-0043, Japan
- Takashi Kawakami
- Department of Chemistry, Graduate School of Science, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 563-0043, Japan
- Mitsutaka Okumura
- Department of Chemistry, Graduate School of Science, Osaka University, 1-32 Machikaneyama, Toyonaka, Osaka 563-0043, Japan
- Graduate School of Human and Environmental Studies, Kyoto University, Yoshida Nihonmatsu-cho, Sakyo-ku, Kyoto 606-8501, Japan
20
Xu J, Cao XM, Hu P. Accelerating Metadynamics-Based Free-Energy Calculations with Adaptive Machine Learning Potentials. J Chem Theory Comput 2021; 17:4465-4476. [PMID: 34100605 DOI: 10.1021/acs.jctc.1c00261] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/12/2022]
Abstract
There is an increasing demand for free-energy calculations using ab initio molecular dynamics these days. Metadynamics (MetaD) is frequently utilized to reconstruct the free-energy surface, but it is often computationally intractable for the first-principles calculations. Machine learning potentials (MLPs) have become popular alternatives. However, the training could be a long and arduous process before using them in practical applications. To accelerate MetaD use with MLPs for the free-energy calculation in an easy manner, we propose the adaptive machine learning potential-accelerated metadynamics (AMLP-MetaD). In this method, the MLP in the form of a Gaussian approximation potential (GAP) can adapt itself based on its uncertainty estimation, which decides whether to accept the model prediction or recalculate it with a reference method (usually density functional theory) for further training during the MetaD simulation. We demonstrate that the free-energy landscape similar to the ab initio one can be obtained using AMLP-MetaD with a 10-time speedup. Moreover, the quality of the free-energy results can be deeply improved using Δ-MLP, which is the GAP-corrected density functional tight binding in our case. We exemplify this novel method with two model systems, CO adsorption on the Pt13 cluster and the Pt(111) surface, which are of vital importance in heterogeneous catalysis. The successful application in these two tests highlights that our proposed method can be used in both cluster and periodic systems and for up to two collective variables.
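The accept-or-recompute decision in this abstract can be illustrated with a toy one-dimensional Gaussian process. GAP is itself a Gaussian-process-based potential, which motivates the choice. This is a hypothetical sketch, not AMLP-MetaD. `reference_energy` (a sine curve) stands in for the DFT call. The RBF length scale, the noise level and `threshold` are arbitrary choices, and no metadynamics bias is applied. The loop walks along a trajectory and calls the reference only when the GP's predictive standard deviation exceeds the threshold. In that case it refits on the new point.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(matrix):
    """Return lower-triangular L with L L^T = matrix (matrix must be SPD)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][i] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(lower[i][k] * x[k] for k in range(i))) / lower[i][i])
    return x


def solve_upper(lower, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / lower[i][i]
    return x


class TinyGP:
    """1D Gaussian-process regressor, refitted from scratch after each point."""

    def __init__(self, noise=1e-4):
        self.noise = noise
        self.xs, self.ys = [], []
        self.lower, self.alpha = [], []

    def add(self, x, y):
        self.xs.append(x)
        self.ys.append(y)
        gram = [[rbf(a, b) + (self.noise if i == j else 0.0)
                 for j, b in enumerate(self.xs)]
                for i, a in enumerate(self.xs)]
        self.lower = cholesky(gram)
        self.alpha = solve_upper(self.lower, solve_lower(self.lower, self.ys))

    def predict(self, x):
        """Posterior mean and standard deviation at x."""
        if not self.xs:
            return 0.0, 1.0  # prior: zero mean, unit variance
        k = [rbf(x, xi) for xi in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.lower, k)
        var = rbf(x, x) - sum(vi * vi for vi in v)
        return mean, math.sqrt(max(var, 0.0))


def reference_energy(x):
    """Hypothetical stand-in for an expensive DFT single-point calculation."""
    return math.sin(3.0 * x)


def adaptive_run(trajectory, threshold=0.2):
    """Use the GP where it is confident; otherwise call the reference and retrain."""
    gp = TinyGP()
    energies, n_reference = [], 0
    for x in trajectory:
        energy, std = gp.predict(x)
        if std > threshold:
            energy = reference_energy(x)
            gp.add(x, energy)
            n_reference += 1
        energies.append(energy)
    return gp, energies, n_reference
```

With `trajectory = [0.01 * i for i in range(200)]`, the reference is called for only a small fraction of the 200 frames. A GP's predictive variance never increases as data are added (with fixed hyperparameters). So every frame ends with a predicted standard deviation at or below the threshold.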
Affiliation(s)
- Jiayan Xu
- School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, U.K.
- Xiao-Ming Cao
- Key Laboratory for Advanced Materials, Centre for Computational Chemistry and Research Institute of Industrial Catalysis, School of Chemistry and Molecular Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- P Hu
- School of Chemistry and Chemical Engineering, Queen's University Belfast, Belfast BT9 5AG, U.K.
21
Affiliation(s)
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Rm 66-464, Cambridge, MA 02139, USA
22
Druchok M, Yarish D, Gurbych O, Maksymenko M. Toward efficient generation, correction, and properties control of unique drug‐like structures. J Comput Chem 2021; 42:746-760. [DOI: 10.1002/jcc.26494] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/06/2020] [Revised: 12/21/2020] [Accepted: 01/25/2021] [Indexed: 01/01/2023]
Affiliation(s)
- Maksym Druchok
- SoftServe, Inc., Lviv, Ukraine
- Institute for Condensed Matter Physics, Lviv, Ukraine
23
Ma S, Liu ZP. Machine Learning for Atomic Simulation and Activity Prediction in Heterogeneous Catalysis: Current Status and Future. ACS Catal 2020. [DOI: 10.1021/acscatal.0c03472] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Indexed: 12/31/2022]
Affiliation(s)
- Sicong Ma
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Zhi-Pan Liu
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
24
|
Kang PL, Shang C, Liu ZP. Large-Scale Atomic Simulation via Machine Learning Potentials Constructed by Global Potential Energy Surface Exploration. Acc Chem Res 2020; 53:2119-2129. [PMID: 32940999 DOI: 10.1021/acs.accounts.0c00472] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 02/08/2023]
Abstract
Atomic simulations based on quantum mechanics (QM) calculations have entered the toolbox of chemists over the past few decades, facilitating an understanding of a wide range of chemistry problems, from structure characterization to reactivity determination. Due to the poor scaling and high computational cost intrinsic to QM calculations, one has to sacrifice either accuracy or time when performing large-scale atomic simulations. The battle to find a better compromise between accuracy and speed has been central to the development of new theoretical methods. Recent advances in machine-learning (ML)-based large-scale atomic simulations have shown great promise for many branches of chemistry. Instead of solving the Schrödinger equation directly, ML-based simulations rely on a large data set of accurate potential energy surfaces (PESs) and complex numerical models to predict the total energy. These simulations feature both a high speed and a high accuracy for computing large systems. Due to the lack of a physical foundation in numerical models, ML models are often frustrated in their predictivity and robustness, which are key to applications. Focusing on these concerns, here we overview the recent advances in ML methodologies for atomic simulations on three key aspects: the generation of a representative data set, the extensity of ML models, and the continuity of data representation. While global optimization methods are the natural choice for building a representative data set, the stochastic surface walking method is shown to provide the desired PES sampling for both minima and transition regions on the PES. The current ML models generally utilize local geometrical descriptors as an input and consider the total energy as the sum of atomic energies.
There are many flavors of data descriptors and ML models, but the applications for material and reaction predictions are still limited, not least because of the difficulty of training on the associated vast global data sets. We show that our recently designed power-type structure descriptors together with a feed-forward neural network (NN) model are compatible with highly complex global PES data, which has led to a large family of global NN (G-NN) potentials. Two recent applications of G-NN potentials in material and reaction simulations are selected to illustrate how ML-based atomic simulations can help the discovery of new materials and reactions.
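The locality assumption in this abstract writes the total energy as a sum of atomic energies, each depending on a local geometric descriptor. A sketch of that assumption follows. Everything here is a hypothetical illustration, not the G-NN potential or the power-type descriptor. A single radial descriptor with a cosine cutoff replaces the real descriptor, and a fixed quadratic `atomic_energy` replaces the per-atom neural network. The example checks the property the text calls extensity: two copies of a cluster separated by more than the cutoff have exactly twice the energy of one copy.

```python
import math

R_CUT = 6.0  # cutoff radius (hypothetical, arbitrary length units)
ETA = 0.5    # width of the Gaussian radial term (hypothetical)


def cutoff(r):
    """Cosine cutoff that decays smoothly to zero at R_CUT."""
    return 0.5 * (math.cos(math.pi * r / R_CUT) + 1.0) if r < R_CUT else 0.0


def descriptor(i, coords):
    """Radial descriptor G_i = sum_j exp(-ETA * r_ij**2) * f_c(r_ij) over j != i."""
    total = 0.0
    for j, pos in enumerate(coords):
        if j != i:
            r = math.dist(coords[i], pos)
            total += math.exp(-ETA * r * r) * cutoff(r)
    return total


def atomic_energy(g):
    """Stand-in for the per-atom neural network (hypothetical fixed quadratic)."""
    return -1.0 * g + 0.1 * g * g


def total_energy(coords):
    """Total energy as a sum of atomic contributions."""
    return sum(atomic_energy(descriptor(i, coords)) for i in range(len(coords)))
```

Because each atom sees only neighbors within `R_CUT`, shifting a copy of the cluster far away adds an identical, independent set of atomic terms. The same construction makes the energy invariant to the order in which atoms are listed.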
Affiliation(s)
- Pei-Lin Kang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Cheng Shang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
- Zhi-Pan Liu
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, China
25
Westermayr J, Marquetand P. Machine learning and excited-state molecular dynamics. Mach Learn Sci Technol 2020. [DOI: 10.1088/2632-2153/ab9c3e] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/21/2022]
26
Yanxon H, Zagaceta D, Wood BC, Zhu Q. Neural network potential from bispectrum components: A case study on crystalline silicon. J Chem Phys 2020; 153:054118. [PMID: 32770884 DOI: 10.1063/5.0014677] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
In this article, we present a systematic study on developing machine learning force fields (MLFFs) for crystalline silicon. While the main-stream approach of fitting a MLFF is to use a small and localized training set from molecular dynamics simulations, it is unlikely to cover the global features of the potential energy surface. To remedy this issue, we used randomly generated symmetrical crystal structures to train a more general Si-MLFF. Furthermore, we performed substantial benchmarks among different choices of material descriptors and regression techniques on two different sets of silicon data. Our results show that neural network potential fitting with bispectrum coefficients as descriptors is a feasible method for obtaining accurate and transferable MLFFs.
Affiliation(s)
- Howard Yanxon
- Department of Physics and Astronomy, University of Nevada, Las Vegas, Nevada 89154, USA
- David Zagaceta
- Department of Physics and Astronomy, University of Nevada, Las Vegas, Nevada 89154, USA
- Brandon C Wood
- Materials Science Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Qiang Zhu
- Department of Physics and Astronomy, University of Nevada, Las Vegas, Nevada 89154, USA
27
Xie X, Persson KA, Small DW. Incorporating Electronic Information into Machine Learning Potential Energy Surfaces via Approaching the Ground-State Electronic Energy as a Function of Atom-Based Electronic Populations. J Chem Theory Comput 2020; 16:4256-4270. [PMID: 32502350 DOI: 10.1021/acs.jctc.0c00217] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract
Machine learning (ML) approximations to density functional theory (DFT) potential energy surfaces (PESs) are showing great promise for reducing the computational cost of accurate molecular simulations, but at present, they are not applicable to varying electronic states, and in particular, they are not well suited for molecular systems in which the local electronic structure is sensitive to the medium- to long-range electronic environment. With this issue as the focal point, we present a new machine learning approach called "BpopNN" for obtaining efficient approximations to DFT PESs. Conceptually, the methodology is based on approaching the true DFT energy as a function of electron populations on atoms; in practice, this is realized with available density functionals and constrained DFT (CDFT). The new approach creates approximations to this function with neural networks. These approximations thereby incorporate electronic information naturally into a ML approach, and optimizing the model energy with respect to populations allows the electronic terms to self-consistently adapt to the environment, as in DFT. We confirm the effectiveness of this approach with a variety of calculations on LiₙHₙ clusters.
Affiliation(s)
- Xiaowei Xie
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Kristin A Persson
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- David W Small
- Department of Chemistry, University of California, Berkeley, California 94720, United States
- Molecular Graphics and Computation Facility, College of Chemistry, University of California, Berkeley, California 94720, United States
28
Smith JS, Zubatyuk R, Nebgen B, Lubbers N, Barros K, Roitberg AE, Isayev O, Tretiak S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci Data 2020; 7:134. [PMID: 32358545 PMCID: PMC7195467 DOI: 10.1038/s41597-020-0473-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 10/31/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022]
Abstract
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
Affiliation(s)
- Justin S Smith
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Roman Zubatyuk
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Kipton Barros
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Adrian E Roitberg
- University of Florida, Department of Chemistry, PO Box 117200, 32611-7200, Gainesville, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
29
Gastegger M, Marquetand P. Molecular Dynamics with Neural Network Potentials. Machine Learning Meets Quantum Physics 2020. [DOI: 10.1007/978-3-030-40245-7_12] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
30
Kang PL, Shang C, Liu ZP. Glucose to 5-Hydroxymethylfurfural: Origin of Site-Selectivity Resolved by Machine Learning Based Reaction Sampling. J Am Chem Soc 2019; 141:20525-20536. [DOI: 10.1021/jacs.9b11535] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 11/28/2022]
Affiliation(s)
- Pei-Lin Kang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, P. R. China
- Cheng Shang
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, P. R. China
- Zhi-Pan Liu
- Collaborative Innovation Center of Chemistry for Energy Material, Shanghai Key Laboratory of Molecular Catalysis and Innovative Materials, Key Laboratory of Computational Physical Science, Department of Chemistry, Fudan University, Shanghai 200433, P. R. China
31
Brown SE. From ab initio data to high-dimensional potential energy surfaces: A critical overview and assessment of the development of permutationally invariant polynomial potential energy surfaces for single molecules. J Chem Phys 2019; 151:194111. [PMID: 31757150 DOI: 10.1063/1.5123999] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Abstract
The representation of high-dimensional potential energy surfaces by way of the many-body expansion and permutationally invariant polynomials has become a well-established tool for improving the resolution and extending the scope of molecular simulations. The high level of accuracy that can be attained by these potential energy functions (PEFs) is due in large part to their specificity: for each term in the many-body expansion, a species-specific training set must be generated at the desired level of theory and a number of fits attempted in order to obtain a robust and reliable PEF. In this work, we attempt to characterize the numerical aspects of the fitting problem, addressing questions which are of simultaneous practical and fundamental importance. These include concrete illustrations of the nonconvexity of the problem, the ill-conditionedness of the linear system to be solved and possible need for regularization, the sensitivity of the solutions to the characteristics of the training set, and limitations of the approach with respect to accuracy and the types of molecules that can be treated. In addition, we introduce a general approach to the generation of training set configurations based on the familiar harmonic approximation and evaluate the possible benefits to the use of quasirandom sequences for sampling configuration space in this context. Using sulfate as a case study, the findings are largely generalizable and expected to ultimately facilitate the efficient development of PIP-based many-body PEFs for general systems via automation.
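Training configurations can be generated from the harmonic approximation using quasirandom sequences. One possible version is sketched below. The details are assumptions rather than the paper's procedure. Displacements are drawn along mass-weighted normal-mode coordinates q_k with Gaussian amplitudes of variance k_B T / ω_k², which is the classical harmonic-oscillator distribution. The Gaussian deviates come from a Halton sequence passed through the inverse normal CDF, instead of from a pseudorandom generator. The function names are hypothetical, the `PRIMES` table limits the sketch to 10 modes, and mapping back to Cartesian coordinates (which needs the mode vectors and masses) is omitted.

```python
import math
from statistics import NormalDist

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one Halton base per mode


def radical_inverse(index, base):
    """Van der Corput radical inverse of index in the given base, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index, dim):
    """index-th point of the dim-dimensional Halton sequence."""
    return [radical_inverse(index, PRIMES[d]) for d in range(dim)]


def harmonic_samples(omegas, kT, n_samples):
    """Normal-mode displacements q_k ~ N(0, kT / omega_k**2) from Halton points."""
    std_normal = NormalDist()
    samples = []
    for index in range(1, n_samples + 1):  # skip index 0, which maps to u = 0
        u = halton_point(index, len(omegas))
        samples.append([std_normal.inv_cdf(uk) * math.sqrt(kT) / w
                        for uk, w in zip(u, omegas)])
    return samples
```

Low-discrepancy points cover the unit hypercube more evenly than pseudorandom draws of the same size. That is the property the abstract evaluates for sampling configuration space. Stiffer modes (larger ω) receive proportionally smaller displacements.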
Affiliation(s)
- Sandra E Brown
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093, USA
32
A fast neural network approach for direct covariant forces prediction in complex multi-element extended systems. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0098-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/09/2022]
33
Herr JE, Koh K, Yao K, Parkhill J. Compressing physics with an autoencoder: Creating an atomic species representation to improve machine learning models in the chemical sciences. J Chem Phys 2019; 151:084103. [DOI: 10.1063/1.5108803] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/06/2023]
Affiliation(s)
- John E. Herr
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- Kevin Koh
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- Kun Yao
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
- John Parkhill
- Department of Chemistry and Biochemistry, The University of Notre Dame du Lac, 251 Nieuwland Science Hall, Notre Dame, Indiana 46556, USA
34
Schlexer Lamoureux P, Winther KT, Garrido Torres JA, Streibel V, Zhao M, Bajdich M, Abild‐Pedersen F, Bligaard T. Machine Learning for Computational Heterogeneous Catalysis. ChemCatChem 2019. [DOI: 10.1002/cctc.201900595] [Citation(s) in RCA: 144] [Impact Index Per Article: 28.8] [Indexed: 12/20/2022]
Affiliation(s)
- Philomena Schlexer Lamoureux
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Kirsten T. Winther
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Jose Antonio Garrido Torres
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Verena Streibel
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Meng Zhao
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Michal Bajdich
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Frank Abild‐Pedersen
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States
- Thomas Bligaard
- SUNCAT Center for Interface Science and Catalysis, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, United States
- Department of Chemical Engineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, United States

35
Okamoto Y. Data sampling scheme for reproducing energies along reaction coordinates in high-dimensional neural network potentials. J Chem Phys 2019; 150:134103. [PMID: 30954039 DOI: 10.1063/1.5078394]
Abstract
We propose a data sampling scheme for high-dimensional neural network potentials that can predict energies along the reaction pathway calculated using the hybrid density functional theory. We observed that a data sampling scheme that combined partial geometry optimization of intermediate structures with random displacement of atoms successfully predicted the energies along the reaction path with respect to five chemical reactions: Claisen rearrangement, Diels-Alder reaction, [1,5]-sigmatropic hydrogen shift, concerted hydrogen transfer in the water hexamer, and Cornforth rearrangement.
Affiliation(s)
- Yasuharu Okamoto
- Data Platform Center, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan

36
Janet JP, Liu F, Nandy A, Duan C, Yang T, Lin S, Kulik HJ. Designing in the Face of Uncertainty: Exploiting Electronic Structure and Machine Learning Models for Discovery in Inorganic Chemistry. Inorg Chem 2019; 58:10592-10606. [PMID: 30834738 DOI: 10.1021/acs.inorgchem.9b00109]
Abstract
Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery. In transition-metal chemistry and catalysis, unique challenges arise. The variable spin, oxidation state, and coordination environments favored by elements with well-localized d or f electrons provide great opportunity for tailoring properties in catalytic or functional (e.g., magnetic) materials but also add layers of uncertainty to any design strategy. We outline five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry: (i) fully automated simulation of new compounds, (ii) knowledge of prediction sensitivity or accuracy, (iii) faster-than-fast property prediction methods, (iv) maps for rapid chemical space traversal, and (v) a means to reveal design rules on the kilocompound scale. Through case studies in open-shell transition-metal chemistry, we describe how advances in methodology and software in each of these areas bring about new chemical insights. We conclude with our outlook on the next steps in this process toward realizing fully autonomous discovery in inorganic chemistry using computational chemistry.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Sean Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States

37
Jackson NE, Webb MA, de Pablo JJ. Recent advances in machine learning towards multiscale soft materials design. Curr Opin Chem Eng 2019. [DOI: 10.1016/j.coche.2019.03.005]

38
Bonati L, Parrinello M. Silicon Liquid Structure and Crystal Nucleation from Ab Initio Deep Metadynamics. Phys Rev Lett 2018; 121:265701. [PMID: 30636123 DOI: 10.1103/physrevlett.121.265701] [Received: 10/01/2018]
Abstract
Studying the crystallization process of silicon is a challenging task since empirical potentials are not able to reproduce well the properties of both a semiconducting solid and a metallic liquid. On the other hand, nucleation is a rare event that occurs on much longer timescales than those achievable by ab initio molecular dynamics. To address this problem, we train a deep neural network potential based on a set of data generated by metadynamics simulations using a classical potential. We show how this is an effective way to collect all the relevant data for the process of interest. In order to efficiently drive the crystallization process, we introduce a new collective variable based on the Debye structure factor. We are able to encode the long-range order information in a local variable which is better suited to describe the nucleation dynamics. The reference energies are then calculated using the strongly constrained and appropriately normed (SCAN) exchange-correlation functional, which gives a better description of the bonding complexity of the Si phase diagram. Finally, we recover the free energy surface with density functional theory accuracy, and we compute the thermodynamic properties near the melting point, obtaining good agreement with experimental data. In addition, we study the early stages of the crystallization process, unveiling features of the nucleation mechanism.
Affiliation(s)
- Luigi Bonati
- Department of Physics, ETH Zurich, c/o Università della Svizzera italiana, Via Giuseppe Buffi 13, CH-6900, Lugano, Switzerland
- Facoltà di Informatica, Instituto di Scienze Computazionali, National Center for Computational Design and Discovery of Novel Materials (MARVEL), Università della Svizzera italiana, Via Giuseppe Buffi 13, CH-6900, Lugano, Switzerland
- Michele Parrinello
- Facoltà di Informatica, Instituto di Scienze Computazionali, National Center for Computational Design and Discovery of Novel Materials (MARVEL), Università della Svizzera italiana, Via Giuseppe Buffi 13, CH-6900, Lugano, Switzerland
- Department of Chemistry and Applied Biosciences, ETH Zurich, c/o Università della Svizzera italiana, Via Giuseppe Buffi 13, CH-6900, Lugano, Switzerland

39
Grajciar L, Heard CJ, Bondarenko AA, Polynski MV, Meeprasert J, Pidko EA, Nachtigall P. Towards operando computational modeling in heterogeneous catalysis. Chem Soc Rev 2018; 47:8307-8348. [PMID: 30204184 PMCID: PMC6240816 DOI: 10.1039/c8cs00398j] [Received: 05/14/2018]
Abstract
An increased synergy between experimental and theoretical investigations in heterogeneous catalysis has become apparent during the last decade. Experimental work has extended from ultra-high vacuum and low temperature towards operando conditions. These developments have motivated the computational community to move from standard descriptive computational models, based on inspection of the potential energy surface at 0 K and low reactant concentrations (0 K/UHV model), to more realistic conditions. The transition from 0 K/UHV to operando models has been backed by significant developments in computer hardware and software over the past few decades. New methodological developments, designed to overcome part of the gap between 0 K/UHV and operando conditions, include (i) global optimization techniques, (ii) ab initio constrained thermodynamics, (iii) biased molecular dynamics, (iv) microkinetic models of reaction networks and (v) machine learning approaches. The importance of the transition is highlighted by discussing how the molecular level picture of catalytic sites and the associated reaction mechanisms changes when the chemical environment, pressure and temperature effects are correctly accounted for in molecular simulations. It is the purpose of this review to discuss each method on an equal footing, and to draw connections between methods, particularly where they may be applied in combination.
Affiliation(s)
- Lukáš Grajciar
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic
- Christopher J. Heard
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic
- Anton A. Bondarenko
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg, 191002, Russia
- Mikhail V. Polynski
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg, 191002, Russia
- Jittima Meeprasert
- Inorganic Systems Engineering group, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Evgeny A. Pidko
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg, 191002, Russia
- Inorganic Systems Engineering group, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Petr Nachtigall
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic

40
Smith JS, Nebgen B, Lubbers N, Isayev O, Roitberg AE. Less is more: Sampling chemical space with active learning. J Chem Phys 2018; 148:241733. [DOI: 10.1063/1.5023802]
Affiliation(s)
- Justin S. Smith
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
- Ben Nebgen
- Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Nicholas Lubbers
- Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Olexandr Isayev
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Adrian E. Roitberg
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA

41
Rupp M, von Lilienfeld OA, Burke K. Guest Editorial: Special Topic on Data-Enabled Theoretical Chemistry. J Chem Phys 2018; 148:241401. [DOI: 10.1063/1.5043213]
Affiliation(s)
- Matthias Rupp
- Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
- O. Anatole von Lilienfeld
- Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel, 4056 Basel, Switzerland
- Kieron Burke
- Departments of Chemistry and Physics, University of California, Irvine, California 92697, USA