1
Watanabe N, Hori Y, Sugisawa H, Ida T, Shoji M, Shigeta Y. A machine learning potential construction based on radial distribution function sampling. J Comput Chem 2024. [PMID: 39225311 DOI: 10.1002/jcc.27497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 08/09/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024]
Abstract
Sampling reference data is crucial in machine learning potential (MLP) construction. Inadequate coverage of local configurations in reference data may lead to unphysical behaviors in MLP-based molecular dynamics (MLP-MD) simulations. To address this problem, this study proposes a new on-the-fly reference data sampling method called radial distribution function (RDF)-based data sampling for MLP construction. This method detects and extracts anomalous structures from the trajectories of MLP-MD simulations by focusing on the shapes of RDFs. The detected structures are added to the reference data to improve the accuracy of the MLP. This method allows us to realize a reasonable MLP construction for liquid water with minimal additional data. We prepare data from an H2O molecular cluster system and verify whether the constructed MLPs are practical for bulk water systems. MLP-MD simulations without RDF-based data sampling show unphysical behaviors, such as atomic collisions. In contrast, after applying this method, we obtain MLP-MD trajectories with features, such as RDF shapes and angle distributions, that are comparable to those of ab initio MD simulations. Our simulation results demonstrate that the RDF-based data sampling approach is useful for constructing MLPs that are robust to extrapolations from molecular cluster systems to bulk systems without any specialized know-how.
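The abstract does not spell out the criterion used to flag anomalous structures. The following Python/NumPy sketch illustrates the general idea under our own assumptions. Each frame contains identical particles in a cubic periodic box. A hypothetical rule flags a frame when g(r) is non-zero below a chosen minimum physical distance, which is the RDF signature of the atomic collisions mentioned above. The function names, the lattice test frame, and the threshold are illustrative only; this is not the authors' implementation.

```python
import numpy as np


def radial_distribution(positions, box_length, r_max, n_bins):
    """Return bin centres r and g(r) for one frame of N identical particles.

    Assumes a cubic periodic box and r_max <= box_length / 2.
    """
    n = len(positions)
    # Pair displacements under the minimum-image convention.
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)[np.triu_indices(n, k=1)]
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box_length**3
    # Each unique pair is counted once, so the ideal-gas expectation is n * rho * V_shell / 2.
    g = counts / (0.5 * n * density * shell_volume)
    return r, g


def is_anomalous(r, g, r_min_physical):
    """Hypothetical criterion: any pair density below r_min_physical flags the frame."""
    return bool(np.any(g[r < r_min_physical] > 0.0))


# A 4 x 4 x 4 simple-cubic lattice (spacing 3.0) in a 12.0 box stands in for an MLP-MD frame.
grid = np.arange(4) * 3.0
frame = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T
r, g = radial_distribution(frame, box_length=12.0, r_max=6.0, n_bins=60)
print(is_anomalous(r, g, r_min_physical=2.0))  # False

collided = frame.copy()
collided[-1] = collided[0] + np.array([0.5, 0.0, 0.0])  # force a near-collision
r, g = radial_distribution(collided, box_length=12.0, r_max=6.0, n_bins=60)
print(is_anomalous(r, g, r_min_physical=2.0))  # True
```

In an on-the-fly loop of the kind the abstract describes, frames flagged this way would be recomputed with the reference method and added to the training data. A shape-based comparison against a reference RDF could replace the simple threshold used here.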
Affiliation(s)
- Natsuki Watanabe
- Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
- Graduate School of Pure and Applied Sciences, University of Tsukuba, Tsukuba, Japan
- Yuta Hori
- Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
- Hiroki Sugisawa
- Science & Innovation Center, Mitsubishi Chemical Corporation, Yokohama, Japan
- Tomonori Ida
- Division of Material Chemistry, Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Japan
- Mitsuo Shoji
- Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
- Yasuteru Shigeta
- Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan

2
Faraji S, Liu M. Transferable machine learning interatomic potential for carbon hydrogen systems. Phys Chem Chem Phys 2024; 26:22346-22358. [PMID: 39140158 DOI: 10.1039/d4cp02300e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
In this study, we developed a machine learning interatomic potential based on artificial neural networks (ANN) to model carbon-hydrogen (C-H) systems. The ANN potential was trained on a dataset of C-H clusters obtained through density functional theory (DFT) calculations. Through comprehensive evaluations against DFT results, including predictions of geometries and formation energies across 0D-3D systems comprising C and C-H, as well as modeling various chemical processes, the ANN potential demonstrated exceptional accuracy and transferability. Its capability to accurately predict lattice dynamics, crucial for stability assessment in crystal structure prediction, was also verified through phonon dispersion analysis. Notably, its accuracy and computational efficiency in calculating force constants facilitated the exploration of complex energy landscapes, leading to the discovery of a novel C polymorph. These results underscore the robustness and versatility of the ANN potential, highlighting its efficacy in advancing computational materials science by conducting precise atomistic simulations on a wide range of C-H materials.
Affiliation(s)
- Somayeh Faraji
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA.
- Mingjie Liu
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA.

3
Kahle L, Minisini B, Bui T, First JT, Buda C, Goldman T, Wimmer E. A dual-cutoff machine-learned potential for condensed organic systems obtained via uncertainty-guided active learning. Phys Chem Chem Phys 2024; 26:22665-22680. [PMID: 39158948 DOI: 10.1039/d4cp01980f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/20/2024]
Abstract
Machine-learned potentials (MLPs) trained on ab initio data combine the computational efficiency of classical interatomic potentials with the accuracy and generality of the first-principles method used in the creation of the respective training set. In this work, we implement and train an MLP to obtain an accurate description of the potential energy surface and property predictions for organic compounds, as both single molecules and in the condensed phase. We devise a dual descriptor, based on the atomic cluster expansion (ACE), that couples an information-rich short-range description with a coarser long-range description that captures weak intermolecular interactions. We employ uncertainty-guided active learning for the training set generation, creating a dataset that is comparatively small for the breadth of application and consists of alcohols, alkanes, and an adipate. Utilizing that MLP, we calculate densities of those systems of varying chain lengths as a function of temperature, obtaining a discrepancy of less than 4% compared with experiment. Vibrational frequencies calculated with the MLP have a root mean square error of less than 1 THz compared to DFT. The heat capacities of condensed systems are within 11% of experimental findings, which is strong evidence that the dual descriptor provides an accurate framework for the prediction of both short-range intramolecular and long-range intermolecular interactions.
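The abstract describes the dual descriptor only at a high level. The following Python/NumPy sketch illustrates the general idea of splitting an atom's environment at two cutoffs. It deliberately uses a plain radial Gaussian basis rather than the atomic cluster expansion. A finer set of basis functions covers the region inside a short cutoff, and a coarser set covers the shell between the short and long cutoffs. The cutoffs, basis sizes, and function names are assumptions for illustration only.

```python
import numpy as np


def radial_gaussians(dist, lo, hi, n_basis):
    """Sum of Gaussian basis functions centred evenly on [lo, hi] over the given distances."""
    centres = np.linspace(lo, hi, n_basis)
    width = (hi - lo) / n_basis
    return np.exp(-(((dist[:, None] - centres[None, :]) / width) ** 2)).sum(axis=0)


def dual_cutoff_features(positions, centre, r_short, r_long, n_fine=8, n_coarse=3):
    """Concatenate a fine short-range and a coarse long-range radial descriptor for one atom."""
    dist = np.linalg.norm(positions - positions[centre], axis=1)
    dist = np.delete(dist, centre)  # exclude the atom itself
    short = dist[dist < r_short]
    shell = dist[(dist >= r_short) & (dist < r_long)]
    return np.concatenate([
        radial_gaussians(short, 0.0, r_short, n_fine),       # information-rich, short range
        radial_gaussians(shell, r_short, r_long, n_coarse),  # coarse, weak-interaction range
    ])


atoms = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [9.0, 0.0, 0.0]])
features = dual_cutoff_features(atoms, centre=0, r_short=2.0, r_long=6.0)
print(features.shape)  # (11,)
```

A production descriptor would also multiply each term by a smooth cutoff function, so that energies and forces stay continuous as atoms cross `r_short` or `r_long`. That step is omitted here for brevity.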
Affiliation(s)
- Leonid Kahle
- Materials Design SARL, 42 avenue Verdier, 92120 Montrouge, France.
- Benoit Minisini
- Materials Design SARL, 42 avenue Verdier, 92120 Montrouge, France.
- Tai Bui
- bp Exploration Operating Co. Ltd, Chertsey Road, Sunbury-on-Thames TW16 7LN, UK
- Jeremy T First
- bp, Center for High Performance Computing, 225 Westlake Park Blvd, Houston, TX 77079, USA
- Corneliu Buda
- bp Exploration Operating Co. Ltd, Chertsey Road, Sunbury-on-Thames TW16 7LN, UK
- Thomas Goldman
- bp Exploration Operating Co. Ltd, Chertsey Road, Sunbury-on-Thames TW16 7LN, UK
- Erich Wimmer
- Materials Design SARL, 42 avenue Verdier, 92120 Montrouge, France.

4
Fisher KE, Herbst MF, Marzouk YM. Multitask methods for predicting molecular properties from heterogeneous data. J Chem Phys 2024; 161:014114. [PMID: 38958501 DOI: 10.1063/5.0201681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost by over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange-correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures, including the full disparity between the different levels of fidelity, than existing kernel approaches based on Δ-learning, although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
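One common way to build a multitask Gaussian process is the intrinsic coregionalization model (ICM). Its kernel multiplies a task-covariance matrix B by an input kernel. The abstract does not state which multitask kernel the authors use, so the following Python/NumPy sketch is only a generic illustration. It assumes 1-D inputs, an RBF input kernel, and a fixed (not learned) B. Task 0 stands in for a hypothetical expensive CC source and task 1 for a cheap DFT source. All data are synthetic.

```python
import numpy as np


def rbf(x1, x2, length_scale):
    """Squared-exponential kernel between two 1-D input arrays."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / length_scale**2)


def multitask_gp_mean(x_train, t_train, y_train, x_test, t_test, B, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP with ICM kernel k((x, i), (x', j)) = B[i, j] * rbf(x, x')."""
    K = B[np.ix_(t_train, t_train)] * rbf(x_train, x_train, length_scale)
    K_star = B[np.ix_(t_test, t_train)] * rbf(x_test, x_train, length_scale)
    alpha = np.linalg.solve(K + noise * np.eye(len(x_train)), y_train)
    return K_star @ alpha


# Hypothetical data: three expensive (task 0) points and many cheap (task 1) points.
x_cc = np.array([0.0, 2.0, 4.0])
x_dft = np.linspace(0.0, 4.0, 9)
x_train = np.concatenate([x_cc, x_dft])
t_train = np.array([0] * len(x_cc) + [1] * len(x_dft))
y_train = np.concatenate([np.sin(x_cc), np.sin(x_dft) + 0.1])  # cheap source offset by 0.1

B = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed strong correlation between the sources
x_query = np.array([1.0, 3.0])
print(multitask_gp_mean(x_train, t_train, y_train, x_query, np.zeros(2, dtype=int), B))
```

Setting B to the identity decouples the tasks, so the cheap data no longer influence task-0 predictions. The off-diagonal entries of B are what let the abundant DFT-like points inform the scarce CC-like predictions.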
Affiliation(s)
- K E Fisher
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- M F Herbst
- Mathematics for Materials Modelling, Institute of Mathematics and Institute of Materials, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Y M Marzouk
- Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

5
Yao S, Song J, Jia L, Cheng L, Zhong Z, Song M, Feng Z. Fast and effective molecular property prediction with transferability map. Commun Chem 2024; 7:85. [PMID: 38632308 PMCID: PMC11024153 DOI: 10.1038/s42004-024-01169-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 04/05/2024] [Indexed: 04/19/2024]
Abstract
Effective transfer learning for molecular property prediction has shown considerable strength in addressing insufficient labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability concerning task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient for approximating the direction of model optimization on a molecular property prediction dataset. We have analyzed the close connection between the principal gradient and model optimization through mathematical proof. PGM measures the transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we perform PGM on various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance to improve the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by offering a task-relatedness quantification prior to transfer learning and understanding the relationship between chemical properties.
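The abstract defines transferability as a distance between principal gradients but does not give the optimization-free scheme used to compute them. The Python/NumPy sketch below therefore substitutes a deliberately simple stand-in. It uses a linear model with a mean-squared-error loss, takes gradients at a single fixed parameter vector, normalizes each gradient, and measures Euclidean distance. Every one of these choices is an assumption for illustration, not PGM as published. The datasets are synthetic.

```python
import numpy as np


def normalized_gradient(X, y, w0):
    """Gradient of the mean-squared error of a linear model at w0, scaled to unit length."""
    residual = X @ w0 - y
    grad = 2.0 * X.T @ residual / len(y)
    return grad / np.linalg.norm(grad)


def gradient_distance(source, target, w0):
    """Hypothetical transferability score: a smaller distance suggests more closely related tasks."""
    g_s = normalized_gradient(*source, w0)
    g_t = normalized_gradient(*target, w0)
    return float(np.linalg.norm(g_s - g_t))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))             # shared molecular descriptors (synthetic)
w_true = np.array([1.0, -2.0, 0.5, 0.0])
target = (X, X @ w_true)                 # target property
similar = (X, X @ (w_true + 0.1))        # closely related source property
opposite = (X, -X @ w_true)              # anti-correlated source property
w0 = np.zeros(4)
print(gradient_distance(similar, target, w0), gradient_distance(opposite, target, w0))
```

Ranking candidate source datasets by this distance, smallest first, is the kind of source-selection step a transferability map supports. Because the gradients are normalized, the score ranges from 0 (same direction) to 2 (opposite directions).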
Affiliation(s)
- Shaolun Yao
- Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, Zhejiang University, 310027, Hangzhou, China
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Jie Song
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- School of Software Technology, Zhejiang University, 315048, Ningbo, China
- Lingxiang Jia
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Lechao Cheng
- School of Computer Science and Information Engineering, Hefei University of Technology, 230009, Hefei, China
- Zipeng Zhong
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Mingli Song
- College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China
- Zunlei Feng
- Shanghai Institute for Advanced Study of Zhejiang University, 201203, Shanghai, China.
- School of Software Technology, Zhejiang University, 315048, Ningbo, China.

6
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China.

7
Martí C, Devereux C, Najm HN, Zádor J. Evaluation of Rate Coefficients in the Gas Phase Using Machine-Learned Potentials. J Phys Chem A 2024. [PMID: 38427974 DOI: 10.1021/acs.jpca.3c07872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
Abstract
We assess the capability of machine-learned potentials to compute rate coefficients by training a neural network (NN) model and applying it to describe the chemical landscape on the C5H5 potential energy surface, which is relevant to molecular weight growth in combustion and interstellar media. We coupled the resulting NN with an automated kinetics workflow code, KinBot, to perform all necessary calculations to compute the rate coefficients. The NN is benchmarked exhaustively by evaluating its performance at the various stages of the kinetics calculations: from the electronic energy through the computation of zero point energy, barrier heights, entropic contributions, the portion of the PES explored, and finally the overall rate coefficients as formulated by transition state theory.
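The final quantity named above can be computed with the textbook expression of canonical transition state theory, k(T) = (k_B T / h) (Q‡ / Q_R) exp(-E_0 / (R T)). In this expression the partition-function ratio carries the entropic contributions and E_0 is the zero-point-corrected barrier height. The following Python sketch assumes the partition-function ratio is already available and ignores tunnelling and reaction-path degeneracy. It illustrates the formula only and is not KinBot's implementation.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # molar gas constant, J/(mol K)


def tst_rate(temperature, q_ratio, barrier_kj_per_mol):
    """Unimolecular canonical TST rate coefficient in s^-1.

    q_ratio: Q_TS / Q_reactant, with the reaction-coordinate mode removed from Q_TS.
    barrier_kj_per_mol: zero-point-corrected barrier height E_0 in kJ/mol.
    """
    prefactor = K_B * temperature / H  # universal frequency factor k_B T / h
    return prefactor * q_ratio * math.exp(-barrier_kj_per_mol * 1e3 / (R * temperature))


for T in (500.0, 1000.0, 1500.0):
    print(T, f"{tst_rate(T, 0.5, 150.0):.3e}")
```

Each stage the abstract lists (electronic energy, zero-point energy, barrier height, and entropic terms) enters this one expression. Errors in the machine-learned potential therefore propagate into k(T), exponentially so in the case of the barrier.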
Affiliation(s)
- Carles Martí
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Christian Devereux
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Habib N Najm
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States
- Judit Zádor
- Combustion Research Facility, Sandia National Laboratories, Livermore, California 94551, United States

8
Hedelius BE, Tingey D, Della Corte D. TrIP─Transformer Interatomic Potential Predicts Realistic Energy Surface Using Physical Bias. J Chem Theory Comput 2024; 20:199-211. [PMID: 38150692 DOI: 10.1021/acs.jctc.3c00936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Accurate interatomic energies and forces enable high-quality molecular dynamics simulations, torsion scans, potential energy surface mappings, and geometry optimizations. Machine learning algorithms have enabled rapid estimates of the energies and forces with high accuracy. Further development of machine learning algorithms holds promise for producing universal potentials that support many different atomic species. We present the Transformer Interatomic Potential (TrIP): a chemically sound potential based on the SE(3)-Transformer. TrIP's species-agnostic architecture, which uses continuous atomic representation and homogeneous graph convolutions, encourages parameter sharing between atomic species for more general representations of chemical environments, maintains a reasonable number of parameters, serves as a form of regularization, and is a step toward accurate universal interatomic potentials. TrIP achieves state-of-the-art accuracies on the COMP6 benchmark with an energy prediction of just 1.02 kcal/mol MAE. We introduce physical bias in the form of Ziegler-Biersack-Littmark-screened nuclear repulsion and constrained atomization energies. An energy scan of a water molecule demonstrates that these changes improve long- and near-range interactions compared to other neural network potentials. TrIP also demonstrates stability in molecular dynamics simulations, demonstrating reasonable exploration of Ramachandran space.
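The Ziegler-Biersack-Littmark (ZBL) screened nuclear repulsion mentioned above has a standard closed form. It is a bare Coulomb repulsion between nuclear charges Z_1 and Z_2, multiplied by a universal screening function of r/a, where a = 0.8854 a_0 / (Z_1^0.23 + Z_2^0.23). The Python sketch below evaluates that pair term using the coefficients common in simulation codes, such as LAMMPS `pair_style zbl`. The abstract does not describe how TrIP combines this term with the learned energy, so that step is not shown.

```python
import math

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4 pi eps0) in eV * Angstrom
# Universal ZBL screening-function coefficients (c_k, d_k); the c_k sum to 1.
ZBL_TERMS = ((0.18175, 3.19980), (0.50986, 0.94229), (0.28022, 0.40290), (0.02817, 0.20162))


def zbl_screening(x):
    """Universal screening function phi(x), with x = r / a."""
    return sum(c * math.exp(-d * x) for c, d in ZBL_TERMS)


def zbl_energy(z1, z2, r):
    """Screened nuclear repulsion in eV for nuclear charges z1, z2 at separation r (Angstrom)."""
    a = 0.46850 / (z1**0.23 + z2**0.23)  # screening length in Angstrom (about 0.8854 Bohr radii)
    return COULOMB_EV_ANGSTROM * z1 * z2 / r * zbl_screening(r / a)


# O-H repulsion at a few short separations, as probed by a water energy scan.
for r in (0.2, 0.5, 1.0):
    print(r, zbl_energy(8, 1, r))
```

Because phi(0) = 1, the term approaches the bare nuclear Coulomb repulsion as r goes to 0. This gives a model the physically correct steep wall at short range, where training data are typically absent.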
Affiliation(s)
- Bryce E Hedelius
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Damon Tingey
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Dennis Della Corte
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States

9
Vita JA, Fuemmeler EG, Gupta A, Wolfe GP, Tao AQ, Elliott RS, Martiniani S, Tadmor EB. ColabFit exchange: Open-access datasets for data-driven interatomic potentials. J Chem Phys 2023; 159:154802. [PMID: 37861121 DOI: 10.1063/5.0163882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/25/2023] [Indexed: 10/21/2023]
Abstract
Data-driven interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70 000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework.
Affiliation(s)
- Joshua A Vita
- Department of Materials Science and Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Eric G Fuemmeler
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Amit Gupta
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Gregory P Wolfe
- Center for Soft Matter Research, Department of Physics, New York University, New York, New York 10012, USA
- Alexander Quanming Tao
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Ryan S Elliott
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Stefano Martiniani
- Center for Soft Matter Research, Department of Physics, New York University, New York, New York 10012, USA
- Simons Center for Computational Physical Chemistry, Department of Chemistry, New York University, New York, New York 10012, USA
- Courant Institute of Mathematical Sciences, New York University, New York, New York 10112, USA
- Ellad B Tadmor
- Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA

10
Kovács DP, Batatia I, Arany ES, Csányi G. Evaluation of the MACE force field architecture: From medicinal chemistry to materials science. J Chem Phys 2023; 159:044118. [PMID: 37522405 DOI: 10.1063/5.0155322] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/19/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023]
Abstract
The MACE architecture represents the state of the art in the field of machine learning force fields for a variety of in-domain, extrapolation, and low-data regime tasks. In this paper, we further evaluate MACE by fitting models for published benchmark datasets. We show that MACE generally outperforms alternatives for a wide range of systems, from amorphous carbon, universal materials modeling, and general small molecule organic chemistry to large molecules and liquid water. We demonstrate the capabilities of the model on tasks ranging from constrained geometry optimization to molecular dynamics simulations and find excellent performance across all tested domains. We show that MACE is very data efficient and can reproduce experimental molecular vibrational spectra when trained on as few as 50 randomly selected reference configurations. We further demonstrate that the strictly local atom-centered model is sufficient for such tasks even in the case of large molecules and weakly interacting molecular assemblies.
Affiliation(s)
- Dávid Péter Kovács
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Ilyes Batatia
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- ENS Paris-Saclay, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Eszter Sára Arany
- School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, United Kingdom
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom

11
Li C, Gilbert B, Farrell S, Zarzycki P. Rapid Prediction of a Liquid Structure from a Single Molecular Configuration Using Deep Learning. J Chem Inf Model 2023. [PMID: 37307434 DOI: 10.1021/acs.jcim.3c00472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Molecular dynamics simulation is an indispensable tool for understanding the collective behavior of atoms and molecules and the phases they form. Statistical mechanics provides accurate routes for predicting macroscopic properties as time-averages over visited molecular configurations - microstates. However, to obtain convergence, we need a sufficiently long record of visited microstates, which translates to a high computational cost of the molecular simulations. In this work, we show how to use a point cloud-based deep learning strategy to rapidly predict the structural properties of liquids from a single molecular configuration. We tested our approach using three homogeneous liquids with progressively more complex entities and interactions: Ar, NO, and H2O under varying pressure and temperature conditions within the liquid state domain. Our deep neural network architecture allows rapid insight into the liquid structure, here probed by the radial distribution function, and can be used with molecular/atomistic configurations generated by either simulation, first-principles, or experimental methods.
Affiliation(s)
- Chunhui Li
- Energy Geosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
- Benjamin Gilbert
- Energy Geosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
- Steven Farrell
- NERSC, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States
- Piotr Zarzycki
- Energy Geosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, United States