1
Bukowy T, Brown ML, Popelier PLA. Toward Gaussian Process Regression Modeling of a Urea Force Field. J Phys Chem A 2024; 128:8551-8560. [PMID: 39303098 PMCID: PMC11457224 DOI: 10.1021/acs.jpca.4c04117] [Received: 06/20/2024] [Revised: 08/29/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
FFLUX is a next-generation, machine-learnt force field built on three cornerstones: quantum chemical topology, Gaussian process regression, and (high-rank) multipolar electrostatics. It is capable of performing molecular dynamics with near-quantum accuracy at a lower computational cost than standard ab initio molecular dynamics. Previous work with FFLUX was concerned with water and formamide. In this study, we go one step further and challenge FFLUX to model urea, a larger and more flexible system. As a result, we have trained urea models at the B3LYP/aug-cc-pVTZ level of theory, with a mean absolute error of 0.4 kJ mol⁻¹ and a maximum prediction error below 7.0 kJ mol⁻¹. To test their performance in molecular dynamics simulations, two sets of FFLUX geometry optimizations were carried out: 5 dimers corresponding to energy minima and 75 random dimers. The 5 dimers were recovered with a root-mean-square deviation below 0.1 Å with respect to their ab initio references. Out of the 75 random dimers, 68% converged to qualitatively the same dimers as those obtained at the ab initio level. Furthermore, we have ranked the 5 FFLUX-optimized dimers in the order of their relative FFLUX single-point energies and compared the ranking with that of the ab initio method. The energy ranking fully agreed but for one crossover between two successive minima. Finally, we have demonstrated the importance of geometry-dependent (i.e., flexible) multipole moments, showing that the lack of multipole moment flexibility can lead to average errors in the total intermolecular electrostatic energy of more than 2 orders of magnitude.
Affiliation(s)
- Tomasz Bukowy
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
- Matthew L. Brown
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
- Paul L. A. Popelier
  - Department of Chemistry, University of Manchester, Manchester M13 9PL, Great Britain
2
Isamura BK, Popelier PLA. Transfer learning of hyperparameters for fast construction of anisotropic GPR models: design and application to the machine-learned force field FFLUX. Phys Chem Chem Phys 2024; 26:23677-23691. [PMID: 39224929 PMCID: PMC11369757 DOI: 10.1039/d4cp01862a] [Received: 05/03/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
The polarisable machine-learned force field FFLUX requires pre-trained anisotropic Gaussian process regression (GPR) models of atomic energies and multipole moments to propagate unbiased molecular dynamics simulations. The outcome of FFLUX simulations is highly dependent on the predictive accuracy of the underlying models whose training entails determining the optimal set of model hyperparameters. Unfortunately, traditional direct learning (DL) procedures do not scale well on this task, especially when the hyperparameter search is initiated from a (set of) random guess solution(s). Additionally, the complexity of the hyperparameter space (HS) increases with the number of geometrical input features, at least for anisotropic kernels, making the optimization of hyperparameters even more challenging. In this study, we propose a transfer learning (TL) protocol that accelerates the training process of anisotropic GPR models by facilitating access to promising regions of the HS. The protocol is based on a seeding-relaxation mechanism in which an excellent guess solution is identified by rapidly building one or several small source models over a subset of the target training set before readjusting the previous guess over the entire set. We demonstrate the performance of this protocol by building and assessing the performance of DL and TL models of atomic energies and charges in various conformations of benzene, ethanol, formic acid dimer and the drug fomepizole. Our experiments suggest that TL models can be built one order of magnitude faster while preserving the quality of their DL analogs. Most importantly, when deployed in FFLUX simulations, TL models compete with or even outperform their DL analogs when it comes to performing FFLUX geometry optimization and computing harmonic vibrational modes.
Affiliation(s)
- Bienfait K Isamura
  - Department of Chemistry, The University of Manchester, Manchester, M13 9PL, UK.
- Paul L A Popelier
  - Department of Chemistry, The University of Manchester, Manchester, M13 9PL, UK.
3
Aristoff D, Johnson M, Simpson G, Webber RJ. The fast committor machine: Interpretable prediction with kernels. J Chem Phys 2024; 161:084113. [PMID: 39193940 DOI: 10.1063/5.0222798] [Received: 06/10/2024] [Accepted: 08/07/2024] [Indexed: 08/29/2024]
Abstract
In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration x will reach a set B before a set A. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces that optimally describe the A to B transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly with the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.
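To make the committor definition in this abstract concrete, the following is a minimal sketch for a toy system, not the FCM algorithm itself. It assumes a nearest-neighbour random walk on a line of sites, with A taken as the leftmost site and B as the rightmost; the function name `committor_1d` and its parameters are illustrative choices that do not come from the paper. The committor then satisfies q(i) = (1 - p) q(i-1) + p q(i+1) in the interior, with q = 0 on A and q = 1 on B, which the code solves by repeated sweeps.

```python
def committor_1d(n_sites, p_right=0.5, tol=1e-12, max_sweeps=100_000):
    """Committor q(i): probability of reaching the last site (set B)
    before the first site (set A) when starting from site i.

    Toy model (an assumption for illustration): a nearest-neighbour
    random walk that steps right with probability p_right.
    """
    q = [0.0] * n_sites
    q[-1] = 1.0  # boundary condition: a walker already in B has q = 1
    for _ in range(max_sweeps):
        largest_change = 0.0
        # Gauss-Seidel sweep over interior sites only; the boundaries stay fixed
        for i in range(1, n_sites - 1):
            updated = (1.0 - p_right) * q[i - 1] + p_right * q[i + 1]
            largest_change = max(largest_change, abs(updated - q[i]))
            q[i] = updated
        if largest_change < tol:
            break
    return q


# Symmetric walk on 11 sites: the committor rises linearly from A to B
q = committor_1d(11)
print([round(value, 3) for value in q])
```

For the symmetric walk the result is the straight line q(i) = i / (n_sites - 1), so the midpoint has an equal chance of reaching either set first.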
Affiliation(s)
- David Aristoff
  - Mathematics, Colorado State University, Fort Collins, Colorado 80523, USA
- Mats Johnson
  - Mathematics, Colorado State University, Fort Collins, Colorado 80523, USA
- Gideon Simpson
  - Mathematics, Drexel University, Philadelphia, Pennsylvania 19104, USA
- Robert J Webber
  - Mathematics, University of California San Diego, La Jolla, California 92093, USA
4
Brown M, Isamura BK, Skelton JM, Popelier PLA. Incorporating Noncovalent Interactions in Transfer Learning Gaussian Process Regression Models for Molecular Simulations. J Chem Theory Comput 2024; 20:5994-6008. [PMID: 38981081 PMCID: PMC11270819 DOI: 10.1021/acs.jctc.4c00402] [Received: 03/28/2024] [Revised: 05/30/2024] [Accepted: 06/18/2024] [Indexed: 07/11/2024]
Abstract
FFLUX is a quantum chemical topology-based multipolar force field that uses Gaussian process regression machine learning models to predict atomic energies and multipole moments on the fly for fast and accurate molecular dynamics simulations. These models have previously been trained on monomers, meaning that many-body effects, for example, intermolecular charge transfer, are missed in simulations. Moreover, dispersion and repulsion have been modeled using Lennard-Jones potentials, necessitating careful parametrization. In this work, we take an important step toward addressing these shortcomings and show that models trained on clusters, in this case, a dimer, can be used in FFLUX simulations by preparing and benchmarking a formamide dimer model. To mitigate the computational costs associated with training higher-dimensional models, we rely on the transfer of hyperparameters from a smaller source model to a larger target model, enabling an order of magnitude faster training than with a direct learning approach. The dimer model allows for simulations that account for two-body effects, including intermolecular polarization and charge penetration, and that do not require nonbonded potentials. We show that addressing these limitations allows for simulations that are closer to quantum mechanics than previously possible with the monomeric models.
Affiliation(s)
- Matthew L. Brown
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Bienfait K. Isamura
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Jonathan M. Skelton
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Paul L. A. Popelier
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
5
Shanks BL, Sullivan HW, Shazed AR, Hoepfner MP. Accelerated Bayesian Inference for Molecular Simulations using Local Gaussian Process Surrogate Models. J Chem Theory Comput 2024; 20:3798-3808. [PMID: 38551198 DOI: 10.1021/acs.jctc.3c01358] [Indexed: 04/19/2024]
Abstract
While Bayesian inference is the gold standard for uncertainty quantification and propagation, its use within physical chemistry encounters formidable computational barriers. These bottlenecks are magnified for modeling data with many independent variables, such as X-ray/neutron scattering patterns and electromagnetic spectra. To address this challenge, we employ local Gaussian process (LGP) surrogate models to accelerate Bayesian optimization over these complex thermophysical properties. The time-complexity of the LGPs scales linearly in the number of independent variables, in stark contrast to the computationally expensive cubic scaling of conventional Gaussian processes. To illustrate the method, we trained an LGP surrogate model on the radial distribution function of liquid neon and observed a 1,760,000-fold speed-up compared to molecular dynamics simulation, beating a conventional GP by three orders of magnitude. We conclude that LGPs are robust and efficient surrogate models poised to expand the application of Bayesian inference in molecular simulations to a broad spectrum of experimental data.
Affiliation(s)
- Brennon L Shanks
  - Department of Chemical Engineering, University of Utah, Salt Lake City, UT 84112-9202, United States
- Harry W Sullivan
  - Department of Chemical Engineering, University of Utah, Salt Lake City, UT 84112-9202, United States
- Abdur R Shazed
  - Department of Chemical Engineering, University of Utah, Salt Lake City, UT 84112-9202, United States
- Michael P Hoepfner
  - Department of Chemical Engineering, University of Utah, Salt Lake City, UT 84112-9202, United States
6
Sahre MJ, von Rudorff GF, Marquetand P, von Lilienfeld OA. Transferability of atomic energies from alchemical decomposition. J Chem Phys 2024; 160:054106. [PMID: 38341696 DOI: 10.1063/5.0187298] [Received: 11/13/2023] [Accepted: 01/09/2024] [Indexed: 02/13/2024]
Abstract
We study alchemical atomic energy partitioning as a method to estimate atomization energies from atomic contributions, which are defined in physically rigorous and general ways through the use of the uniform electron gas as a joint reference. We analyze quantitatively the relation between atomic energies and their local environment using a dataset of 1325 organic molecules. The atomic energies are transferable across various molecules, enabling the prediction of atomization energies with a mean absolute error of 23 kcal/mol, comparable to simple statistical estimates but potentially more robust given their grounding in the physics-based decomposition scheme. A comparative analysis with other decomposition methods highlights its sensitivity to electrostatic variations, underlining its potential as a representation of the environment as well as in studying processes like diffusion in solids characterized by significant electrostatic shifts.
Affiliation(s)
- Michael J Sahre
  - Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- Guido Falk von Rudorff
  - Department of Chemistry, University Kassel, Heinrich-Plett-Str.40, 34132 Kassel, Germany
  - Center for Interdisciplinary Nanostructure Science and Technology (CINSaT), Heinrich-Plett-Straße 40, 34132 Kassel, Germany
- Philipp Marquetand
  - Faculty of Chemistry, Institute of Theoretical Chemistry, University of Vienna, Währinger Str. 17, 1090 Vienna, Austria
- O Anatole von Lilienfeld
  - Vienna Doctoral School in Chemistry (DoSChem) and Institute of Theoretical Chemistry and Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, M5S 3H6 Ontario, Canada
  - Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, M5S 3E4 Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, M5S 1M1 Ontario, Canada
  - ML Group, Technische Universität Berlin and Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
  - Department of Physics, University of Toronto, St. George Campus, Toronto, M5S 1A7 Ontario, Canada
7
Brown M, Skelton JM, Popelier PLA. Application of the FFLUX Force Field to Molecular Crystals: A Study of Formamide. J Chem Theory Comput 2023; 19:7946-7959. [PMID: 37847867 PMCID: PMC10653110 DOI: 10.1021/acs.jctc.3c00578] [Received: 05/30/2023] [Indexed: 10/19/2023]
Abstract
In this work, we present the first application of the quantum chemical topology force field FFLUX to the solid state. FFLUX utilizes Gaussian process regression machine learning models trained on data from the interacting quantum atom partitioning scheme to predict atomic energies and flexible multipole moments that change with geometry. Here, the ambient (α) and high-pressure (β) polymorphs of formamide are used as test systems and optimized using FFLUX. Optimizing the structures with increasing multipolar ranks indicates that the lattice parameters of the α phase differ by less than 5% from those of the experimental structure when multipole moments up to the quadrupole are used. These differences are found to be in line with dispersion-corrected density functional theory (DFT). Lattice dynamics calculations are also found to be possible using FFLUX, yielding harmonic phonon spectra comparable to dispersion-corrected DFT while enabling larger supercells to be considered than is typically possible with first-principles calculations. These promising results indicate that FFLUX can be used to accurately determine properties of molecular solids that are difficult to access using DFT, including the structural dynamics, free energies, and properties at finite temperature.
Affiliation(s)
- Matthew L. Brown
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Britain
- Jonathan M. Skelton
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Britain
- Paul L. A. Popelier
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Britain
8
Broad J, Wheatley RJ, Graham RS. Parallel Implementation of Nonadditive Gaussian Process Potentials for Monte Carlo Simulations. J Chem Theory Comput 2023. [PMID: 37368843 DOI: 10.1021/acs.jctc.3c00113] [Indexed: 06/29/2023]
Abstract
A strategy is presented to implement Gaussian process potentials in molecular simulations through parallel programming. Attention is focused on the three-body nonadditive energy, though all algorithms extend straightforwardly to the additive energy. The method to distribute pairs and triplets between processes is general to all potentials. Results are presented for a simulation box of argon, including full box and atom displacement calculations, which are relevant to Monte Carlo simulation. Data on speed-up are presented for up to 120 processes across four nodes. A 4-fold speed-up is observed over five processes, extending to 20-fold over 40 processes and 30-fold over 120 processes.
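The speed-up figures quoted in this abstract can be turned into parallel efficiencies with a short calculation. The sketch below assumes that each speed-up is measured relative to a single process, which the abstract does not state explicitly; the helper name `parallel_efficiency` is an illustrative choice.

```python
def parallel_efficiency(speedup, n_processes):
    """Fraction of ideal (linear) scaling achieved: speed-up per process."""
    return speedup / n_processes


# (number of processes, reported speed-up) pairs taken from the abstract;
# the single-process baseline is an assumption
reported = [(5, 4.0), (40, 20.0), (120, 30.0)]
efficiencies = {n: parallel_efficiency(s, n) for n, s in reported}

for n, eff in efficiencies.items():
    print(f"{n:>3} processes: efficiency {eff:.0%}")
```

Under that assumption the efficiency falls from 80% at five processes to 50% at 40 and 25% at 120, so the returns diminish steadily as processes are added across nodes.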
Affiliation(s)
- Jack Broad
  - Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Richard J Wheatley
  - School of Chemistry, University of Nottingham, Nottingham, NG7 2RD, England
- Richard S Graham
  - School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, England
9
Burn M, Popelier PLA. Gaussian Process Regression Models for Predicting Atomic Energies and Multipole Moments. J Chem Theory Comput 2023; 19:1370-1380. [PMID: 36757024 PMCID: PMC9979601 DOI: 10.1021/acs.jctc.2c00731] [Indexed: 02/10/2023]
Abstract
Developing a force field is a difficult task because its design is typically pulled in opposite directions by speed and accuracy. FFLUX breaks this trend by utilizing Gaussian process regression (GPR) to predict, at ab initio accuracy, atomic energies and multipole moments as obtained from the quantum theory of atoms in molecules (QTAIM). This work demonstrates that the in-house FFLUX training pipeline can generate successful GPR models for six representative molecules: peptide-capped glycine and alanine, glucose, paracetamol, aspirin, and ibuprofen. The molecules were sufficiently distorted to represent configurations from an AMBER-GAFF2 molecular dynamics run. All internal degrees of freedom were covered corresponding to 93 dimensions in the case of the largest molecule ibuprofen (33 atoms). Benefiting from active learning, the GPR models contain only about 2000 training points and return largely sub-kcal mol⁻¹ prediction errors for the validation sets. A proof of concept has been reached for transferring the model produced through active learning on one atomic property to that of the remaining atomic properties. The prediction of electrostatic interaction can be assessed at the intermolecular level, and the vast majority of interactions have a root-mean-square error of less than 0.1 kJ mol⁻¹ with a maximum value of ∼1 kJ mol⁻¹ for a glycine and paracetamol dimer.
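The 93 dimensions quoted for ibuprofen match the standard count of internal degrees of freedom for a nonlinear molecule of N atoms, assuming that is the convention the authors use (three Cartesian coordinates per atom, minus three translations and three rotations):

```latex
\[
  N_{\text{int}} = 3N - 6 = 3 \times 33 - 6 = 93
\]
```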
10
Brown M, Skelton JM, Popelier PLA. Construction of a Gaussian Process Regression Model of Formamide for Use in Molecular Simulations. J Phys Chem A 2023; 127:1702-1714. [PMID: 36756842 PMCID: PMC9969515 DOI: 10.1021/acs.jpca.2c06566] [Indexed: 02/10/2023]
Abstract
FFLUX, a novel force field based on quantum chemical topology, can perform molecular dynamics simulations with flexible multipole moments that change with geometry. This is enabled by Gaussian process regression machine learning models, which accurately predict atomic energies and multipole moments up to the hexadecapole. We have constructed a model of the formamide monomer at the B3LYP/aug-cc-pVTZ level of theory capable of sub-kJ mol⁻¹ accuracy, with the maximum prediction error for the molecule being 0.8 kJ mol⁻¹. This model was used in FFLUX simulations along with Lennard-Jones parameters to successfully optimize the geometry of formamide dimers with errors smaller than 0.1 Å compared to those obtained with D3-corrected B3LYP/aug-cc-pVTZ. Comparisons were also made to a force field constructed with static multipole moments and Lennard-Jones parameters. FFLUX recovers the expected energy ranking of dimers compared to the literature, and changes in C═O and C-N bond lengths associated with hydrogen bonding were found to be consistent with density functional theory.
11
Burn MJ, Popelier PLA. Producing chemically accurate atomic Gaussian process regression models by active learning for molecular simulation. J Comput Chem 2022; 43:2084-2098. [PMID: 36165338 PMCID: PMC9828508 DOI: 10.1002/jcc.27006] [Received: 04/12/2022] [Revised: 08/20/2022] [Accepted: 08/24/2022] [Indexed: 01/12/2023]
Abstract
Machine learning is becoming increasingly important in the field of force field development. Never has it been more vital to have chemically accurate machine learning potentials, as force fields become more sophisticated and their applications expand. In this study a method for developing chemically accurate Gaussian process regression models is demonstrated for an increasingly complex set of molecules. This work extends previous work, showing the progression of the active learning technique in producing more accurate models in much less CPU time than ever before. The per-atom active learning approach has unlocked the potential to generate chemically accurate models for molecules such as peptide-capped glycine.
Affiliation(s)
- Matthew J. Burn
  - Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK
  - Department of Chemistry, The University of Manchester, Manchester, UK
- Paul L. A. Popelier
  - Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK
  - Department of Chemistry, The University of Manchester, Manchester, UK
12
Popelier PLA. Non-covalent interactions from a Quantum Chemical Topology perspective. J Mol Model 2022; 28:276. [PMID: 36006513 PMCID: PMC9411098 DOI: 10.1007/s00894-022-05188-7] [Received: 12/07/2021] [Accepted: 02/07/2022] [Indexed: 11/12/2022]
Abstract
About half a century after its little-known beginnings, the quantum topological approach called QTAIM has grown into a widespread, but still not mainstream, methodology of interpretational quantum chemistry. Although often confused in textbooks with yet another population analysis, be it perhaps an elegant but somewhat esoteric one, QTAIM has been enriched with about a dozen other research areas sharing its main mathematical language, such as Interacting Quantum Atoms (IQA) or Electron Localisation Function (ELF), to form an overarching approach called Quantum Chemical Topology (QCT). Instead of reviewing the latter's role in understanding non-covalent interactions, we propose a number of ideas emerging from the full consequences of the space-filling nature of topological atoms, and discuss how they (will) impact on interatomic interactions, including non-covalent ones. The architecture of a force field called FFLUX, which is based on these ideas, is outlined. A new method called Relative Energy Gradient (REG) is put forward, which is able, by computation, to detect which fragments of a given molecular assembly govern the energetic behaviour of this whole assembly. This method can offer insight into the typical balance of competing atomic energies both in covalent and non-covalent case studies. A brief discussion on so-called bond critical points is given, highlighting concerns about their meaning, mainly in the arena of non-covalent interactions.
Affiliation(s)
- Paul L A Popelier
  - Department of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, Great Britain, UK.
13
Symons BCB, Popelier PLA. Application of Quantum Chemical Topology Force Field FFLUX to Condensed Matter Simulations: Liquid Water. J Chem Theory Comput 2022; 18:5577-5588. [PMID: 35939826 PMCID: PMC9476653 DOI: 10.1021/acs.jctc.2c00311] [Indexed: 11/30/2022]
Abstract
We present here the first application of the quantum chemical topology force field FFLUX to condensed matter simulations. FFLUX offers many-body potential energy surfaces learnt exclusively from ab initio data using Gaussian process regression. FFLUX also includes high-rank, polarizable multipole moments (up to quadrupole moments in this work) that are learnt from the same ab initio calculations as the potential energy surfaces. Many-body effects (where a body is an atom) and polarization are captured by the machine learning models. The choice to use machine learning in this way allows the force field’s representation of reality to be improved (e.g., by including higher order many-body effects) with little detriment to the computational scaling of the code. In this manner, FFLUX is inherently future-proof. The “plug and play” nature of the machine learning models also ensures that FFLUX can be applied to any system of interest, not just liquid water. In this work we study liquid water across a range of temperatures and compare the predicted bulk properties to experiment as well as other state-of-the-art force fields AMOEBA(+CF), HIPPO, MB-Pol and SIBFA21. We find that FFLUX finds a place amongst these.
Affiliation(s)
- Benjamin C B Symons
  - Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
- Paul L A Popelier
  - Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
14
Teng C, Wang Y, Huang D, Martin K, Tristan JB, Bao JL. Dual-Level Training of Gaussian Processes with Physically Inspired Priors for Geometry Optimizations. J Chem Theory Comput 2022; 18:5739-5754. [PMID: 35939760 DOI: 10.1021/acs.jctc.2c00546] [Indexed: 11/28/2022]
Abstract
Gaussian process (GP) regression has been recently developed as an effective method in molecular geometry optimization. The prior mean function is one of the crucial parts of the GP. We design and validate two types of physically inspired prior mean functions: force-field-based priors and posterior-type priors. In this work, we implement a dual-level training (DLT) optimizer for the posterior-type priors. The DLT optimizers can be considered as a class of optimization algorithms that belong to the delta-machine learning paradigm but with several major differences compared to the previously proposed algorithms in the same paradigm. In the first level of the DLT, we incorporate the classical mechanical descriptions of the equilibrium geometries into the prior function, which enhances the performance of the GP optimizer as compared to the one using a constant (or zero) prior. In the second level, we utilize the surrogate potential energy surfaces (PESs), which incorporate the physics learned in the first-level training, as the prior function to refine the model performance further. We find that the force-field-based priors and posterior-type priors reduce the overall optimization steps by a factor of 2-3 when compared to the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer as well as the constant-prior GP optimizer proposed in previous works. We also demonstrate the potential of recovering the real PESs with GP with a force-field prior. This work shows the importance of including domain knowledge as an ingredient in the GP, which offers a potentially robust learning model for molecular geometry optimization and for exploring molecular PESs.
Affiliation(s)
- Chong Teng
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Yang Wang
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Daniel Huang
  - Department of Computer Science, San Francisco State University, San Francisco, California 94132, United States
- Katherine Martin
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Jean-Baptiste Tristan
  - Department of Computer Science, Boston College, Chestnut Hill, Massachusetts 02467, United States
- Junwei Lucas Bao
  - Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States
15
la Vega ASD, Duarte LJ, Silva AF, Skelton JM, Rocha-Rinza T, Popelier PLA. Towards an atomistic understanding of polymorphism in molecular solids. Phys Chem Chem Phys 2022; 24:11278-11294. [PMID: 35481948 DOI: 10.1039/d2cp00457g] [Indexed: 12/16/2022]
Abstract
Understanding and controlling polymorphism in molecular solids is a major unsolved problem in crystal engineering. While the ability to calculate accurate lattice energies with atomistic modelling provides valuable insight into the associated energy scales, existing methods cannot connect energy differences to the delicate balances of intra- and intermolecular forces that ultimately determine polymorph stability ordering. We report herein a protocol for applying Quantum Chemical Topology (QCT) to study the key intra- and intermolecular interactions in molecular solids, which we use to compare the three known polymorphs of succinic acid including the recently-discovered γ form. QCT provides a rigorous partitioning of the total energy into contributions associated with topological atoms, and a quantitative and chemically intuitive description of the intra- and intermolecular interactions. The newly-proposed Relative Energy Gradient (REG) method ranks atomistic energy terms (steric, electrostatic and exchange) by their importance in constructing the total energy profile for a chemical process. We find that the conformation of the succinic acid molecule is governed by a balance of large and opposing electrostatic interactions, while the H-bond dimerisation is governed by a combination of electrostatics and sterics. In the solids, an atomistic energy balance emerges that governs the contraction, towards the equilibrium geometry, of a molecular cluster representing the bulk crystal. The protocol we put forward is as general as the capabilities of the underlying quantum-mechanical model and it can provide novel perspectives on polymorphism in a wide range of chemical systems.
Collapse
Affiliation(s)
- Arturo Sauza-de la Vega
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Circuito Exterior, Ciudad Universitaria, Delegación Coyoacán, C.P. 04510, Mexico City, Mexico
- Leonardo J Duarte
- Manchester Institute of Biotechnology, Univ. of Manchester, 131 Princess Street, Manchester, M1 7DN, UK; Instituto de Química, Universidade Estadual de Campinas (UNICAMP), CP 6154, Campinas, SP, CEP 13.083-970, Brazil
- Arnaldo F Silva
- Manchester Institute of Biotechnology, Univ. of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Jonathan M Skelton
- Department of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Tomás Rocha-Rinza
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Circuito Exterior, Ciudad Universitaria, Delegación Coyoacán, C.P. 04510, Mexico City, Mexico
- Paul L A Popelier
- Manchester Institute of Biotechnology, Univ. of Manchester, 131 Princess Street, Manchester, M1 7DN, UK; Department of Chemistry, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
16
|
Pinheiro M, Ge F, Ferré N, Dral PO, Barbatti M. Choosing the right molecular machine learning potential. Chem Sci 2021; 12:14396-14413. [PMID: 34880991 PMCID: PMC8580106 DOI: 10.1039/d1sc03564a] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/29/2021] [Accepted: 09/14/2021] [Indexed: 11/21/2022]
Abstract
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.
Affiliation(s)
- Max Pinheiro
- Aix Marseille University, CNRS, ICR, Marseille, France
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, China
- Nicolas Ferré
- Aix Marseille University, CNRS, ICR, Marseille, France
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, China
- Mario Barbatti
- Aix Marseille University, CNRS, ICR, Marseille, France
- Institut Universitaire de France, 75231 Paris, France
17
|
Symons BCB, Bane MK, Popelier PLA. DL_FFLUX: A Parallel, Quantum Chemical Topology Force Field. J Chem Theory Comput 2021; 17:7043-7055. [PMID: 34617748 PMCID: PMC8582247 DOI: 10.1021/acs.jctc.1c00595] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/29/2022]
Abstract
DL_FFLUX is a force field based on quantum chemical topology that can perform molecular dynamics for flexible molecules endowed with polarizable atomic multipole moments (up to hexadecapole). Using the machine learning method kriging (aka Gaussian process regression), DL_FFLUX has access to atomic properties (energy, charge, dipole moment, etc.) with quantum mechanical accuracy. Newly optimized and parallelized using domain decomposition Message Passing Interface (MPI), DL_FFLUX is now able to deliver this rigorous methodology at scale while still in reasonable time frames. DL_FFLUX is delivered as an add-on to the widely distributed molecular dynamics code DL_POLY 4.08. For the systems studied here (10³–10⁵ atoms), DL_FFLUX is shown to add minimal computational cost to the standard DL_POLY package. In fact, the optimization of the electrostatics in DL_FFLUX means that, when high-rank multipole moments are enabled, DL_FFLUX is up to 1.25× faster than standard DL_POLY. The parallel DL_FFLUX preserves the quality of the scaling of MPI implementation in standard DL_POLY. For the first time, it is feasible to use the full capability of DL_FFLUX to study systems that are large enough to be of real-world interest. For example, a fully flexible, high-rank polarized (up to and including quadrupole moments) 1 ns simulation of a system of 10 125 atoms (3375 water molecules) takes 30 h (wall time) on 18 cores.
Affiliation(s)
- Benjamin C B Symons
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, Great Britain; Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
- Michael K Bane
- High End Compute LTD, 23 Welby Street, Manchester M13 0EL, Great Britain, https://highendcompute.co.uk; Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M15 6BH, Great Britain
- Paul L A Popelier
- Manchester Institute of Biotechnology (MIB), 131 Princess Street, Manchester M1 7DN, Great Britain; Department of Chemistry, University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
18
|
Broad J, Preston S, Wheatley RJ, Graham RS. Gaussian process models of potential energy surfaces with boundary optimization. J Chem Phys 2021; 155:144106. [PMID: 34654292 DOI: 10.1063/5.0063534] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
A strategy is outlined to reduce the number of training points required to model intermolecular potentials using Gaussian processes, without reducing accuracy. An asymptotic function is used at long range, and the crossover distance between this model and the Gaussian process is learnt from the training data. The results are presented for different implementations of this procedure, known as boundary optimization, across the following dimer systems: CO-Ne, HF-Ne, HF-Na⁺, CO₂-Ne, and (CO₂)₂. The technique reduces the number of training points, at fixed accuracy, by up to ∼49%, compared to our previous work based on a sequential learning technique. The approach is readily transferable to other statistical methods of prediction or modeling problems.
Affiliation(s)
- Jack Broad
- School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Simon Preston
- School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Richard J Wheatley
- School of Chemistry, University of Nottingham, Nottingham NG7 2RD, United Kingdom
- Richard S Graham
- School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, United Kingdom
19
|
Befort BJ, DeFever RS, Tow GM, Dowling AW, Maginn EJ. Machine Learning Directed Optimization of Classical Molecular Modeling Force Fields. J Chem Inf Model 2021; 61:4400-4414. [PMID: 34402301 DOI: 10.1021/acs.jcim.1c00448] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/29/2022]
Abstract
Accurate force fields are necessary for predictive molecular simulations. However, developing force fields that accurately reproduce experimental properties is challenging. Here, we present a machine learning directed, multiobjective optimization workflow for force field parametrization that evaluates millions of prospective force field parameter sets while requiring only a small fraction of them to be tested with molecular simulations. We demonstrate the generality of the approach and identify multiple low-error parameter sets for two distinct test cases: simulations of hydrofluorocarbon (HFC) vapor-liquid equilibrium (VLE) and an ammonium perchlorate (AP) crystal phase. We discuss the challenges and implications of our force field optimization workflow.
Affiliation(s)
- Bridgette J Befort
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Ryan S DeFever
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Garrett M Tow
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Alexander W Dowling
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Edward J Maginn
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
20
|
Ceriotti M, Clementi C, Anatole von Lilienfeld O. Machine learning meets chemical physics. J Chem Phys 2021; 154:160401. [PMID: 33940847 DOI: 10.1063/5.0051418] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/13/2022]
Abstract
Over recent years, the use of statistical learning techniques applied to chemical problems has gained substantial momentum. This is particularly apparent in the realm of physical chemistry, where the balance between empiricism and physics-based theory has traditionally been rather in favor of the latter. In this guest Editorial for the special topic issue on "Machine Learning Meets Chemical Physics," a brief rationale is provided, followed by an overview of the topics covered. We conclude by making some general remarks.
Affiliation(s)
- Michele Ceriotti
- Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany