1. Eastman P, Pritchard BP, Chodera JD, Markland TE. Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning. J Chem Theory Comput 2024; 20:8583-8593. [PMID: 39318326; DOI: 10.1021/acs.jctc.4c00794] [Indexed: 09/26/2024]
Abstract
We describe version 2 of the SPICE data set, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original data set by adding much more sampling of chemical space and more data on noncovalent interactions. We train a set of potential energy functions called Nutmeg on it. They are based on the TensorNet architecture. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large-scale charge distribution. Evaluation of the new models shows that they do an excellent job of reproducing energy differences between conformations even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories and are fast enough to be useful for routine simulation of small molecules.
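The evaluation above focuses on energy differences between conformations rather than absolute energies. The sketch below shows one common way to score such differences. It assumes energies for several conformers of one molecule are available from a model and from a reference method, and the function name `relative_energy_mae` and all numbers are hypothetical, not taken from the paper. Comparing every pairwise difference cancels any constant per-molecule offset between the two methods.

```python
from itertools import combinations


def relative_energy_mae(model, reference):
    """Mean absolute error over all pairwise conformer energy differences.

    `model` and `reference` are equal-length sequences of energies (same units)
    for conformers of a single molecule. A constant offset between the two
    methods cancels, so only relative conformational energies are scored.
    """
    if len(model) != len(reference) or len(model) < 2:
        raise ValueError("need at least two conformers with matching energies")
    errors = [
        abs((model[i] - model[j]) - (reference[i] - reference[j]))
        for i, j in combinations(range(len(model)), 2)
    ]
    return sum(errors) / len(errors)


# Hypothetical energies (kJ/mol) for three conformers; the model is shifted by about +100.
reference = [0.0, 5.0, 12.0]
model = [100.0, 105.0, 113.0]
print(relative_energy_mae(model, reference))
```

Scoring pairwise differences rather than absolute energies is a design choice. It avoids penalizing a model for a uniform shift that has no effect on conformational preferences or on forces.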
Affiliation(s)
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Benjamin P Pritchard
  - Molecular Sciences Software Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24060, United States
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Thomas E Markland
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
2. Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830; DOI: 10.1063/5.0217688] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024]
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
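Automatic differentiation matters for NNPs because the force on each atom is the negative gradient of the energy, F_i = -∂E/∂r_i, so a library that differentiates the model energy gets forces without hand-derived expressions. The sketch below does not use JAX or FeNNol. It is a minimal illustration under stated assumptions: it writes that gradient out by hand for a toy Lennard-Jones pair energy (a hypothetical stand-in for a learned energy) and checks it against central finite differences. An autodiff library would return the same gradient exactly, without the manual derivation.

```python
import math


def lj_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of (x, y, z) positions (toy stand-in for an NNP energy)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            sr6 = (sigma / math.dist(positions[i], positions[j])) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Hand-derived forces F_i = -dE/dr_i for the same potential."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            delta = [a - b for a, b in zip(positions[i], positions[j])]
            r = math.sqrt(sum(d * d for d in delta))
            sr6 = (sigma / r) ** 6
            dE_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -dE_dr * delta[k] / r  # chain rule: dr/dx_i = delta_k / r
                forces[i][k] += f
                forces[j][k] -= f  # Newton's third law
    return forces


def numerical_forces(energy_fn, positions, h=1e-6):
    """Central finite-difference estimate of -dE/dx for every coordinate."""
    forces = []
    for i in range(len(positions)):
        row = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            row.append(-(energy_fn(plus) - energy_fn(minus)) / (2.0 * h))
        forces.append(row)
    return forces


positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.1, 0.3)]
analytic = lj_forces(positions)
numeric = numerical_forces(lj_energy, positions)
print(max(abs(a - b) for ra, rn in zip(analytic, numeric) for a, b in zip(ra, rn)))
```

In an autodiff framework the hand-written `lj_forces` would be replaced by differentiating the energy function itself. That is what makes it cheap to swap in new model architectures.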
Affiliation(s)
- Thomas Plé
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
3. Martire S, Decherchi S, Cavalli A. OBIWAN: An Element-Wise Scalable Feed-Forward Neural Network Potential. J Chem Theory Comput 2024; 20:6287-6302. [PMID: 38978155; DOI: 10.1021/acs.jctc.4c00342] [Indexed: 07/10/2024]
Abstract
Estimating the potential energy of a molecular system at a quantum level of theory is a task of paramount importance in computational chemistry. The commonly employed density functional theory approach accomplishes this task, but usually at significant computational cost. This prompted the community to develop so-called machine learning potentials that achieve near-quantum accuracy at molecular mechanics computational cost. In this paper, we introduce OBIWAN, a feed-forward neural network whose structural properties also led to the definition of a new kind of general-purpose neural network layer. Its featurization process scales efficiently with newly added atomic species, so new atom types can be added seamlessly without changing the topology of the network. It also allows training on new data sets starting from a previously trained OBIWAN, so training converges very quickly. This avoids training from scratch and makes the approach better aligned with green computing.
Affiliation(s)
- Stefano Martire
  - Department of Pharmacy and Biotechnology, University of Bologna, Via Belmeloro 6, Bologna 40126, Italy
  - Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
- Sergio Decherchi
  - Data Science and Computation Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
- Andrea Cavalli
  - Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
  - Centre Européen de Calcul Atomique et Moléculaire, Ecole Polytechnique Fédérale de Lausanne, Avenue de Forel 3, Lausanne 1015, Switzerland
4. Chen G, Jaffrelot Inizan T, Plé T, Lagardère L, Piquemal JP, Maday Y. Advancing Force Fields Parameterization: A Directed Graph Attention Networks Approach. J Chem Theory Comput 2024; 20:5558-5569. [PMID: 38875012; DOI: 10.1021/acs.jctc.3c01421] [Indexed: 06/16/2024]
Abstract
Force fields (FFs) are an established tool for simulating large and complex molecular systems. However, parametrizing FFs is a challenging and time-consuming task that relies on empirical heuristics, experimental data, and computational data. Recent efforts aim to automate the assignment of FF parameters using pre-existing databases and on-the-fly ab initio data. In this study, we propose a graph-based force field (GB-FFs) model that derives parameters for the Generalized Amber Force Field (GAFF) directly from chemical environments, and we investigate the influence of functional forms. Our end-to-end parametrization approach predicts parameters by aggregating the basic information in directed molecular graphs, eliminating the need for expert-defined procedures and enhancing the accuracy and transferability of GAFF across a broader range of molecular complexes. Simulation results are compared to the original GAFF parametrization. In practice, our results demonstrate improved transferability of the model, with better accuracy in modeling intermolecular and torsional interactions as well as improved solvation free energies. The optimization approach developed in this work is fully applicable to other nonpolarizable FFs as well as to polarizable ones.
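The abstract describes predicting parameters by aggregating information over directed molecular graphs with attention. The sketch below illustrates only the general mechanism. It is a single-head, GAT-style attention step that omits learned linear projections and multiple layers. The scoring function, feature sizes and the names `attention_aggregate` and `leaky_relu` are assumptions, not the authors' architecture. Each atom replaces its features with a softmax-weighted sum of the features arriving along its incoming directed edges.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(features, edges, attn_weights):
    """One attention-weighted aggregation step on a directed graph.

    features: dict node -> list[float] feature vector (all the same length)
    edges: list of (src, dst) directed edges; information flows src -> dst
    attn_weights: list[float] of length 2 * dim, scoring the concatenation [h_dst || h_src]
    Returns (new_features, alphas), where alphas[(src, dst)] is the attention weight.
    """
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(src)

    new_features, alphas = {}, {}
    for dst, sources in incoming.items():
        if not sources:
            new_features[dst] = list(features[dst])  # no incoming edges: keep features
            continue
        scores = [
            leaky_relu(sum(w * x for w, x in zip(attn_weights, features[dst] + features[src])))
            for src in sources
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        agg = [0.0] * len(features[dst])
        for src, e in zip(sources, exps):
            alpha = e / total
            alphas[(src, dst)] = alpha
            for k in range(len(agg)):
                agg[k] += alpha * features[src][k]
        new_features[dst] = agg
    return new_features, alphas


# Toy three-atom graph; each bond is stored as two directed edges.
features = {"C": [1.0, 0.0], "O": [0.0, 1.0], "H": [0.5, 0.5]}
edges = [("C", "O"), ("O", "C"), ("C", "H"), ("H", "C")]
new_features, alphas = attention_aggregate(features, edges, attn_weights=[0.1, -0.2, 0.3, 0.4])
print(new_features["C"], alphas)
```

In a parametrization model of this kind, the aggregated atom (or edge) features would then feed a readout that outputs force-field parameters. That readout is not shown here.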
Affiliation(s)
- Gong Chen
  - Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
- Théo Jaffrelot Inizan
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Thomas Plé
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Jean-Philip Piquemal
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Yvon Maday
  - Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
5. Wang Y, Inizan TJ, Liu C, Piquemal JP, Ren P. Incorporating Neural Networks into the AMOEBA Polarizable Force Field. J Phys Chem B 2024; 128:2381-2388. [PMID: 38445577; PMCID: PMC10985787; DOI: 10.1021/acs.jpcb.3c08166] [Indexed: 03/07/2024]
Abstract
Neural network potentials (NNPs) offer significant promise to bridge the gap between the accuracy of quantum mechanics and the efficiency of molecular mechanics in molecular simulation. Most NNPs rely on the locality assumption, which ensures the model's transferability and scalability, and thus lack a treatment of long-range interactions, which are essential for molecular systems in the condensed phase. Here we present an integrated hybrid model, AMOEBA+NN, which combines the AMOEBA potential for the short- and long-range noncovalent atomic interactions with an NNP that captures the remaining local covalent contributions. The AMOEBA+NN model was trained on the conformational energies of the ANI-1x data set and tested on several external data sets ranging from small molecules to tetrapeptides. The hybrid model demonstrated substantial improvements over the baseline models in terms of accuracy as the molecule size increased, suggesting its potential as a next-generation approach for chemically accurate molecular simulations.
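The following is a minimal sketch of the additive hybrid described above. It assumes the total energy is the AMOEBA term plus an NNP correction, and that the NNP is fit to what AMOEBA leaves unexplained, i.e. the residual against reference energies. That residual-fitting setup is an assumption for illustration, not a detail stated in the abstract. The function names and numbers are hypothetical, and no AMOEBA or NNP code is involved.

```python
def residual_targets(reference_energies, amoeba_energies):
    """Training targets for the NN term: the part of the reference energy AMOEBA misses."""
    return [ref - ff for ref, ff in zip(reference_energies, amoeba_energies)]


def hybrid_energy(amoeba_energy, nn_energy):
    """Hybrid total energy: physics-based AMOEBA term plus learned NN correction."""
    return amoeba_energy + nn_energy


def relative(energies):
    """Conformational energies relative to the lowest-energy conformer."""
    e_min = min(energies)
    return [e - e_min for e in energies]


# Hypothetical conformer energies (kcal/mol) for one molecule.
reference = [-10.0, -8.5, -7.0]
amoeba = [-9.0, -8.0, -7.5]
targets = residual_targets(reference, amoeba)  # [-1.0, -0.5, 0.5]

# A perfect NN correction (predicting exactly the residuals) recovers the reference.
hybrid = [hybrid_energy(a, t) for a, t in zip(amoeba, targets)]
print(relative(hybrid))
```

Because the physics-based term already supplies the long-range behaviour, the learned part only has to model a local correction. This is consistent with the abstract's observation that the hybrid's advantage grows with molecule size.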
Affiliation(s)
- Yanxing Wang
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Théo Jaffrelot Inizan
  - Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
- Chengwen Liu
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Jean-Philip Piquemal
  - Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
- Pengyu Ren
  - Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
6. Eastman P, Galvelis R, Peláez RP, Abreu CRA, Farr SE, Gallicchio E, Gorenko A, Henry MM, Hu F, Huang J, Krämer A, Michel J, Mitchell JA, Pande VS, Rodrigues JPGLM, Rodriguez-Guerra J, Simmonett AC, Singh S, Swails J, Turner P, Wang Y, Zhang I, Chodera JD, De Fabritiis G, Markland TE. OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials. J Phys Chem B 2024; 128:109-116. [PMID: 38154096; PMCID: PMC10846090; DOI: 10.1021/acs.jpcb.3c06662] [Indexed: 12/30/2023]
Abstract
Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features in simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations with only a modest increase in cost.
Affiliation(s)
- Peter Eastman
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Raimondas Galvelis
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
- Raúl P. Peláez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
- Charlles R. A. Abreu
  - Chemical Engineering Department, School of Chemistry, Federal University of Rio de Janeiro, Rio de Janeiro 68542, Brazil
  - Redesign Science Inc., 180 Varick St., New York, NY 10014, USA
- Stephen E. Farr
  - EaStCHEM School of Chemistry, University of Edinburgh, EH9 3FJ, United Kingdom
- Emilio Gallicchio
  - Department of Chemistry and Biochemistry, Brooklyn College of the City University of New York, NY, USA
  - Ph.D. Program in Chemistry and Ph.D. Program in Biochemistry, The Graduate Center of the City University of New York, New York, NY, USA
- Anton Gorenko
  - Stream HPC, Koningin Wilhelminaplein 1 - 40601, 1062 HG Amsterdam, Netherlands
- Michael M. Henry
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY 10065, USA
- Frank Hu
  - Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Jing Huang
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang, China
- Andreas Krämer
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Julien Michel
  - EaStCHEM School of Chemistry, University of Edinburgh, EH9 3FJ, United Kingdom
- Joshua A. Mitchell
  - The Open Force Field Initiative, Open Molecular Software Foundation, Davis, CA 95616, USA
- Vijay S. Pande
  - Andreessen Horowitz, 2865 Sand Hill Rd, Menlo Park, CA 94025, USA
  - Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
- João PGLM Rodrigues
  - Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
- Jaime Rodriguez-Guerra
  - Charité Universitätsmedizin Berlin, In silico Toxicology and Structural Bioinformatics, Virchowweg 6, 10117 Berlin, Germany
- Andrew C. Simmonett
  - Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sukrit Singh
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY 10065, USA
- Jason Swails
  - Entos Inc., 9310 Athena Circle, La Jolla, CA 92037, USA
- Philip Turner
  - College of Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Yuanqing Wang
  - Simons Center for Computational Physical Chemistry and Center for Data Science, New York University, 24 Waverly Place, New York, NY 10004, USA
- Ivy Zhang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY 10065, USA
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- John D. Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY 10065, USA
- Gianni De Fabritiis
  - Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
  - ICREA, Passeig Lluis Companys 23, 08010, Barcelona, Spain
7. Coste A, Slejko E, Zavadlav J, Praprotnik M. Developing an Implicit Solvation Machine Learning Model for Molecular Simulations of Ionic Media. J Chem Theory Comput 2024; 20:411-420. [PMID: 38118122; PMCID: PMC10782447; DOI: 10.1021/acs.jctc.3c00984] [Received: 09/08/2023] [Revised: 12/04/2023] [Accepted: 12/04/2023] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations of biophysical systems require accurate modeling of their native environment, i.e., aqueous ionic solution, as it critically impacts the structure and function of biomolecules. At the same time, the models should be computationally efficient to enable simulations at large spatiotemporal scales. Here, we present a deep implicit solvation model for sodium chloride solutions that satisfies both requirements. Owing to the use of a neural network potential, the model can capture the many-body potential of mean force, while the implicit treatment of water keeps the model inexpensive. We demonstrate our approach first for pure ionic solutions with concentrations ranging from physiological to 2 M. We then extend the model to capture the effective ion interactions both in the vicinity of and far away from a DNA molecule. In both cases, the structural properties are in good agreement with all-atom MD, showcasing a general methodology for the efficient and accurate modeling of ionic media.
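The abstract judges the implicit-solvent model by how well it reproduces the structural properties of all-atom MD. For ionic solutions, the radial distribution function g(r) is the standard structural measure. The sketch below computes it for a single frame of identical point particles in a cubic periodic box using the minimum-image convention. It is a generic textbook estimator, not code from the paper, and the function name, box size and binning are illustrative assumptions. A real analysis would average over many frames and treat each ion-pair type separately.

```python
import math


def radial_distribution(positions, box_length, r_max, n_bins):
    """g(r) for identical particles in a cubic periodic box (single frame, minimum image).

    Requires r_max <= box_length / 2 so the minimum-image distance is unambiguous.
    Returns (bin_centres, g), normalized by the ideal-gas count at density N / V.
    """
    if r_max > box_length / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                d = positions[i][k] - positions[j][k]
                d -= box_length * round(d / box_length)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 2  # pair counted for both particles
    density = n / box_length ** 3
    centres, g = [], []
    for b in range(n_bins):
        r_lo, r_hi = b * dr, (b + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        ideal = density * shell_volume * n  # expected neighbour count for an ideal gas
        centres.append(0.5 * (r_lo + r_hi))
        g.append(counts[b] / ideal)
    return centres, g


# Two particles that are 1.0 apart only through the periodic boundary.
centres, g = radial_distribution(
    [(0.5, 0.0, 0.0), (9.5, 0.0, 0.0)], box_length=10.0, r_max=2.0, n_bins=2
)
print(centres, g)
```

Comparing g(r) curves from the implicit-solvent model and from all-atom MD, per ion-pair type, is one concrete form the "good agreement" check in the abstract can take.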
Affiliation(s)
- Amaury Coste
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
- Ema Slejko
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
  - Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana SI-1000, Slovenia
- Julija Zavadlav
  - Professorship of Multiscale Modeling of Fluid Materials, TUM School of Engineering and Design, Technical University of Munich, Garching Near Munich DE-85748, Germany
- Matej Praprotnik
  - Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana SI-1001, Slovenia
  - Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana SI-1000, Slovenia