1
Bruciaferri N, Eberhardt J, Llanos MA, Loeffler JR, Holcomb M, Fernandez-Quintero ML, Santos-Martins D, Ward AB, Forli S. CosolvKit: a Versatile Tool for Cosolvent MD Preparation and Analysis. J Chem Inf Model 2024; 64:8227-8235. [PMID: 39436011 DOI: 10.1021/acs.jcim.4c01398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Cosolvent molecular dynamics (MDs) are an increasingly popular form of simulations where small molecule cosolvents are added to water-solvated protein systems. These simulations can perform diverse target characterization tasks, including cryptic and allosteric pocket identification and pharmacophore profiling and supplement suites of enhanced sampling methods to explore protein conformational landscapes. The behavior of these systems is tied to the cosolvents used, so the ability to define diverse and complex mixtures is critical in dictating the outcome of the simulations. However, existing methods for preparing cosolvent simulations only support a limited number of predefined cosolvents and concentrations. Here, we present CosolvKit, a tool for the preparation and analysis of systems composed of user-defined cosolvents and concentrations. This tool is modular, supporting the creation of files for multiple MD engines, as well as direct access to OpenMM simulations, and offering access to a variety of generalizable small-molecule force fields. To the best of our knowledge, CosolvKit represents the first generalized approach for the construction of these simulations.
Affiliation(s)
- Niccolo' Bruciaferri
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Jerome Eberhardt
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
  - Biozentrum, University of Basel, Spitalstrasse 41, Basel 4056, Switzerland
- Manuel A Llanos
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Johannes R Loeffler
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Matthew Holcomb
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Monica L Fernandez-Quintero
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Diogo Santos-Martins
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Andrew B Ward
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
- Stefano Forli
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California 92037, United States
2
Molani F, Cho AE. Accurate protein-ligand binding free energy estimation using QM/MM on multi-conformers predicted from classical mining minima. Commun Chem 2024; 7:247. [PMID: 39468282 PMCID: PMC11519471 DOI: 10.1038/s42004-024-01328-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Accepted: 10/14/2024] [Indexed: 10/30/2024] Open
Abstract
Accurate prediction of binding free energy is crucial for the rational design of drug candidates and understanding protein-ligand interactions. To address this, we have developed four protocols that combine QM/MM calculations and the mining minima (M2) method, tested on 9 targets and 203 ligands. Our protocols carry out free energy processing with or without conformational search on the selected conformers obtained from M2 calculations, where their force field atomic charge parameters are substituted with those obtained from a QM/MM calculation. The method achieved a high Pearson's correlation coefficient (0.81) with experimental binding free energies across diverse targets, demonstrating its generality. Using a differential evolution algorithm with a universal scaling factor of 0.2, we achieved a low mean absolute error of 0.60 kcal mol⁻¹. This performance surpasses many existing methods and is comparable to popular relative binding free energy techniques but at significantly lower computational cost.
Affiliation(s)
- Farzad Molani
  - Department of Bioinformatics, Korea University, Sejong, Korea
- Art E Cho
  - Department of Bioinformatics, Korea University, Sejong, Korea
  - inCerebro Co. Ltd., Gangnam-gu, Seoul, Korea
3
Takaba K, Friedman AJ, Cavender CE, Behara PK, Pulido I, Henry MM, MacDermott-Opeskin H, Iacovella CR, Nagle AM, Payne AM, Shirts MR, Mobley DL, Chodera JD, Wang Y. Machine-learned molecular mechanics force fields from large-scale quantum chemical data. Chem Sci 2024; 15:12861-12878. [PMID: 39148808 PMCID: PMC11322960 DOI: 10.1039/d4sc00690a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/17/2024] [Indexed: 08/17/2024] Open
Abstract
The development of reliable and extensible molecular mechanics (MM) force fields (fast, empirical models characterizing the potential energy surface of molecular systems) is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1 M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides and folded proteins, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
Affiliation(s)
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Pharmaceuticals Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation Shizuoka 410-2321 Japan
- Anika J Friedman
  - Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- Chapin E Cavender
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego 9500 Gilman Drive La Jolla CA 92093 USA
- Pavan Kumar Behara
  - Center for Neurotherapeutics, Department of Pathology and Laboratory Medicine, University of California Irvine CA 92697 USA
- Iván Pulido
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Michael M Henry
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Christopher R Iacovella
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Arnav M Nagle
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Department of Bioengineering, University of California, Berkeley Berkeley CA 94720 USA
- Alexander Matthew Payne
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
  - Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center New York 10065 USA
- Michael R Shirts
  - Department of Chemical and Biological Engineering, University of Colorado Boulder Boulder CO 80309 USA
- David L Mobley
  - Department of Pharmaceutical Sciences, University of California Irvine California 92697 USA
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
- Yuanqing Wang
  - Simons Center for Computational Physical Chemistry and Center for Data Science, New York University New York NY 10004 USA
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center New York NY 10065 USA
4
Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830 DOI: 10.1063/5.0217688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024] Open
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
Affiliation(s)
- Thomas Plé
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
5
Wang L, Behara PK, Thompson MW, Gokey T, Wang Y, Wagner JR, Cole DJ, Gilson MK, Shirts MR, Mobley DL. The Open Force Field Initiative: Open Software and Open Science for Molecular Modeling. J Phys Chem B 2024; 128:7043-7067. [PMID: 38989715 DOI: 10.1021/acs.jpcb.4c01558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Force fields are a key component of physics-based molecular modeling, describing the energies and forces in a molecular system as a function of the positions of the atoms and molecules involved. Here, we provide a review and scientific status report on the work of the Open Force Field (OpenFF) Initiative, which focuses on the science, infrastructure and data required to build the next generation of biomolecular force fields. We introduce the OpenFF Initiative and the related OpenFF Consortium, describe its approach to force field development and software, and discuss accomplishments to date as well as future plans. OpenFF releases both software and data under open and permissive licensing agreements to enable rapid application, validation, extension, and modification of its force fields and software tools. We discuss lessons learned to date in this new approach to force field development. We also highlight ways that other force field researchers can get involved, as well as some recent successes of outside researchers taking advantage of OpenFF tools and data.
Affiliation(s)
- Lily Wang
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Pavan Kumar Behara
  - Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Matthew W Thompson
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Trevor Gokey
  - Department of Chemistry, University of California, Irvine, California 92697, United States
- Yuanqing Wang
  - Simons Center for Computational Physical Chemistry and Center for Data Science, New York, New York 10004, United States
- Jeffrey R Wagner
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Daniel J Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Michael K Gilson
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, The University of California at San Diego, La Jolla, California 92093, United States
- Michael R Shirts
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80305, United States
- David L Mobley
  - Department of Chemistry, University of California, Irvine, California 92697, United States
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
6
Carrer M, Cezar HM, Bore SL, Ledum M, Cascella M. Learning Force Field Parameters from Differentiable Particle-Field Molecular Dynamics. J Chem Inf Model 2024; 64:5510-5520. [PMID: 38963184 PMCID: PMC11267579 DOI: 10.1021/acs.jcim.4c00564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/15/2024] [Accepted: 06/12/2024] [Indexed: 07/05/2024]
Abstract
We develop ∂-HylleraasMD (∂-HyMD), a fully end-to-end differentiable molecular dynamics software based on the Hamiltonian hybrid particle-field formalism, and use it to establish a protocol for automated optimization of force field parameters. ∂-HyMD is templated on the recently released HylleraasMD software, while using the JAX autodiff framework as the main engine for the differentiable dynamics. ∂-HyMD exploits an embarrassingly parallel optimization algorithm by spawning independent simulations, whose trajectories are simultaneously processed by reverse mode automatic differentiation to calculate the gradient of the loss function, which is in turn used for iterative optimization of the force-field parameters. We show that parallel organization facilitates the convergence of the minimization procedure, avoiding the known memory and numerical stability issues of differentiable molecular dynamics approaches. We showcase the effectiveness of our implementation by producing a library of force field parameters for standard phospholipids, with either zwitterionic or anionic heads and with saturated or unsaturated tails. Compared to the all-atom reference, the force field obtained by ∂-HyMD yields better density profiles than the parameters derived from previously utilized gradient-free optimization procedures. Moreover, ∂-HyMD models can predict with good accuracy properties not included in the learning objective, such as lateral pressure profiles, and are transferable to other systems, including triglycerides.
Affiliation(s)
- Manuel Carrer
  - Hylleraas Centre for Quantum Molecular Sciences and Department of Chemistry, University of Oslo, PO Box 1033, Blindern, 0315 Oslo, Norway
- Henrique Musseli Cezar
  - Hylleraas Centre for Quantum Molecular Sciences and Department of Chemistry, University of Oslo, PO Box 1033, Blindern, 0315 Oslo, Norway
- Sigbjørn Løland Bore
  - Hylleraas Centre for Quantum Molecular Sciences and Department of Chemistry, University of Oslo, PO Box 1033, Blindern, 0315 Oslo, Norway
- Morten Ledum
  - Hylleraas Centre for Quantum Molecular Sciences and Department of Chemistry, University of Oslo, PO Box 1033, Blindern, 0315 Oslo, Norway
- Michele Cascella
  - Hylleraas Centre for Quantum Molecular Sciences and Department of Chemistry, University of Oslo, PO Box 1033, Blindern, 0315 Oslo, Norway
7
Chen G, Jaffrelot Inizan T, Plé T, Lagardère L, Piquemal JP, Maday Y. Advancing Force Fields Parameterization: A Directed Graph Attention Networks Approach. J Chem Theory Comput 2024; 20:5558-5569. [PMID: 38875012 DOI: 10.1021/acs.jctc.3c01421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2024]
Abstract
Force fields (FFs) are an established tool for simulating large and complex molecular systems. However, parametrizing FFs is a challenging and time-consuming task that relies on empirical heuristics, experimental data, and computational data. Recent efforts aim to automate the assignment of FF parameters using pre-existing databases and on-the-fly ab initio data. In this study, we propose a graph-based force field (GB-FFs) model to directly derive parameters for the Generalized Amber Force Field (GAFF) from chemical environments and investigate the influence of functional forms. Our end-to-end parametrization approach predicts parameters by aggregating the basic information in directed molecular graphs, eliminating the need for expert-defined procedures and enhancing the accuracy and transferability of GAFF across a broader range of molecular complexes. Simulation results are compared to the original GAFF parametrization. In practice, our results demonstrate an improved transferability of the model, showcasing its improved accuracy in modeling intermolecular and torsional interactions, as well as improved solvation free energies. The optimization approach developed in this work is fully applicable to other nonpolarizable FFs as well as to polarizable ones.
Affiliation(s)
- Gong Chen
  - Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
- Théo Jaffrelot Inizan
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Thomas Plé
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Jean-Philip Piquemal
  - Sorbonne Université, Laboratoire de Chimie Théorique (LCT), UMR 7616 CNRS, 75005 Paris, France
- Yvon Maday
  - Sorbonne Université, CNRS, Université Paris Cité, Laboratoire Jacques-Louis Lions (LJLL), UMR 7598 CNRS, 75005 Paris, France
8
Wehrhan L, Keller BG. Fluorinated Protein-Ligand Complexes: A Computational Perspective. J Phys Chem B 2024; 128:5925-5934. [PMID: 38886167 PMCID: PMC11215785 DOI: 10.1021/acs.jpcb.4c01493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 06/20/2024]
Abstract
Fluorine is an element renowned for its unique properties. Its powerful capability to modulate molecular properties makes it an attractive substituent for protein binding ligands; however, the rational design of fluorination can be challenging with effects on interactions and binding energies being difficult to predict. In this Perspective, we highlight how computational methods help us to understand the role of fluorine in protein-ligand binding with a focus on molecular simulation. We underline the importance of an accurate force field, present fluoride channels as a showcase for biomolecular interactions with fluorine, and discuss fluorine specific interactions like the ability to form hydrogen bonds and interactions with aryl groups. We put special emphasis on the disruption of water networks and entropic effects.
Affiliation(s)
- Leon Wehrhan
  - Department of Chemistry, Biology and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
- Bettina G. Keller
  - Department of Chemistry, Biology and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
9
Schäfer JL, Keller BG. Implementation of Girsanov Reweighting in OpenMM and Deeptime. J Phys Chem B 2024; 128:6014-6027. [PMID: 38865491 PMCID: PMC11215775 DOI: 10.1021/acs.jpcb.4c01702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 05/22/2024] [Accepted: 05/22/2024] [Indexed: 06/14/2024]
Abstract
Classical molecular dynamics (MD) simulations provide invaluable insights into complex molecular systems but face limitations in capturing phenomena occurring on time scales beyond their reach. To bridge this gap, various enhanced sampling techniques have been developed, which are complemented by reweighting techniques to recover the unbiased dynamics. Girsanov reweighting is a reweighting technique that reweights simulation paths, generated by a stochastic MD integrator, without evoking an effective model of the dynamics. Instead, it calculates the relative path probability density at the time resolution of the MD integrator. Efficient implementation of Girsanov reweighting requires that the reweighting factors are calculated on-the-fly during the simulations and thus needs to be implemented within the MD integrator. Here, we present a comprehensive guide for implementing Girsanov reweighting into MD simulations. We demonstrate the implementation in the MD simulation package OpenMM by extending the library openmmtools. Additionally, we implemented a reweighted Markov state model estimator within the time series analysis package Deeptime.
Affiliation(s)
- Joana-Lysiane Schäfer
  - Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany
- Bettina G. Keller
  - Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany
10
Wang Y, Pulido I, Takaba K, Kaminow B, Scheen J, Wang L, Chodera JD. EspalomaCharge: Machine Learning-Enabled Ultrafast Partial Charge Assignment. J Phys Chem A 2024; 128:4160-4167. [PMID: 38717302 PMCID: PMC11129294 DOI: 10.1021/acs.jpca.4c01287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/17/2024] [Accepted: 04/17/2024] [Indexed: 05/24/2024]
Abstract
Atomic partial charges are crucial parameters in molecular dynamics simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of ab initio semiempirical quantum chemical methods such as AM1-BCC and is expensive for large systems or large numbers of molecules. We propose a hybrid physical/graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserve total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling for the first time the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package EspalomaCharge, this approach provides drop-in replacements for both AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma-charge.
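The analytical step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard charge-equilibration form, in which each atom i has an electronegativity e_i and a hardness s_i, and the charges q_i minimize sum_i (e_i q_i + 0.5 s_i q_i^2) subject to sum_i q_i = Q. Under that assumption the constrained minimum has a closed form via a single Lagrange multiplier. The function name and the numeric inputs below are hypothetical; they are not taken from the EspalomaCharge package, where the per-atom parameters would come from the graph neural network.

```python
def equilibrate_charges(electronegativity, hardness, total_charge=0.0):
    """Closed-form minimizer of sum_i(e_i*q_i + 0.5*s_i*q_i**2) with sum_i q_i = Q.

    Illustrative sketch only; parameter names are assumptions.
    """
    if len(electronegativity) != len(hardness):
        raise ValueError("electronegativity and hardness must have equal length")
    if any(s <= 0.0 for s in hardness):
        raise ValueError("hardness values must be positive")

    inverse_hardness = [1.0 / s for s in hardness]
    e_over_s = [e / s for e, s in zip(electronegativity, hardness)]

    # Lagrange multiplier chosen so that the charges sum to total_charge.
    multiplier = (total_charge + sum(e_over_s)) / sum(inverse_hardness)

    # Stationarity condition e_i + s_i*q_i = multiplier gives each charge.
    return [(multiplier - e) / s for e, s in zip(electronegativity, hardness)]


# Made-up parameters for a three-atom anion (total charge -1).
example_charges = equilibrate_charges([0.5, -0.2, 0.1], [1.0, 2.0, 4.0], -1.0)
```

Because the solution is a sum over atoms followed by one pass to assign charges, the cost of this step grows linearly with the number of atoms, consistent with the linear scaling stated in the abstract.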
Affiliation(s)
- Yuanqing Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
  - Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, New York 10004, United States
- Iván Pulido
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Kenichiro Takaba
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
  - Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- Benjamin Kaminow
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, New York 10065, United States
- Jenke Scheen
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Lily Wang
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
  - Open Molecular Sciences Foundation, Davis, California 95618, United States
- John D. Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
11
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanic force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
  - Lingang Laboratory, Shanghai, China
- Xinyu Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
12
Greener JG. Differentiable simulation to develop molecular dynamics force fields for disordered proteins. Chem Sci 2024; 15:4897-4909. [PMID: 38550690 PMCID: PMC10966991 DOI: 10.1039/d3sc05230c] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/03/2023] [Accepted: 02/08/2024] [Indexed: 11/11/2024] Open
Abstract
Implicit solvent force fields are computationally efficient but can be unsuitable for running molecular dynamics on disordered proteins. Here I improve the a99SB-disp force field and the GBNeck2 implicit solvent model to better describe disordered proteins. Differentiable molecular simulations with 5 ns trajectories are used to jointly optimise 108 parameters to better match explicit solvent trajectories. Simulations with the improved force field better reproduce the radius of gyration and secondary structure content seen in experiments, whilst showing slightly degraded performance on folded proteins and protein complexes. The force field, called GB99dms, reproduces the results of a small molecule binding study and improves agreement with experiment for the aggregation of amyloid peptides. GB99dms, which can be used in OpenMM, is available at https://github.com/greener-group/GB99dms. This work is the first to show that gradients can be obtained directly from nanosecond-length differentiable simulations of biomolecules and highlights the effectiveness of this approach to training whole force fields to match desired properties.
Affiliation(s)
- Joe G Greener
- Medical Research Council Laboratory of Molecular Biology Cambridge CB2 0QH UK
| |
|
13
|
Wang Y, Inizan TJ, Liu C, Piquemal JP, Ren P. Incorporating Neural Networks into the AMOEBA Polarizable Force Field. J Phys Chem B 2024; 128:2381-2388. [PMID: 38445577 PMCID: PMC10985787 DOI: 10.1021/acs.jpcb.3c08166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Neural network potentials (NNPs) offer significant promise to bridge the gap between the accuracy of quantum mechanics and the efficiency of molecular mechanics in molecular simulation. Most NNPs rely on the locality assumption that ensures the model's transferability and scalability and thus lack the treatment of long-range interactions, which are essential for molecular systems in the condensed phase. Here we present an integrated hybrid model, AMOEBA+NN, which combines the AMOEBA potential for the short- and long-range noncovalent atomic interactions and an NNP to capture the remaining local covalent contributions. The AMOEBA+NN model was trained on the conformational energy of the ANI-1x data set and tested on several external data sets ranging from small molecules to tetrapeptides. The hybrid model demonstrated substantial improvements over the baseline models in terms of accuracy as the molecule size increased, suggesting its potential as a next-generation approach for chemically accurate molecular simulations.
Affiliation(s)
- Yanxing Wang
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Théo Jaffrelot Inizan
- Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
| | - Chengwen Liu
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Jean-Philip Piquemal
- Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
| | - Pengyu Ren
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
| |
|
14
|
Davel CM, Bernat T, Wagner JR, Shirts MR. Parameterization of General Organic Polymers within the Open Force Field Framework. J Chem Inf Model 2024; 64:1290-1305. [PMID: 38303159 PMCID: PMC11090695 DOI: 10.1021/acs.jcim.3c01691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Polymer and chemically modified biopolymer systems present unique challenges to traditional molecular simulation preparation workflows. First, typical polymer and biomolecular input formats, such as Protein Data Bank (PDB) files, lack adequate chemical information needed for the parameterization of new chemistries. Second, polymers are typically too large for accurate partial charge generation methods. In this work, we employ direct chemical perception through the Open Force Field toolkit to create a flexible polymer simulation workflow for organic polymers, encompassing everything from biopolymers to soft materials. We propose and test a new input specification for monomer information that can, along with a 3D conformational geometry, parametrize and simulate most soft-material systems within the same workflow used for smaller ligands. The monomer format encompasses a subset of the SMIRKS substructure query language to uniquely identify chemical information and repeating charges in underspecified systems through matching atomic connectivity. This workflow is combined with several different approaches for automatic partial-charge generation for larger systems. As an initial proof of concept, a variety of diverse polymeric systems were parametrized with the Open Force Field toolkit, including functionalized proteins, DNA, homopolymers, cross-linked systems, and sugars. Additionally, shape properties and radial distribution functions were computed from molecular dynamics simulations of poly(ethylene glycol), polyacrylamide, and poly(N-isopropylacrylamide) homopolymers in aqueous solution and compared to previous simulation results in order to demonstrate a start-to-finish workflow for simulation and property prediction. We expect that these tools will greatly expedite the day-to-day computational research of soft-matter simulations and create a robust atomic-scale polymer specification in conjunction with existing polymer structural notations.
Affiliation(s)
- Connor M Davel
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - Timotej Bernat
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - Jeffrey R Wagner
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
| | - Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
| |
|
15
|
Riedmiller K, Reiser P, Bobkova E, Maltsev K, Gryn'ova G, Friederich P, Gräter F. Substituting density functional theory in reaction barrier calculations for hydrogen atom transfer in proteins. Chem Sci 2024; 15:2518-2527. [PMID: 38362411 PMCID: PMC10866341 DOI: 10.1039/d3sc03922f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/10/2024] [Indexed: 02/17/2024] Open
Abstract
Hydrogen atom transfer (HAT) reactions are important in many biological systems. As these reactions are hard to observe experimentally, it is of high interest to shed light on them using simulations. Here, we present a machine learning model based on graph neural networks for the prediction of energy barriers of HAT reactions in proteins. As input, the model uses exclusively non-optimized structures as obtained from classical simulations. It was trained on more than 17 000 energy barriers calculated using hybrid density functional theory. We built and evaluated the model in the context of HAT in collagen, but we show that the same workflow can easily be applied to HAT reactions in other biological or synthetic polymers. We obtain for relevant reactions (small reaction distances) a model with good predictive power (R² ∼ 0.9 and mean absolute error of <3 kcal mol⁻¹). As the inference speed is high, this model enables evaluations of dozens of chemical situations within seconds. When combined with molecular dynamics in a kinetic Monte-Carlo scheme, the model paves the way toward reactive simulations.
Affiliation(s)
- Kai Riedmiller
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
| | - Patrick Reiser
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology Engler-Bunte-Ring 8 Karlsruhe 76131 Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen Germany
| | | | - Kiril Maltsev
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
| | - Ganna Gryn'ova
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University Heidelberg Germany
| | - Pascal Friederich
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology Engler-Bunte-Ring 8 Karlsruhe 76131 Germany
- Institute of Nanotechnology, Karlsruhe Institute of Technology Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen Germany
| | - Frauke Gräter
- Heidelberg Institute for Theoretical Studies Heidelberg Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University Heidelberg Germany
| |
|
16
|
Yu Z, Annamareddy A, Morgan D, Wang B. How close are the classical two-body potentials to ab initio calculations? Insights from linear machine learning based force matching. J Chem Phys 2024; 160:054501. [PMID: 38310473 DOI: 10.1063/5.0175756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 01/08/2024] [Indexed: 02/05/2024] Open
Abstract
In this work, we propose a linear machine learning force matching approach that can directly extract pair atomic interactions from ab initio calculations in amorphous structures. The local feature representation is specifically chosen to make the linear weights a force field as a force/potential function of the atom pair distance. Consequently, this set of functions is the closest representation of the ab initio forces, given the two-body approximation and finite scanning in the configurational space. We validate this approach in amorphous silica. Potentials in the new force field (consisting of tabulated Si-Si, Si-O, and O-O potentials) are significantly different than existing potentials that are commonly used for silica, even though all of them produce the tetrahedral network structure and roughly similar glass properties. This suggests that the commonly used classical force fields do not offer fundamentally accurate representations of the atomic interaction in silica. The new force field furthermore produces a lower glass transition temperature (Tg ∼ 1800 K) and a positive liquid thermal expansion coefficient, suggesting the extraordinarily high Tg and negative liquid thermal expansion of simulated silica could be artifacts of previously developed classical potentials. Overall, the proposed approach provides a fundamental yet intuitive way to evaluate two-body potentials against ab initio calculations, thereby offering an efficient way to guide the development of classical force fields.
Affiliation(s)
- Zheng Yu
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Ajay Annamareddy
- Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
| | - Dane Morgan
- Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
| | - Bu Wang
- Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
| |
|
17
|
Uba AI, Zengin G. In the quest for histone deacetylase inhibitors: current trends in the application of multilayered computational methods. Amino Acids 2023; 55:1709-1726. [PMID: 37367966 DOI: 10.1007/s00726-023-03297-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/15/2023] [Accepted: 06/20/2023] [Indexed: 06/28/2023]
Abstract
Histone deacetylase (HDAC) inhibitors have gained attention over the past three decades because of their potential in the treatment of different diseases including various forms of cancers, neurodegenerative disorders, autoimmune, inflammatory diseases, and other metabolic disorders. To date, 5 HDAC inhibitor drugs are marketed for the treatment of hematological malignancies and several drug-candidate HDAC inhibitors are at different stages of clinical trials. However, due to the toxic side effects of these drugs resulting from the lack of target selectivity, active studies are ongoing to design and develop either class-selective or isoform-selective inhibitors. Computational methods have aided the discovery of HDAC inhibitors with the desired potency and/or selectivity. These methods include ligand-based approaches such as scaffold hopping, pharmacophore modeling, three-dimensional quantitative structure-activity relationships (3D-QSAR); and structure-based virtual screening (molecular docking). The current trends involve the application of the combination of these methods and incorporating molecular dynamics simulations coupled with Poisson-Boltzmann/molecular mechanics generalized Born surface area (MM-PBSA/MM-GBSA) to improve the prediction of ligand binding affinity. This review aimed at understanding the current trends in applying these multilayered strategies and their contribution to the design/identification of HDAC inhibitors.
Affiliation(s)
- Abdullahi Ibrahim Uba
- Department of Molecular Biology and Genetics, Istanbul AREL University, Istanbul, 34537, Turkey.
| | - Gokhan Zengin
- Department of Biology, Science Faculty, Selcuk University, Konya, 42130, Turkey.
| |
|
18
|
Lehner MT, Katzberger P, Maeder N, Schiebroek CC, Teetz J, Landrum GA, Riniker S. DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment. J Chem Inf Model 2023; 63:6014-6028. [PMID: 37738206 PMCID: PMC10565818 DOI: 10.1021/acs.jcim.3c00800] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Indexed: 09/24/2023]
Abstract
We present a robust and computationally efficient approach for assigning partial charges of atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN), which was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated in existing parametrization pipelines, as shown for the Open force field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.
Affiliation(s)
| | | | - Niels Maeder
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Carl C.G. Schiebroek
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Jakob Teetz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Gregory A. Landrum
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| |
|
19
|
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| | - Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
| |
|
20
|
Boothroyd S, Behara PK, Madin OC, Hahn DF, Jang H, Gapsys V, Wagner JR, Horton JT, Dotson DL, Thompson MW, Maat J, Gokey T, Wang LP, Cole DJ, Gilson MK, Chodera JD, Bayly CI, Shirts MR, Mobley DL. Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field. J Chem Theory Comput 2023; 19:3251-3275. [PMID: 37167319 PMCID: PMC10269353 DOI: 10.1021/acs.jctc.3c00039] [Citation(s) in RCA: 35] [Impact Index Per Article: 35.0] [Received: 01/09/2023] [Indexed: 05/13/2023]
Abstract
We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which generalizes easily to highly diverse sets of chemistries based on substructure queries. Like the previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks show improvements in general metrics of performance against quantum chemistry reference data such as root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and improved relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields as well as in some cases other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, including comparison against experimental data from various thermodynamic databases for small molecule properties such as ΔHmix, ρ(x), ΔGsolv, and ΔGtrans. Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to previous force fields. All the data is made publicly available along with complete details on how to reproduce the training results at https://github.com/openforcefield/openff-sage.
Affiliation(s)
| | - Pavan Kumar Behara
- Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
| | - Owen C. Madin
- Chemical & Biological Engineering Department, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - David F. Hahn
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
| | - Hyesu Jang
- Chemistry Department, The University of California at Davis, Davis, California 95616, United States
- OpenEye Scientific Software, Santa Fe, New Mexico 87508, United States
| | - Vytautas Gapsys
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
- Computational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, D-37077, Göttingen, Germany
| | - Jeffrey R. Wagner
- Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
| | - Joshua T. Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
| | - David L. Dotson
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Datryllic LLC, Phoenix, Arizona 85003, United States
| | - Matthew W. Thompson
- Chemical & Biological Engineering Department, University of Colorado Boulder, Boulder, Colorado 80309, United States
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
| | - Jessica Maat
- Department of Chemistry, University of California, Irvine, California 92697, United States
| | - Trevor Gokey
- Department of Chemistry, University of California, Irvine, California 92697, United States
| | - Lee-Ping Wang
- Chemistry Department, The University of California at Davis, Davis, California 95616, United States
| | - Daniel J. Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
| | - Michael K. Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, The University of California at San Diego, La Jolla, California 92093, United States
| | - John D. Chodera
- Computational & Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | | | - Michael R. Shirts
- Chemical & Biological Engineering Department, University of Colorado Boulder, Boulder, Colorado 80309, United States
| | - David L. Mobley
- Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Department of Chemistry, University of California, Irvine, California 92697, United States
| |
|
21
|
Ding Y, Yu K, Huang J. Data science techniques in biomolecular force field development. Curr Opin Struct Biol 2023; 78:102502. [PMID: 36462448 DOI: 10.1016/j.sbi.2022.102502] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/03/2022] [Revised: 10/18/2022] [Accepted: 10/25/2022] [Indexed: 12/03/2022]
Abstract
Recent advances in data science are impacting the development of classical force fields. Here we review some ideas and techniques from data science that have been used in force field development, including database construction, atom typing, and machine learning potentials. We highlight how new tools such as active learning and automatic differentiation are facilitating the generation of target data and the direct fitting with macroscopic observables. Philosophical changes on how force field models should be built and used are also discussed. It's inspiring that more accurate biomolecular force fields can be developed with the aid of data science techniques.
Affiliation(s)
- Ye Ding
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, 310024, China; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China
| | - Kuang Yu
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, 518055, China
| | - Jing Huang
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, 310024, China; Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, 310024, China.
| |
|
22
|
Thürlemann M, Böselt L, Riniker S. Regularized by Physics: Graph Neural Network Parametrized Potentials for the Description of Intermolecular Interactions. J Chem Theory Comput 2023; 19:562-579. [PMID: 36633918 PMCID: PMC9878731 DOI: 10.1021/acs.jctc.2c00661] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/24/2022] [Indexed: 01/13/2023]
Abstract
Simulations of molecular systems using electronic structure methods are still not feasible for many systems of biological importance. As a result, empirical methods such as force fields (FF) have become an established tool for the simulation of large and complex molecular systems. The parametrization of FF is, however, time-consuming and has traditionally been based on experimental data. Recent years have therefore seen increasing efforts to automatize FF parametrization or to replace FF with machine-learning (ML) based potentials. Here, we propose an alternative strategy to parametrize FF, which makes use of ML and gradient-descent based optimization while retaining a functional form founded in physics. Using a predefined functional form is shown to enable interpretability, robustness, and efficient simulations of large systems over long time scales. To demonstrate the strength of the proposed method, a fixed-charge and a polarizable model are trained on ab initio potential-energy surfaces. Given only information about the constituting elements, the molecular topology, and reference potential energies, the models successfully learn to assign atom types and corresponding FF parameters from scratch. The resulting models and parameters are validated on a wide range of experimentally and computationally derived properties of systems including dimers, pure liquids, and molecular crystals.
Affiliation(s)
- Moritz Thürlemann
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Lennard Böselt
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| | - Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
| |
|
23
|
Lee J, Jang S, Kim M, Boraste DR, Kim K, Park KM, Seo J. Trapping Alkali Halide Cluster Ions Inside the Cucurbit[7]uril Cavity. J Phys Chem Lett 2022; 13:9581-9588. [PMID: 36205501 DOI: 10.1021/acs.jpclett.2c02583] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
In this study, the distinctive behavior of cucurbit[n]uril (CB[n]), which captures a variety of alkali halide clusters inside the cavity during the droplet evaporation, has been investigated by using ion mobility spectrometry-mass spectrometry. Complexes of CB[7] with various alkali chloride cluster cations or anions generated during the electrospray ionization were studied, and their collision cross-section (CCS) values were obtained to determine whether these clusters were trapped inside the cavity or not. It was found that the clusters smaller than a specific critical size were trapped inside the CB[7] cavity in the gas phase, although trapping of alkali halide clusters at the given concentration is supposed to be unfavorable in solution. We suggest that the rapid solvent evaporation rapidly increases ion concentrations and subsequently forms alkali-chloride contact ion pairs; therefore, it may provide a specific environment to enable the formation of the inclusion complexes.
Affiliation(s)
- Jiyeon Lee
- Department of Chemistry, Pohang University of Science and Technology, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| | - Seongjae Jang
- Department of Chemistry, Pohang University of Science and Technology, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| | - Minsu Kim
- Department of Chemistry, Pohang University of Science and Technology, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| | - Deepak R Boraste
- Center for Self-assembly and Complexity, Institute for Basic Science, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| | - Kimoon Kim
- Department of Chemistry, Pohang University of Science and Technology, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
- Center for Self-assembly and Complexity, Institute for Basic Science, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| | - Kyeng Min Park
- Department of Biochemistry, Daegu Catholic University School of Medicine, Daegu 42472, Republic of Korea
| | - Jongcheol Seo
- Department of Chemistry, Pohang University of Science and Technology, Pohang 37673, Gyeongsangbuk-do, Republic of Korea
| |
|