1. Wang L, Behara PK, Thompson MW, Gokey T, Wang Y, Wagner JR, Cole DJ, Gilson MK, Shirts MR, Mobley DL. The Open Force Field Initiative: Open Software and Open Science for Molecular Modeling. J Phys Chem B 2024. [PMID: 38989715 DOI: 10.1021/acs.jpcb.4c01558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Force fields are a key component of physics-based molecular modeling, describing the energies and forces in a molecular system as a function of the positions of the atoms and molecules involved. Here, we provide a review and scientific status report on the work of the Open Force Field (OpenFF) Initiative, which focuses on the science, infrastructure and data required to build the next generation of biomolecular force fields. We introduce the OpenFF Initiative and the related OpenFF Consortium, describe its approach to force field development and software, and discuss accomplishments to date as well as future plans. OpenFF releases both software and data under open and permissive licensing agreements to enable rapid application, validation, extension, and modification of its force fields and software tools. We discuss lessons learned to date in this new approach to force field development. We also highlight ways that other force field researchers can get involved, as well as some recent successes of outside researchers taking advantage of OpenFF tools and data.
Affiliation(s)
- Lily Wang
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Pavan Kumar Behara
  - Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Matthew W Thompson
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Trevor Gokey
  - Department of Chemistry, University of California, Irvine, California 92697, United States
- Yuanqing Wang
  - Simons Center for Computational Physical Chemistry and Center for Data Science, New York, New York 10004, United States
- Jeffrey R Wagner
  - Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Daniel J Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Michael K Gilson
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, The University of California at San Diego, La Jolla, California 92093, United States
- Michael R Shirts
  - Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80305, United States
- David L Mobley
  - Department of Chemistry, University of California, Irvine, California 92697, United States
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
2. Gheta SKO, Bonin A, Gerlach T, Göller AH. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 2023; 37:765-789. [PMID: 37878216 DOI: 10.1007/s10822-023-00538-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023]
Abstract
In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T¯ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.
Affiliation(s)
- Sadra Kashef Ol Gheta
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Anne Bonin
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Thomas Gerlach
  - Bayer AG, Crop Science, R&D, Digital Transformation, 40789, Monheim, Germany
  - Bayer AG, Engineering & Technology, Thermal Separation Technologies, 51368, Leverkusen, Germany
- Andreas H Göller
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
3. Seo B, Savoie BM. Evidence That Less Can Be More for Transferable Force Fields. J Chem Inf Model 2023; 63:1188-1195. [PMID: 36744744 DOI: 10.1021/acs.jcim.2c01163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2023]
Abstract
Graph-based parameter assignment has been the basis for developing transferable force fields for molecular dynamics simulations for decades. Nevertheless, transferable force fields vary in how specifically terms are defined with respect to the molecular graph and the procedures for generating parametrization data. More-specific force-field terms increase the complexity of the force field, theoretically increasing accuracy but also increasing training data requirements. In contrast, less-specific force fields can be reused across larger regions of chemical space, theoretically reducing accuracy but also reducing the number of parameters and training data requirements. Here, the tradeoffs between force-field specificity and accuracy are quantified by parametrizing three new sets of force fields with varying levels of graph specificity, using a shared procedure for generating training data. These force fields are benchmarked for their ability to reproduce the structural features and liquid properties of 87 organic molecules at 146 distinct state points. The overall accuracy for properties that were directly trained on rapidly saturates as the graph specificity of the force-field increases. From this, we conclude there is at best a marginal benefit of using less transferable and more complex force fields with common sources of quantum-chemically derived training data. When looking at properties unseen during training, there is some evidence that the more-complex force fields even perform slightly worse. These results are rationalized by the fortuitous regularization of force fields based on less-specific and more-transferable atom types. 
Both the saturation in the accuracy of training properties and the marginally worse performance on off-target properties fundamentally contradict the expectation that bespoke force fields are generally more accurate, given their larger number of parameters, and suggests that increasing force-field complexity should be carefully justified against performance gains and balanced against available training data.
Affiliation(s)
- Bumjoon Seo
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
4. Kříž K, Schmidt L, Andersson AT, Walz MM, van der Spoel D. An Imbalance in the Force: The Need for Standardized Benchmarks for Molecular Simulation. J Chem Inf Model 2023; 63:412-431. [PMID: 36630710 PMCID: PMC9875315 DOI: 10.1021/acs.jcim.2c01127] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/06/2022] [Indexed: 01/12/2023]
Abstract
Force fields (FFs) for molecular simulation have been under development for more than half a century. As with any predictive model, rigorous testing and comparisons of models critically depends on the availability of standardized data sets and benchmarks. While such benchmarks are rather common in the fields of quantum chemistry, this is not the case for empirical FFs. That is, few benchmarks are reused to evaluate FFs, and development teams rather use their own training and test sets. Here we present an overview of currently available tests and benchmarks for computational chemistry, focusing on organic compounds, including halogens and common ions, as FFs for these are the most common ones. We argue that many of the benchmark data sets from quantum chemistry can in fact be reused for evaluating FFs, but new gas phase data is still needed for compounds containing phosphorus and sulfur in different valence states. In addition, more nonequilibrium interaction energies and forces, as well as molecular properties such as electrostatic potentials around compounds, would be beneficial. For the condensed phases there is a large body of experimental data available, and tools to utilize these data in an automated fashion are under development. If FF developers, as well as researchers in artificial intelligence, would adopt a number of these data sets, it would become easier to compare the relative strengths and weaknesses of different models and to, eventually, restore the balance in the force.
Affiliation(s)
- Kristian Kříž
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Lisa Schmidt
  - Faculty of Biosciences, University of Heidelberg, Heidelberg 69117, Germany
- Alfred T. Andersson
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- Marie-Madeleine Walz
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
- David van der Spoel
  - Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-75124 Uppsala, Sweden
5. Horton J, Boothroyd S, Wagner J, Mitchell JA, Gokey T, Dotson DL, Behara PK, Ramaswamy VK, Mackey M, Chodera JD, Anwar J, Mobley DL, Cole DJ. Open Force Field BespokeFit: Automating Bespoke Torsion Parametrization at Scale. J Chem Inf Model 2022; 62:5622-5633. [PMID: 36351167 PMCID: PMC9709916 DOI: 10.1021/acs.jcim.2c01153] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/11/2022]
Abstract
The development of accurate transferable force fields is key to realizing the full potential of atomistic modeling in the study of biological processes such as protein-ligand binding for drug discovery. State-of-the-art transferable force fields, such as those produced by the Open Force Field Initiative, use modern software engineering and automation techniques to yield accuracy improvements. However, force field torsion parameters, which must account for many stereoelectronic and steric effects, are considered to be less transferable than other force field parameters and are therefore often targets for bespoke parametrization. Here, we present the Open Force Field QCSubmit and BespokeFit software packages that, when combined, facilitate the fitting of torsion parameters to quantum mechanical reference data at scale. We demonstrate the use of QCSubmit for simplifying the process of creating and archiving large numbers of quantum chemical calculations, by generating a dataset of 671 torsion scans for druglike fragments. We use BespokeFit to derive individual torsion parameters for each of these molecules, thereby reducing the root-mean-square error in the potential energy surface from 1.1 kcal/mol, using the original transferable force field, to 0.4 kcal/mol using the bespoke version. Furthermore, we employ the bespoke force fields to compute the relative binding free energies of a congeneric series of inhibitors of the TYK2 protein, and demonstrate further improvements in accuracy, compared to the base force field (MUE reduced from 0.56 [0.39, 0.77] to 0.42 [0.28, 0.59] kcal/mol and R² correlation improved from 0.72 [0.35, 0.87] to 0.93 [0.84, 0.97], with lower and upper bounds given in brackets).
Affiliation(s)
- Joshua T. Horton
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Simon Boothroyd
  - Boothroyd Scientific Consulting Ltd., 71-75 Shelton Street, London WC2H 9JQ, Greater London, United Kingdom
- Jeffrey Wagner
  - The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Joshua A. Mitchell
  - The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Trevor Gokey
  - Department of Chemistry, University of California, Irvine, California 92697, United States
- David L. Dotson
  - The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Pavan Kumar Behara
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Mark Mackey
  - Cresset, New Cambridge House, Bassingbourn Road, Litlington SG8 0SS, Cambridgeshire, United Kingdom
- John D. Chodera
  - Computational & Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Jamshed Anwar
  - Department of Chemistry, Lancaster University, Lancaster LA1 4YW, United Kingdom
- David L. Mobley
  - Department of Chemistry, University of California, Irvine, California 92697, United States
  - Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
- Daniel J. Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
6. Oliveira MP, Gonçalves YMH, Ol Gheta SK, Rieder SR, Horta BAC, Hünenberger PH. Comparison of the United- and All-Atom Representations of (Halo)alkanes Based on Two Condensed-Phase Force Fields Optimized against the Same Experimental Data Set. J Chem Theory Comput 2022; 18:6757-6778. [PMID: 36190354 DOI: 10.1021/acs.jctc.2c00524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The level of accuracy that can be achieved by a force field is influenced by choices made in the interaction-function representation and in the relevant simulation parameters. These choices, referred to here as functional-form variants (FFVs), include for example the model resolution, the charge-derivation procedure, the van der Waals combination rules, the cutoff distance, and the treatment of the long-range interactions. Ideally, assessing the effect of a given FFV on the intrinsic accuracy of the force-field representation requires that only the specific FFV is changed and that this change is performed at an optimal level of parametrization, a requirement that may prove extremely challenging to achieve in practice. Here, we present a first attempt at such a comparison for one specific FFV, namely the choice of a united-atom (UA) versus an all-atom (AA) resolution in a force field for saturated acyclic (halo)alkanes. Two force-field versions (UA vs AA) are optimized in an automated way using the CombiFF approach against 961 experimental values for the pure-liquid densities ρliq and vaporization enthalpies ΔHvap of 591 compounds. For the AA force field, the torsional and third-neighbor Lennard-Jones parameters are also refined based on quantum-mechanical rotational-energy profiles. The comparison between the UA and AA resolutions is also extended to properties that have not been included as parameterization targets, namely the surface-tension coefficient γ, the isothermal compressibility κT, the isobaric thermal-expansion coefficient αP, the isobaric heat capacity cP, the static relative dielectric permittivity ϵ, the self-diffusion coefficient D, the shear viscosity η, the hydration free energy ΔGwat, and the free energy of solvation ΔGche in cyclohexane. For the target properties ρliq and ΔHvap, the UA and AA resolutions reach very similar levels of accuracy after optimization. 
For the nine other properties, the AA representation leads to more accurate results in terms of η; comparably accurate results in terms of γ, κT, αP, ϵ, D, and ΔGche; and less accurate results in terms of cP and ΔGwat. This work also represents a first step toward the calibration of a GROMOS-compatible force field at the AA resolution.
Affiliation(s)
- Marina P Oliveira
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland
- Yan M H Gonçalves
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland
- S Kashef Ol Gheta
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland
- Salomé R Rieder
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland
- Bruno A C Horta
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland
- Philippe H Hünenberger
  - Laboratorium für Physikalische Chemie, ETH Zürich, ETH-Hönggerberg, HCI, CH-8093 Zürich, Switzerland