1
|
Sivaraman G, Benmore CJ. Deciphering diffuse scattering with machine learning and the equivariant foundation model: the case of molten FeO. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2024; 36:381501. [PMID: 38866028 DOI: 10.1088/1361-648x/ad577b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 06/12/2024] [Indexed: 06/14/2024]
Abstract
Bridging the gap between diffuse x-ray or neutron scattering measurements and predicted structures derived from atom-atom pair potentials in disordered materials has been a longstanding challenge in condensed matter physics. This perspective gives a brief overview of the traditional approaches employed over the past several decades, namely the use of approximate interatomic pair potentials that relate three-dimensional structural models to the measured structure factor and its associated pair distribution function. The use of machine learned interatomic potentials has grown in the past few years, and has been particularly successful in the cases of ionic and oxide systems. Recent advances in large scale sampling, along with a direct integration of scattering measurements into the model development, have provided improved agreement between experiments and large-scale models calculated with quantum mechanical accuracy. However, details of local polyhedral bonding and connectivity in meta-stable disordered systems still require improvement. Here we leverage MACE-MP-0, a newly introduced equivariant foundation model, and validate the results against high-quality experimental scattering data for the case of molten iron(II) oxide (FeO). These preliminary results suggest that the emerging foundation model has the potential to surpass the traditional limitations of classical interatomic potentials.
Affiliation(s)
- Ganesh Sivaraman
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States of America
- C-STEEL Center for Steel Electrification by Electrosynthesis, Argonne National Laboratory, Argonne, IL 60438, United States of America
| | - Chris J Benmore
- C-STEEL Center for Steel Electrification by Electrosynthesis, Argonne National Laboratory, Argonne, IL 60438, United States of America
- X-Ray Science Division, Advanced Photon Source, Argonne National Laboratory, Argonne, IL 60438, United States of America
| |
|
2
|
Finkbeiner J, Tovey S, Holm C. Generating Minimal Training Sets for Machine Learned Potentials. PHYSICAL REVIEW LETTERS 2024; 132:167301. [PMID: 38701485 DOI: 10.1103/physrevlett.132.167301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 09/11/2023] [Accepted: 03/19/2024] [Indexed: 05/05/2024]
Abstract
This Letter presents a novel approach for identifying uncorrelated atomic configurations from extensive datasets with a nonstandard neural network workflow known as random network distillation (RND) for training machine-learned interatomic potentials (MLPs). This method is coupled with a DFT workflow wherein initial data are generated with cheaper classical methods before only the minimal subset is passed to a more computationally expensive ab initio calculation. This benefits training not only by reducing the number of expensive DFT calculations required but also by providing a pathway to the use of more accurate quantum mechanical calculations. The method's efficacy is demonstrated by constructing machine-learned interatomic potentials for the molten salts KCl and NaCl. Our RND method allows accurate models to be fit on minimal datasets, as small as 32 configurations, reducing the required structures by at least 1 order of magnitude compared to alternative methods. This reduction in dataset sizes not only substantially reduces computational overhead for training data generation but also provides a more comprehensive starting point for active-learning procedures.
Affiliation(s)
- Jan Finkbeiner
- Peter Grünberg Institute, Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Straße, 52428 Jülich, Germany
| | - Samuel Tovey
- Institute for Computational Physics, University of Stuttgart, Allmandring 3, 70569 Stuttgart, Germany
| | - Christian Holm
- Institute for Computational Physics, University of Stuttgart, Allmandring 3, 70569 Stuttgart, Germany
| |
|
3
|
Morado J, Mortenson PN, Nissink JWM, Essex JW, Skylaris CK. Does a Machine-Learned Potential Perform Better Than an Optimally Tuned Traditional Force Field? A Case Study on Fluorohydrins. J Chem Inf Model 2023; 63:2810-2827. [PMID: 37071825 PMCID: PMC10170518 DOI: 10.1021/acs.jcim.2c01510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in the modeling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis both in the gas phase and chloroform solution. We also assessed the performance of the aforementioned molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results and discussion presented in this study demonstrate that ANI-2x tends to predict stronger-than-expected hydrogen bonding and overstabilize global minima and shows problems related to inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for modeling in the gas phase, conventional force fields still play an important role, especially for condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.
Affiliation(s)
- João Morado
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
| | - Paul N Mortenson
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
| | - J Willem M Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Jonathan W Essex
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
| | - Chris-Kriton Skylaris
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
| |
|
4
|
Bougueroua S, Bricage M, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic Graph Theory, Reinforcement Learning and Game Theory in MD Simulations: From 3D Structures to Topological 2D-Molecular Graphs (2D-MolGraphs) and Vice Versa. Molecules 2023; 28:2892. [PMID: 37049654 PMCID: PMC10096312 DOI: 10.3390/molecules28072892] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/24/2023] [Revised: 03/17/2023] [Accepted: 03/18/2023] [Indexed: 04/14/2023] Open
Abstract
This paper reviews graph-theory-based methods that were recently developed in our group for post-processing molecular dynamics trajectories. We show that the use of algorithmic graph theory not only provides a direct and fast methodology to identify conformers sampled over time but also allows one to follow the interconversions between the conformers through graphs of transitions in time. Examples of gas phase molecules and inhomogeneous aqueous solid interfaces are presented to demonstrate the power of topological 2D graphs and their versatility for post-processing molecular dynamics trajectories. An even more complex challenge is to predict 3D structures from topological 2D graphs. Our first attempts to tackle such a challenge are presented with the development of game theory and reinforcement learning methods for predicting the 3D structure of a gas-phase peptide.
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| | - Marie Bricage
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Ylène Aboulfath
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Dominique Barth
- Université Paris-Saclay, University Versailles Saint Quentin, DAVID, 78000 Versailles, France
| | - Marie-Pierre Gaigeot
- Université Paris-Saclay, University Evry, CY Cergy Paris Université, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France
| |
|
5
|
Bougueroua S, Aboulfath Y, Barth D, Gaigeot MP. Algorithmic graph theory for post-processing molecular dynamics trajectories. Mol Phys 2023. [DOI: 10.1080/00268976.2022.2162456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Affiliation(s)
- Sana Bougueroua
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
| | - Ylène Aboulfath
- Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
| | - Dominique Barth
- Université Paris-Saclay, Univ Versailles SQ, DAVID, Versailles, France
| | - Marie-Pierre Gaigeot
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, Evry-Courcouronnes, France
| |
|
6
|
Browning NJ, Faber FA, Anatole von Lilienfeld O. GPU-accelerated approximate kernel method for quantum machine learning. J Chem Phys 2022; 157:214801. [PMID: 36511559 DOI: 10.1063/5.0108967] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022] Open
Abstract
We introduce Quantum Machine Learning (QML)-Lightning, a PyTorch package containing graphics processing unit (GPU)-accelerated approximate kernel models, which can yield trained models within seconds. QML-Lightning includes a cost-efficient GPU implementation of FCHL19, which together can provide energy and force predictions with competitive accuracy on a microsecond per atom timescale. Using modern GPU hardware, we report learning curves of energies and forces as well as timings as numerical evidence for select legacy benchmarks from atomistic simulation including QM9, MD-17, and 3BPA.
Affiliation(s)
- Nicholas J Browning
- Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
| | - Felix A Faber
- Department of Physics, University of Cambridge, Cambridge, United Kingdom
| | | |
|
7
|
Bieniek MK, Cree B, Pirie R, Horton JT, Tatum NJ, Cole DJ. An open-source molecular builder and free energy preparation workflow. Commun Chem 2022; 5:136. [PMID: 36320862 PMCID: PMC9607723 DOI: 10.1038/s42004-022-00754-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 10/11/2022] [Indexed: 01/27/2023] Open
Abstract
Automated free energy calculations for the prediction of binding free energies of congeneric series of ligands to a protein target are growing in popularity, but building reliable initial binding poses for the ligands is challenging. Here, we introduce the open-source FEgrow workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. For a given ligand core and receptor structure, FEgrow enumerates and optimises the bioactive conformations of the grown functional group(s), making use of hybrid machine learning/molecular mechanics potential energy functions where possible. Low energy structures are optionally scored using the gnina convolutional neural network scoring function, and output for more rigorous protein-ligand binding free energy predictions. We illustrate use of the workflow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high quality dataset of protein-ligand complexes. Furthermore, we build a set of 13 inhibitors of the SARS-CoV-2 main protease from the literature, and use free energy calculations to retrospectively compute their relative binding free energies. FEgrow is freely available at https://github.com/cole-group/FEgrow, along with a tutorial.
Affiliation(s)
- Mateusz K. Bieniek
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Ben Cree
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Rachael Pirie
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Joshua T. Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| | - Natalie J. Tatum
- Newcastle University Centre for Cancer, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH UK
| | - Daniel J. Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU UK
| |
|
8
|
Gallarati S, Fabregat R, Juraskova V, Inizan TJ, Corminboeuf C. How Robust Is the Reversible Steric Shielding Strategy for Photoswitchable Organocatalysts? J Org Chem 2022; 87:8849-8857. [PMID: 35762705 PMCID: PMC9295146 DOI: 10.1021/acs.joc.1c02991] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
A highly appealing strategy to modulate a catalyst's activity and/or selectivity in a dynamic and noninvasive way is to incorporate a photoresponsive unit into a catalytically competent molecule. However, the description of the photoinduced conformational or structural changes that alter the catalyst's intrinsic reactivity is often reduced to a handful of intuitive static representations, which can struggle to capture the complexity of flexible organocatalysts. Here, we show how a comprehensive exploration of the free energy landscape of N-alkylated azobenzene-tethered piperidine catalysts is essential to unravel the conformational characteristics of each configurational state and explain the experimentally observed reactivity trends. Mapping the catalysts' conformational space highlights the existence of false ON or OFF states that lower their switching ability. Our findings expose the challenges associated with the realization of a reversible steric shielding for the photocontrol of Brønsted basicity of piperidine photoswitchable organocatalysts.
Affiliation(s)
- Simone Gallarati
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
| | - Raimon Fabregat
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
| | - Veronika Juraskova
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
| | - Theo Jaffrelot Inizan
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
| | - Clemence Corminboeuf
- Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland; National Center for Competence in Research-Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland; National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
| |
|
9
|
Kovács DP, Oord CVD, Kucera J, Allen AEA, Cole DJ, Ortner C, Csányi G. Linear Atomic Cluster Expansion Force Fields for Organic Molecules: Beyond RMSE. J Chem Theory Comput 2021; 17:7696-7711. [PMID: 34735161 PMCID: PMC8675139 DOI: 10.1021/acs.jctc.1c00647] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/29/2021] [Indexed: 01/25/2023]
Abstract
We demonstrate that fast and accurate linear force fields can be built for molecules using the atomic cluster expansion (ACE) framework. The ACE models parametrize the potential energy surface in terms of body-ordered symmetric polynomials making the functional form reminiscent of traditional molecular mechanics force fields. We show that the four- or five-body ACE force fields improve on the accuracy of the empirical force fields by up to a factor of 10, reaching the accuracy typical of recently proposed machine-learning-based approaches. We not only show state of the art accuracy and speed on the widely used MD17 and ISO17 benchmark data sets, but we also go beyond RMSE by comparing a number of ML and empirical force fields to ACE on more important tasks such as normal-mode prediction, high-temperature molecular dynamics, dihedral torsional profile prediction, and even bond breaking. We also demonstrate the smoothness, transferability, and extrapolation capabilities of ACE on a new challenging benchmark data set comprised of a potential energy surface of a flexible druglike molecule.
Affiliation(s)
- Dávid Péter Kovács
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, United Kingdom
| | - Cas van der Oord
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, United Kingdom
| | - Jiri Kucera
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, United Kingdom
| | - Alice E. A. Allen
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
| | - Daniel J. Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
| | - Christoph Ortner
- Department of Mathematics, University of British Columbia, Vancouver, BC, Canada V6T 1Z2
| | - Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ, United Kingdom
| |
|
10
|
Gaigeot MP. Some opinions on MD-based vibrational spectroscopy of gas phase molecules and their assembly: An overview of what has been achieved and where to go. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2021; 260:119864. [PMID: 34052762 DOI: 10.1016/j.saa.2021.119864] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/14/2021] [Revised: 04/13/2021] [Accepted: 04/18/2021] [Indexed: 06/12/2023]
Abstract
We hereby review molecular dynamics simulations for anharmonic gas phase spectroscopy and provide some of our opinions on where the field is heading. With these new directions, the theoretical IR/Raman spectroscopy of large (bio)-molecular systems will be more easily achievable over longer time-scale MD trajectories, increasing the accuracy of the MD-IR and MD-Raman calculated spectra. With the new directions presented here, the high throughput 'decoding' of experimental IR/Raman spectra into 3D-structures should thus be possible, hence advancing e.g. the field of MS-IR for structural characterization by spectroscopy. We also review the assignment of vibrational spectra in terms of anharmonic molecular modes from the MD trajectories, and especially introduce our recent developments based on Graph Theory algorithms. Graph Theory algorithms are also introduced in this review for the identification of the molecular 3D-structures sampled over MD trajectories.
Affiliation(s)
- Marie-Pierre Gaigeot
- Université Paris-Saclay, Univ Evry, CNRS, LAMBE UMR8587, 91025 Evry-Courcouronnes, France.
| |
|
11
|
Wang X, Xu Y, Zheng H, Yu K. A Scalable Graph Neural Network Method for Developing an Accurate Force Field of Large Flexible Organic Molecules. J Phys Chem Lett 2021; 12:7982-7987. [PMID: 34433274 DOI: 10.1021/acs.jpclett.1c02214] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/13/2023]
Abstract
An accurate force field is the key to the success of all molecular mechanics simulations on organic polymers and biomolecules. Accurate correlated wave function (CW) methods scale poorly with system size, so this poses a great challenge to the development of an extendible ab initio force field for large flexible organic molecules at the CW level of accuracy. In this work, we combine the physics-driven nonbonding potential with a data-driven subgraph neural network bonding model (named sGNN). Tests on polyethylene glycol, polyethene, and their block polymers show that our strategy is highly accurate and robust for molecules of different sizes and chemical compositions. Therefore, one can develop a parameter library of small molecular fragments (with sizes easily accessible to CW methods) and assemble them to predict the energy of large polymers, thus opening a new path to next-generation organic force fields.
Affiliation(s)
- Xufei Wang
- Two Sigma Investments, New York, New York 10013, United States
| | - Yuanda Xu
- The Program in Applied & Computational Mathematics, Princeton University, Princeton, New Jersey 08544-1000, United States
| | - Han Zheng
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Institute of Materials Research (iMR), Tsinghua Shenzhen International Graduate School (TSIGS), Tsinghua University, Shenzhen 518055, P. R. China
| | - Kuang Yu
- Tsinghua-Berkeley Shenzhen Institute (TBSI), Institute of Materials Research (iMR), Tsinghua Shenzhen International Graduate School (TSIGS), Tsinghua University, Shenzhen 518055, P. R. China
| |
|
12
|
Young TA, Johnston-Wood T, Deringer VL, Duarte F. A transferable active-learning strategy for reactive molecular force fields. Chem Sci 2021; 12:10944-10955. [PMID: 34476072 PMCID: PMC8372546 DOI: 10.1039/d1sc01825f] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/31/2021] [Accepted: 07/04/2021] [Indexed: 11/25/2022] Open
Abstract
Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and data volume. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed for diverse chemical systems in an autonomous manner, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and inter-molecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems with relevance to computational organic chemistry: ranging from bulk solvents, a solvated metal ion and a metallocage onwards to chemical reactivity, including a bifurcating Diels-Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.
Affiliation(s)
- Tom A Young
- Chemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford OX1 3TA, UK
| | - Tristan Johnston-Wood
- Chemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford OX1 3TA, UK
| | - Volker L Deringer
- Department of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, UK
| | - Fernanda Duarte
- Chemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford OX1 3TA, UK
| |
|
13
|
Meli R, Anighoro A, Bodkin MJ, Morris GM, Biggin PC. Learning protein-ligand binding affinity with atomic environment vectors. J Cheminform 2021; 13:59. [PMID: 34391475 PMCID: PMC8364054 DOI: 10.1186/s13321-021-00536-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/04/2021] [Accepted: 07/21/2021] [Indexed: 12/03/2022] Open
Abstract
Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years when novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of several neural network potentials, for the prediction of protein-ligand binding affinity. The AEV-based scoring function, which we term AEScore, is shown to perform as well or better than other state-of-the-art scoring functions on binding affinity prediction, with an RMSE of 1.22 pK units and a Pearson’s correlation coefficient of 0.83 for the CASF-2016 benchmark. However, AEScore does not perform as well in docking and virtual screening tasks, for which it has not been explicitly trained. Therefore, we show that the model can be combined with the classical scoring function AutoDock Vina in the context of Δ-learning, where corrections to the AutoDock Vina scoring function are learned instead of the protein-ligand binding affinity itself. Combined with AutoDock Vina, Δ-AEScore has an RMSE of 1.32 pK units and a Pearson’s correlation coefficient of 0.80 on the CASF-2016 benchmark, while retaining the docking and screening power of the underlying classical scoring function.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, UK
| | | | | | | | - Philip C Biggin
- Department of Biochemistry, University of Oxford, Oxford, UK.
| |
|
14
|
Yang L, Horton JT, Payne MC, Penfold TJ, Cole DJ. Modeling Molecular Emitters in Organic Light-Emitting Diodes with the Quantum Mechanical Bespoke Force Field. J Chem Theory Comput 2021; 17:5021-5033. [PMID: 34264669 DOI: 10.1021/acs.jctc.1c00135] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
Combined molecular dynamics (MD) and quantum mechanics (QM) simulation procedures have gained popularity in modeling the spectral properties of functional organic molecules. However, the potential energy surfaces used to propagate long-time scale dynamics in these simulations are typically described using general, transferable force fields designed for organic molecules in their electronic ground states. These force fields do not typically include spectroscopic data in their training, and importantly, there is no general protocol for including changes in geometry or intermolecular interactions with the environment that may occur upon electronic excitation. In this work, we show that parameters tailored for thermally activated delayed fluorescence (TADF) emitters used in organic light-emitting diodes (OLEDs), in both their ground and electronically excited states, can be readily derived from a small number of QM calculations using the QUBEKit (QUantum mechanical BEspoke toolKit) software and improve the overall accuracy of these simulations.
Affiliation(s)
- Lupeng Yang
- TCM Group, Cavendish Laboratory, 19 JJ Thomson Avenue, Cambridge CB3 0HE, United Kingdom
| | - Joshua T Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
| | - Michael C Payne
- TCM Group, Cavendish Laboratory, 19 JJ Thomson Avenue, Cambridge CB3 0HE, United Kingdom
| | - Thomas J Penfold
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
| | - Daniel J Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
| |
|
15
|
Sivaraman G, Gallington L, Krishnamoorthy AN, Stan M, Csányi G, Vázquez-Mayagoitia Á, Benmore CJ. Experimentally Driven Automated Machine-Learned Interatomic Potential for a Refractory Oxide. PHYSICAL REVIEW LETTERS 2021; 126:156002. [PMID: 33929252 DOI: 10.1103/physrevlett.126.156002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/09/2020] [Accepted: 02/17/2021] [Indexed: 06/12/2023]
Abstract
Understanding the structure and properties of refractory oxides is critical for high temperature applications. In this work, a combined experimental and simulation approach uses an automated closed loop via an active learner, which is initialized by x-ray and neutron diffraction measurements, and sequentially improves a machine-learning model until the experimentally predetermined phase space is covered. A multiphase potential is generated for a canonical example of the archetypal refractory oxide, HfO2, by drawing a minimum number of training configurations from room temperature to the liquid state at ∼2900 °C. The method significantly reduces model development time and human effort.
Affiliation(s)
- Ganesh Sivaraman
- Leadership Computing Facility, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Leighanne Gallington
- X-ray Science Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Anand Narayanan Krishnamoorthy
- Helmholtz-Institute Münster: Ionics in Energy Storage (IEK-12), Forschungszentrum Jülich GmbH, Corrensstrasse 46, 48149 Münster, Germany
| | - Marius Stan
- Applied Materials Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| | - Gábor Csányi
- Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom
| | | | - Chris J Benmore
- X-ray Science Division, Argonne National Laboratory, Lemont, Illinois 60439, USA
| |
Collapse
|
16
|
Allen AEA, Dusson G, Ortner C, Csányi G. Atomic permutationally invariant polynomials for fitting molecular force fields. MACHINE LEARNING-SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abd51e] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/12/2022]
|
17
|
Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1513] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/17/2022]
|
18
|
Abstract
We introduce new and robust decompositions of mean-field Hartree-Fock and Kohn-Sham density functional theory relying on the use of localized molecular orbitals and physically sound charge population protocols. The new lossless property decompositions, which allow for partitioning one-electron reduced density matrices into either bond-wise or atomic contributions, are compared to alternatives from the literature with regard to both molecular energies and dipole moments. Besides commenting on possible applications as an interpretative tool in the rationalization of certain electronic phenomena, we demonstrate how decomposed mean-field theory makes it possible to expose and amplify compositional features in the context of machine-learned quantum chemistry. This is made possible by improving upon the granularity of the underlying data. On the basis of our preliminary proof-of-concept results, we conjecture that many of the structure-property inferences in existence today may be further refined by efficiently leveraging an increase in dataset complexity and richness.
Affiliation(s)
- Janus J Eriksen
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, United Kingdom
|
19
|
Lahey SLJ, Thien Phuc TN, Rowley CN. Benchmarking Force Field and the ANI Neural Network Potentials for the Torsional Potential Energy Surface of Biaryl Drug Fragments. J Chem Inf Model 2020; 60:6258-6268. [PMID: 33263401 DOI: 10.1021/acs.jcim.0c00904] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
Many drug molecules contain biaryl fragments, resulting in a torsional barrier corresponding to rotation around the bond linking the aryls. The potential energy surfaces of these torsions vary significantly because of steric and electronic effects, ultimately affecting the relative stability of the molecular conformations in the protein-bound and solution states. Simulations of protein-ligand binding require accurate computational models to represent the intramolecular interactions to provide accurate predictions of the structure and dynamics of binding. In this article, we compare four force fields [generalized AMBER force field (GAFF), open force field (OpenFF), CHARMM general force field (CGenFF), optimized potentials for liquid simulations (OPLS)] and two neural network potentials (ANI-2x and ANI-1ccx) for their ability to predict the torsional potential energy surfaces of 88 biaryls extracted from drug fragments. The root mean square deviation (rmsd) over the full potential energy surface and the mean absolute deviation of the torsion rotational barrier height (MADB) relative to high-level ab initio reference data (CCSD(T1)*) were used as the measure of accuracy. Uncertainties in these metrics due to the composition of the data set were estimated using bootstrap analysis. In comparison to high-level ab initio data, ANI-1ccx was most accurate for predicting the barrier height (rmsd: 0.5 ± 0.0 kcal/mol, MADB: 0.8 ± 0.1 kcal/mol), followed closely by ANI-2x (rmsd: 0.5 ± 0.0 kcal/mol, MADB: 1.0 ± 0.2 kcal/mol), then CGenFF (rmsd: 0.8 ± 0.1 kcal/mol, MADB: 1.3 ± 0.1 kcal/mol) and OpenFF (rmsd: 0.7 ± 0.1 kcal/mol, MADB: 1.3 ± 0.1 kcal/mol), then GAFF (rmsd: 1.2 ± 0.2 kcal/mol, MADB: 2.6 ± 0.5 kcal/mol), and finally OPLS (rmsd: 3.6 ± 0.3 kcal/mol, MADB: 3.6 ± 0.3 kcal/mol). Significantly, the neural network potentials (NNPs) are systematically more accurate and more reliable than any of the force fields. 
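The bootstrap estimate of metric uncertainty described above can be illustrated with a minimal sketch. It resamples the data set with replacement and reports the spread of the recomputed metric. The function names, the number of resamples, and the barrier heights are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import random
import statistics


def mean_absolute_deviation(predicted, reference):
    """Mean absolute deviation between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def bootstrap_uncertainty(predicted, reference, metric, n_resamples=1000, seed=0):
    """Return (metric on the full set, std. dev. of the metric over bootstrap resamples)."""
    rng = random.Random(seed)
    n = len(reference)
    estimates = []
    for _ in range(n_resamples):
        # Draw n molecules with replacement and recompute the metric on that resample.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(metric([predicted[i] for i in idx],
                                [reference[i] for i in idx]))
    return metric(predicted, reference), statistics.stdev(estimates)


# Hypothetical barrier heights in kcal/mol (illustrative numbers, not data from the paper).
reference_barriers = [2.1, 3.4, 5.0, 1.2, 7.8, 4.3, 2.9, 6.1]
predicted_barriers = [2.5, 3.0, 5.9, 1.0, 7.1, 4.8, 3.3, 5.2]

madb, madb_err = bootstrap_uncertainty(
    predicted_barriers, reference_barriers, mean_absolute_deviation
)
print(f"MADB = {madb:.2f} +/- {madb_err:.2f} kcal/mol")
```

The same helper could be reused for the rmsd by passing a different metric function, since the resampling step does not depend on which metric is evaluated.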
As a practical example, the NNP/molecular mechanics method was used to simulate the isomerization of ozanimod, a drug used for multiple sclerosis. Multinanosecond molecular dynamics (MD) simulations in an explicit aqueous solvent were performed, as well as umbrella sampling and adaptive biasing force-enhanced sampling techniques. The rate constant for this isomerization calculated using transition state theory was 4.30 × 10⁻¹ ns⁻¹, which is consistent with direct MD simulations.
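To give a sense of scale for the transition state theory rate quoted above, the following sketch uses the Eyring equation, k = κ (k_B T / h) exp(−ΔG‡ / RT), to convert between a rate constant and a free energy barrier. The temperature (298.15 K) and the transmission coefficient (κ = 1) are assumptions, since the abstract does not state them, and the authors may have used a different form of transition state theory.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J*s
R = 8.314462618           # gas constant, J/(mol*K)
J_PER_KCAL = 4184.0       # joules per kilocalorie


def eyring_rate(barrier_kcal_mol, temperature=298.15, kappa=1.0):
    """Rate constant in s^-1 for a free energy barrier given in kcal/mol."""
    prefactor = kappa * K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal_mol * J_PER_KCAL / (R * temperature))


def eyring_barrier(rate_per_s, temperature=298.15, kappa=1.0):
    """Free energy barrier in kcal/mol implied by a rate constant in s^-1."""
    prefactor = kappa * K_B * temperature / H
    return -R * temperature * math.log(rate_per_s / prefactor) / J_PER_KCAL


# 4.30e-1 ns^-1 from the abstract, converted to s^-1.
rate = 4.30e-1 * 1e9
# Temperature of 298.15 K is an assumption, not a value from the source.
barrier = eyring_barrier(rate)
print(f"Implied barrier: {barrier:.2f} kcal/mol")
```

Under these assumptions the quoted rate corresponds to a barrier of a few kcal/mol, which is the regime where direct multinanosecond MD can observe the isomerization.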
Affiliation(s)
- Shae-Lynn J Lahey
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador A1B 3T4, Canada
- Tu Nguyen Thien Phuc
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador A1B 3T4, Canada
- Christopher N Rowley
- Department of Chemistry, Memorial University of Newfoundland, St. John's, Newfoundland and Labrador A1B 3T4, Canada
|
20
|
Mey ASJS, Allen BK, Macdonald HEB, Chodera JD, Hahn DF, Kuhn M, Michel J, Mobley DL, Naden LN, Prasad S, Rizzi A, Scheen J, Shirts MR, Tresadern G, Xu H. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. LIVING JOURNAL OF COMPUTATIONAL MOLECULAR SCIENCE 2020; 2:18378. [PMID: 34458687 PMCID: PMC8388617 DOI: 10.33011/livecoms.2.1.18378] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Indexed: 01/13/2023]
Abstract
Alchemical free energy calculations are a useful tool for predicting free energy differences associated with the transfer of molecules from one environment to another. The hallmark of these methods is the use of "bridging" potential energy functions representing alchemical intermediate states that cannot exist as real chemical species. The data collected from these bridging alchemical thermodynamic states allows the efficient computation of transfer free energies (or differences in transfer free energies) with orders of magnitude less simulation time than simulating the transfer process directly. While these methods are highly flexible, care must be taken in avoiding common pitfalls to ensure that computed free energy differences can be robust and reproducible for the chosen force field, and that appropriate corrections are included to permit direct comparison with experimental data. In this paper, we review current best practices for several popular application domains of alchemical free energy calculations performed with equilibrium simulations, in particular relative and absolute small molecule binding free energy calculations to biomolecular targets.
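The idea of bridging alchemical intermediate states can be illustrated on a toy system. The sketch below is not one of the protocols recommended in the review: it uses the simple Zwanzig exponential-averaging (free energy perturbation) estimator on a one-dimensional harmonic oscillator whose spring constant is changed in stages, because that case has an exact answer, ΔF = (kT/2) ln(k_end/k_start). All names, the kT value, the window count, and the sample sizes are assumptions made for illustration.

```python
import math
import random

KT = 0.593  # thermal energy in kcal/mol near 298 K (assumed)


def spring_constant(lam, k_start, k_end):
    """Linear mixing U_lam = (1 - lam) * U_start + lam * U_end for harmonic potentials."""
    return (1.0 - lam) * k_start + lam * k_end


def potential(x, k):
    """Harmonic potential energy 0.5 * k * x^2."""
    return 0.5 * k * x * x


def fep_step(k_from, k_to, n_samples, rng):
    """Zwanzig estimate of F(k_to) - F(k_from) using samples drawn from state k_from."""
    sigma = math.sqrt(KT / k_from)  # exact Boltzmann sampling for a harmonic well
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        delta_u = potential(x, k_to) - potential(x, k_from)
        total += math.exp(-delta_u / KT)
    return -KT * math.log(total / n_samples)


def alchemical_free_energy(k_start, k_end, n_windows=10, n_samples=20000, seed=1):
    """Sum free energy differences over adjacent intermediate (lambda) states."""
    rng = random.Random(seed)
    lambdas = [i / n_windows for i in range(n_windows + 1)]
    total = 0.0
    for lam_a, lam_b in zip(lambdas[:-1], lambdas[1:]):
        total += fep_step(spring_constant(lam_a, k_start, k_end),
                          spring_constant(lam_b, k_start, k_end),
                          n_samples, rng)
    return total


estimate = alchemical_free_energy(1.0, 4.0)
exact = 0.5 * KT * math.log(4.0 / 1.0)
print(f"staged FEP: {estimate:.3f} kcal/mol, exact: {exact:.3f} kcal/mol")
```

The intermediate states here have no physical meaning on their own; they only keep adjacent states similar enough that each exponential average converges, which is the role the abstract assigns to bridging potential energy functions.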
Affiliation(s)
- Antonia S. J. S. Mey
- EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- Hannah E. Bruce Macdonald
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY, USA
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY, USA
- David F. Hahn
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
- Maximilian Kuhn
- EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- Cresset, Cambridgeshire, UK
- Julien Michel
- EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- David L. Mobley
- Departments of Pharmaceutical Sciences and Chemistry, University of California, Irvine, Irvine, USA
- Levi N. Naden
- Molecular Sciences Software Institute, Blacksburg VA, USA
- Andrea Rizzi
- Silicon Therapeutics, Boston, MA, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Jenke Scheen
- EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
|