1
Eastman P, Pritchard BP, Chodera JD, Markland TE. Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning. J Chem Theory Comput 2024; 20:8583-8593. [PMID: 39318326 DOI: 10.1021/acs.jctc.4c00794] [Indexed: 09/26/2024]
Abstract
We describe version 2 of the SPICE data set, a collection of quantum chemistry calculations for training machine learning potentials. It expands on the original data set by adding much more sampling of chemical space and more data on noncovalent interactions. We train a set of potential energy functions called Nutmeg on it. They are based on the TensorNet architecture. They use a novel mechanism to improve performance on charged and polar molecules, injecting precomputed partial charges into the model to provide a reference for the large-scale charge distribution. Evaluation of the new models shows that they do an excellent job of reproducing energy differences between conformations even on highly charged molecules or ones that are significantly larger than the molecules in the training set. They also produce stable molecular dynamics trajectories and are fast enough to be useful for routine simulation of small molecules.
Affiliation(s)
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Benjamin P Pritchard
- Molecular Sciences Software Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24060, United States
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, United States

2
Shirani H, Hashemianzadeh SM. Quantum-level machine learning calculations of Levodopa. Comput Biol Chem 2024; 112:108146. [PMID: 39067350 DOI: 10.1016/j.compbiolchem.2024.108146] [Received: 02/14/2024] [Revised: 06/20/2024] [Accepted: 07/08/2024] [Indexed: 07/30/2024]
Abstract
Many drug molecules contain functional groups, resulting in a torsional barrier corresponding to rotation around the bond linking the fragments. In medicinal chemistry and pharmaceutical sciences, including drug design studies, the exact calculation of the potential energy surface (PES) of these molecular torsions is extremely important. Machine learning (ML), including deep learning (DL), is currently one of the most rapidly evolving tools in computer-aided drug discovery and molecular simulations. In this work, we used the ANI-1x neural network potential as a quantum-level ML method to predict the PESs of the antiparkinsonian drug molecule L-3,4-dihydroxyphenylalanine (Levodopa). The electronic energies and structural parameters calculated by density functional theory (DFT) using the ωB97X functional and all possible Pople basis sets indicated that the 6-31G(d) basis set, when used with the ωB97X functional, exhibits behavior similar to that of the ANI-1x model. The investigation of vibrational frequencies showed a linear correlation between DFT and ML data. All ANI-1x calculations were completed in a very short computing time. From this perspective, we expect the ANI-1x dataset applied in this work to be appreciably efficient and effective in computational structure-based drug design studies.
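A minimal sketch of how a linear correlation between DFT and ML vibrational frequencies could be quantified: an ordinary least-squares line plus the Pearson correlation coefficient, in plain Python. The function name `linear_fit` and the input values are illustrative assumptions; the abstract does not say which fitting procedure the authors used.

```python
import math


def linear_fit(x, y):
    """Least-squares fit y ~ slope * x + intercept, plus Pearson r.

    x, y: equal-length sequences, e.g. reference (DFT) and model (ML)
    harmonic frequencies in cm^-1 for the same normal modes.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Illustrative usage with made-up values (not data from the paper):
dft_freqs = [1000.0, 1500.0, 3000.0]
ml_freqs = [0.98 * f + 5.0 for f in dft_freqs]
slope, intercept, r = linear_fit(dft_freqs, ml_freqs)
```

A slope near 1 with a small intercept and r close to 1 would indicate that the ML frequencies track the DFT ones; a slope noticeably different from 1 would point to a systematic scaling offset.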
Affiliation(s)
- Hossein Shirani
- Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran
- Seyed Majid Hashemianzadeh
- Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, P.O. Box 16846-13114, Tehran, Iran

3
Hou YF, Zhang L, Zhang Q, Ge F, Dral PO. Physics-Informed Active Learning for Accelerating Quantum Chemical Simulations. J Chem Theory Comput 2024. [PMID: 39264419 DOI: 10.1021/acs.jctc.4c00821] [Indexed: 09/13/2024]
Abstract
Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here, we introduce an end-to-end AL protocol for constructing robust, data-efficient potentials with an affordable investment of time and resources and minimal human intervention. Our AL protocol is based on physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, a conformer search of a key biochemical molecule, and the time-resolved mechanism of the Diels-Alder reaction. These investigations took us days instead of the weeks required for pure quantum chemical calculations on a high-performance computing cluster.
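The uncertainty-driven selection and convergence monitoring can be illustrated with a generic query-by-committee loop. This sketch rests on stated assumptions: uncertainty is the spread of an ensemble's predictions, `label` and `train` are hypothetical callables standing in for a QM calculation and model training, and convergence means no remaining candidate exceeds the threshold. It is not the physics-informed protocol of the paper, whose sampling and uncertainty details the abstract does not give.

```python
import statistics


def select_uncertain(pool, committee, threshold):
    """Query-by-committee: keep points where committee predictions disagree.

    committee: list of callables mapping a structure to a predicted energy.
    The population standard deviation across members is the uncertainty.
    """
    picked = []
    for x in pool:
        spread = statistics.pstdev([model(x) for model in committee])
        if spread > threshold:
            picked.append(x)
    return picked


def active_learning_loop(pool, label, train, threshold, max_iters=10):
    """Generic AL loop: label uncertain points, retrain, stop when none remain.

    label(x): hypothetical stand-in for a reference QM calculation.
    train(data): hypothetical stand-in returning a new committee.
    Returns (committee, labeled_data, iterations_used).
    """
    data = []
    committee = train(data)
    for iteration in range(max_iters):
        picked = select_uncertain(pool, committee, threshold)
        if not picked:  # converged: no candidate exceeds the threshold
            return committee, data, iteration
        data.extend((x, label(x)) for x in picked)
        pool = [x for x in pool if x not in picked]
        committee = train(data)
    return committee, data, max_iters
```

Because only the uncertain candidates are sent to the expensive `label` call, the number of reference calculations stays small; the threshold trades data efficiency against robustness.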
Affiliation(s)
- Yi-Fan Hou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Quanhao Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Institute of Physics, Faculty of Physics, Astronomy, and Informatics, Nicolaus Copernicus University in Toruń, ul. Grudziądzka 5, Toruń 87-100, Poland

4
Wang J, Wang Y, Zhang H, Yang Z, Liang Z, Shi J, Wang HT, Xing D, Sun J. E(n)-Equivariant cartesian tensor message passing interatomic potential. Nat Commun 2024; 15:7607. [PMID: 39218987 PMCID: PMC11366765 DOI: 10.1038/s41467-024-51886-6] [Received: 03/16/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024]
Abstract
Machine learning potentials (MLPs) have been a popular topic in recent years for their capability to replace expensive first-principles calculations in some large systems. Meanwhile, message passing networks have gained significant attention due to their remarkable accuracy, and a wave of message passing networks based on Cartesian coordinates has emerged. However, the node information in these models is usually limited to scalars and vectors. In this work, we propose the High-order Tensor message Passing interatomic Potential (HotPP), an E(n)-equivariant message passing neural network that extends the node embedding and message to tensors of arbitrary order. By performing some basic equivariant operations, high-order tensors can be coupled very simply, and thus the model can directly predict high-order tensors such as dipole moments and polarizabilities without any modifications. Tests on several datasets show that HotPP not only achieves high accuracy in predicting target properties, but also successfully performs tasks such as calculating phonon spectra, infrared spectra, and Raman spectra, demonstrating its potential as a tool for future research.
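To make concrete what equivariance demands of tensor outputs such as dipole moments and polarizabilities, here is a minimal pure-Python check, assuming rotation about the z axis only: a rank-1 output must transform as R v and a rank-2 output as R T R^T. The `dipole` and `second_moment` functions are simple analytic stand-ins, not HotPP; a trained model's outputs could be substituted to test its equivariance numerically.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def dipole(charges, positions):
    """Rank-1 Cartesian tensor: mu = sum_i q_i r_i."""
    return [sum(q * r[a] for q, r in zip(charges, positions)) for a in range(3)]


def second_moment(positions):
    """Rank-2 Cartesian tensor: T = sum_i r_i r_i^T (outer product)."""
    return [[sum(r[a] * r[b] for r in positions) for b in range(3)] for a in range(3)]


def _allclose(xs, ys, tol=1e-9):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(xs, ys))


def check_rotational_equivariance(charges, positions, theta):
    """Rotate the inputs, then compare with rotating the outputs directly."""
    R = rotation_z(theta)
    rotated = [matvec(R, r) for r in positions]
    mu_ok = _allclose(dipole(charges, rotated), matvec(R, dipole(charges, positions)))
    t_rot = second_moment(rotated)
    t_expected = matmul(matmul(R, second_moment(positions)), transpose(R))
    t_ok = all(_allclose(ra, rb) for ra, rb in zip(t_rot, t_expected))
    return mu_ok, t_ok
```

Both checks return True for any geometry here because the stand-in functions are built from equivariant operations (sums of vectors and outer products), which is the same property an equivariant network is designed to guarantee by construction.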
Affiliation(s)
- Junjie Wang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Yong Wang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Department of Chemistry, Princeton University, Princeton, NJ, 08544, USA
- Haoting Zhang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Ziyang Yang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Zhixin Liang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Jiuyang Shi
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Hui-Tian Wang
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Dingyu Xing
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China
- Jian Sun
- National Laboratory of Solid State Microstructures, School of Physics and Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, 210093, China

5
Glick ZL, Metcalf DP, Glick CS, Spronk SA, Koutsoukas A, Cheney DL, Sherrill CD. A physics-aware neural network for protein-ligand interactions with quantum chemical accuracy. Chem Sci 2024; 15:13313-13324. [PMID: 39183910 PMCID: PMC11339967 DOI: 10.1039/d4sc01029a] [Received: 02/14/2024] [Accepted: 07/09/2024] [Indexed: 08/27/2024]
Abstract
Quantifying intermolecular interactions with quantum chemistry (QC) is useful for many chemical problems, including understanding the nature of protein-ligand interactions. Unfortunately, QC computations on protein-ligand systems are too computationally expensive for most use cases. The flourishing field of machine-learned (ML) potentials is a promising solution, but it is limited by an inability to easily capture long-range, nonlocal interactions. In this work we develop an atomic-pairwise neural network (AP-Net) specialized for modeling intermolecular interactions. This model benefits from a number of physical constraints, including a two-component equivariant message passing neural network architecture that predicts interaction energies via an intermediate prediction of monomer electron densities. The AP-Net model is trained on a comprehensive dataset composed of paired ligand and protein fragments. This model accurately predicts QC-quality interaction energies of protein-ligand systems at a computational cost reduced by orders of magnitude. Applications of the AP-Net model to molecular crystal structure prediction are explored, as well as limitations in modeling highly polarizable systems.
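A minimal sketch of the atom-pairwise idea, assuming the interaction energy is written as a sum of contributions from intermolecular atom pairs (each atom of one monomer paired with each atom of the other). `pair_energy` is a hypothetical placeholder for a learned per-pair model; AP-Net's actual features, message passing, and density-based intermediate prediction are not reproduced here.

```python
import math


def pairwise_interaction_energy(monomer_a, monomer_b, pair_energy, cutoff=None):
    """Sum atom-pair contributions over all intermolecular pairs (i in A, j in B).

    monomer_*: list of (element, (x, y, z)) tuples, coordinates in Angstrom.
    pair_energy(elem_i, elem_j, r_ij): hypothetical per-pair model; a learned
    version would use each atom's environment, not just the distance.
    """
    total = 0.0
    for elem_i, pos_i in monomer_a:
        for elem_j, pos_j in monomer_b:
            r = math.dist(pos_i, pos_j)
            if cutoff is not None and r > cutoff:
                continue  # optional distance cutoff to skip negligible pairs
            total += pair_energy(elem_i, elem_j, r)
    return total
```

Intramolecular pairs never enter the double loop, so the result is an interaction energy by construction, and the per-pair terms offer a natural way to attribute the energy to specific ligand and protein atoms.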
Affiliation(s)
- Zachary L Glick
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- Derek P Metcalf
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- Caroline S Glick
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- Steven A Spronk
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
- Alexios Koutsoukas
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
- Daniel L Cheney
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
- C David Sherrill
- School of Chemistry and Biochemistry, School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA

6
Behara PK, Jang H, Horton JT, Gokey T, Dotson DL, Boothroyd S, Bayly CI, Cole DJ, Wang LP, Mobley DL. Benchmarking Quantum Mechanical Levels of Theory for Valence Parametrization in Force Fields. J Phys Chem B 2024; 128:7888-7902. [PMID: 39087913 PMCID: PMC11331531 DOI: 10.1021/acs.jpcb.4c03167] [Received: 05/13/2024] [Revised: 07/09/2024] [Accepted: 07/15/2024] [Indexed: 08/02/2024]
Abstract
A wide range of density functional methods and basis sets are available to derive the electronic structure and properties of molecules. Quantum mechanical calculations are too computationally intensive for routine simulation of molecules in the condensed phase, prompting the development of computationally efficient force fields based on quantum mechanical data. Parametrizing general force fields, which cover a vast chemical space, necessitates the generation of sizable quantum mechanical data sets with optimized geometries and torsion scans. To achieve this efficiently, choosing a quantum mechanical method that balances computational cost and accuracy is crucial. In this study, we seek to assess the accuracy of quantum mechanical theory for specific properties such as conformer energies and torsion energetics. To comprehensively evaluate various methods, we focus on a representative set of 59 diverse small molecules, comparing approximately 25 combinations of functionals and basis sets against reference-level coupled-cluster calculations at the complete basis set limit.
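Conformer-energy benchmarks of this kind typically compare relative rather than absolute energies, since different levels of theory have different absolute zeros. A minimal sketch, assuming energies in kcal/mol and alignment to the conformer that is lowest in the reference; the abstract does not specify the exact error metrics used in the paper.

```python
import math


def conformer_energy_errors(method, reference):
    """RMSE and maximum absolute error of relative conformer energies.

    method, reference: energies of the same conformers of one molecule,
    in the same units (e.g. kcal/mol). Both are made relative to the
    conformer that is lowest in the reference, so constant offsets cancel.
    """
    if len(method) != len(reference) or not method:
        raise ValueError("need equal-length, non-empty energy lists")
    i0 = min(range(len(reference)), key=reference.__getitem__)
    diffs = [(m - method[i0]) - (r - reference[i0]) for m, r in zip(method, reference)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse, max(abs(d) for d in diffs)
```

Applied per molecule and then aggregated over a molecule set, such errors allow functional and basis set combinations to be ranked against the coupled-cluster reference.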
Affiliation(s)
- Pavan Kumar Behara
- Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Hyesu Jang
- Chemistry Department, University of California at Davis, Davis, California 95616, United States
- OpenEye Scientific Software, Santa Fe, New Mexico 87508, United States
- Joshua T. Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Trevor Gokey
- Department of Chemistry, University of California, Irvine, California 92697, United States
- David L. Dotson
- The Open Force Field Initiative, Open Molecular Software Foundation, Davis, California 95616, United States
- Datryllic LLC, Phoenix, Arizona 85003, United States
- Daniel J. Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K.
- Lee-Ping Wang
- Chemistry Department, University of California at Davis, Davis, California 95616, United States
- David L. Mobley
- Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Department of Chemistry, University of California, Irvine, California 92697, United States

7
Frank JT, Unke OT, Müller KR, Chmiela S. A Euclidean transformer for fast and stable machine learned force fields. Nat Commun 2024; 15:6539. [PMID: 39107296 PMCID: PMC11303804 DOI: 10.1038/s41467-024-50620-6] [Received: 08/25/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024]
Abstract
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice. To address this, we propose a transformer architecture called SO3KRATES that combines sparse equivariant representations (Euclidean variables) with a self-attention mechanism that separates invariant and equivariant information, eliminating the need for expensive tensor products. SO3KRATES achieves a unique combination of accuracy, stability, and speed that enables insightful analysis of quantum properties of matter on extended time and system size scales. To showcase this capability, we generate stable MD trajectories for flexible peptides and supra-molecular structures with hundreds of atoms. Furthermore, we investigate the PES topology for medium-sized chainlike molecules (e.g., small peptides) by exploring thousands of minima. Remarkably, SO3KRATES demonstrates the ability to strike a balance between the conflicting demands of stability and the emergence of new minimum-energy conformations beyond the training data, which is crucial for realistic exploration tasks in the field of biochemistry.
Affiliation(s)
- J Thorben Frank
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Klaus-Robert Müller
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Google DeepMind, Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, Korea
- Max Planck Institut für Informatik, Saarbrücken, Germany
- Stefan Chmiela
- Machine Learning Group, TU Berlin, Berlin, Germany
- BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

8
Plé T, Adjoua O, Lagardère L, Piquemal JP. FeNNol: An efficient and flexible library for building force-field-enhanced neural network potentials. J Chem Phys 2024; 161:042502. [PMID: 39051830 DOI: 10.1063/5.0217688] [Received: 05/06/2024] [Accepted: 06/28/2024] [Indexed: 07/27/2024]
Abstract
Neural network interatomic potentials (NNPs) have recently proven to be powerful tools to accurately model complex molecular systems while bypassing the high numerical cost of ab initio molecular dynamics simulations. In recent years, numerous advances in model architectures as well as the development of hybrid models combining machine-learning (ML) with more traditional, physically motivated, force-field interactions have considerably increased the design space of ML potentials. In this paper, we present FeNNol, a new library for building, training, and running force-field-enhanced neural network potentials. It provides a flexible and modular system for building hybrid models, allowing us to easily combine state-of-the-art embeddings with ML-parameterized physical interaction terms without the need for explicit programming. Furthermore, FeNNol leverages the automatic differentiation and just-in-time compilation features of the Jax Python library to enable fast evaluation of NNPs, shrinking the performance gap between ML potentials and standard force-fields. This is demonstrated with the popular ANI-2x model reaching simulation speeds nearly on par with the AMOEBA polarizable force-field on commodity GPUs (graphics processing units). We hope that FeNNol will facilitate the development and application of new hybrid NNP architectures for a wide range of molecular simulation problems.
Affiliation(s)
- Thomas Plé
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Olivier Adjoua
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France
- Louis Lagardère
- Sorbonne Université, LCT, UMR 7616 CNRS, 75005 Paris, France

9
Martire S, Decherchi S, Cavalli A. OBIWAN: An Element-Wise Scalable Feed-Forward Neural Network Potential. J Chem Theory Comput 2024; 20:6287-6302. [PMID: 38978155 DOI: 10.1021/acs.jctc.4c00342] [Indexed: 07/10/2024]
Abstract
Estimating the potential energy of a molecular system at a quantum level of theory is a task of paramount importance in computational chemistry. The often employed density functional theory approach allows one to accomplish this task, yet most often at significant computational cost. This prompted the community to develop so-called machine learning potentials to achieve near-quantum accuracy at molecular mechanics computational cost. In this paper, we introduce OBIWAN, a feed-forward neural network with relevant structural properties that also led to the definition of a new kind of general-purpose neural network layer. Its featurization process scales efficiently with newly added atomic species. This allows one to seamlessly add new atom types without changing the topology of the network. It also allows one to train on new data sets by leveraging a previously trained OBIWAN, hence converging very quickly. This avoids training from scratch and makes the approach more compatible with a green computing perspective.
Affiliation(s)
- Stefano Martire
- Department of Pharmacy and Biotechnology, University of Bologna, Via Belmeloro 6, Bologna 40126, Italy
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
- Sergio Decherchi
- Data Science and Computation Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
- Andrea Cavalli
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy
- Centre Européen de Calcul Atomique et Moléculaire, Ecole Polytechnique Fédérale de Lausanne, Avenue de Forel 3, Lausanne 1015, Switzerland

10
Cheng Z, Bi H, Liu S, Chen J, Misquitta AJ, Yu K. Developing a Differentiable Long-Range Force Field for Proteins with E(3) Neural Network-Predicted Asymptotic Parameters. J Chem Theory Comput 2024; 20:5598-5608. [PMID: 38888427 DOI: 10.1021/acs.jctc.4c00337] [Indexed: 06/20/2024]
Abstract
Accurately describing long-range interactions is a significant challenge in molecular dynamics (MD) simulations of proteins. A high-quality long-range potential is also an important component of a range-separated machine learning force field. This study introduces a comprehensive asymptotic parameter database encompassing atomic multipole moments, polarizabilities, and dispersion coefficients. Leveraging active learning, our database comprehensively represents protein fragments with up to 8 heavy atoms, capturing their conformational diversity with merely 78,000 data points. Additionally, the E(3) neural network (E3NN) is employed to predict the asymptotic parameters directly from the local geometry. The E3NN models demonstrate exceptional accuracy and transferability across all asymptotic parameters, achieving an R² of 0.999 for both the protein fragment and the 20 amino acid dipeptide test sets. The long-range electrostatic and dispersion energies can be obtained using the E3NN-predicted parameters, with errors of 0.07 and 0.02 kcal/mol, respectively, when compared to symmetry-adapted perturbation theory (SAPT). Therefore, our force fields demonstrate the capability to accurately describe long-range interactions in proteins, paving the way for next-generation protein force fields.
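Once per-atom asymptotic parameters are available, long-range energies follow from classical expressions. A minimal sketch, assuming point charges only (the monopole term of the multipole expansion), an undamped -C6/r^6 dispersion term, and a geometric-mean combining rule for C6; higher multipoles, polarization, and damping used in the paper are not included, and the E3NN prediction step is replaced by fixed input values.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); approximate value.
COULOMB_KCAL = 332.0637


def long_range_energies(frag_a, frag_b):
    """Intermolecular electrostatic and dispersion energies in kcal/mol.

    frag_*: list of (charge_e, c6, (x, y, z)) per atom, coordinates in
    Angstrom and c6 in kcal*Angstrom^6/mol (assumed units).
    """
    e_elec = 0.0
    e_disp = 0.0
    for q_a, c6_a, r_a in frag_a:
        for q_b, c6_b, r_b in frag_b:
            r = math.dist(r_a, r_b)
            e_elec += COULOMB_KCAL * q_a * q_b / r  # charge-charge term only
            # Geometric-mean combining rule (assumption), no short-range damping.
            e_disp -= math.sqrt(c6_a * c6_b) / r**6
    return e_elec, e_disp
```

Because both terms are simple differentiable functions of the parameters and coordinates, forces and parameter gradients can be obtained by automatic differentiation, which is one reason such long-range terms pair well with learned parameter predictors.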
Affiliation(s)
- Zheng Cheng
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- AI for Science Institute, Beijing 100084, P. R. China
- Hangrui Bi
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- DP Technology, Beijing 100080, P. R. China
- Siyuan Liu
- DP Technology, Beijing 100080, P. R. China
- Junmin Chen
- Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, Guangdong, P. R. China
- Tsinghua Shenzhen International Graduate School, Shenzhen 518055, Guangdong, P. R. China
- Alston J Misquitta
- School of Physics and Astronomy, Queen Mary, University of London, London E1 4NS, U.K.
- Kuang Yu
- Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, Guangdong, P. R. China
- Tsinghua Shenzhen International Graduate School, Shenzhen 518055, Guangdong, P. R. China

11
Medrano Sandonas L, Van Rompaey D, Fallani A, Hilfiker M, Hahn D, Perez-Benito L, Verhoeven J, Tresadern G, Kurt Wegner J, Ceulemans H, Tkatchenko A. Dataset for quantum-mechanical exploration of conformers and solvent effects in large drug-like molecules. Sci Data 2024; 11:742. [PMID: 38972891 PMCID: PMC11228031 DOI: 10.1038/s41597-024-03521-8] [Received: 03/18/2024] [Accepted: 06/13/2024] [Indexed: 07/09/2024]
Abstract
We here introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 (mean: 50.9), and containing up to 54 (mean: 28.2) non-hydrogen atoms. To gain insights into the solvent effects as well as collective dispersion interactions for drug-like molecules, we have performed QM calculations supplemented with a treatment of many-body dispersion (MBD) interactions of structures and properties in the gas phase and implicit water. Thus, AQM contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules, whereas PBE0+MBD with the modified Poisson-Boltzmann (MPB) model of water was used for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.
Affiliation(s)
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany
- Dries Van Rompaey
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alessio Fallani
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Mathias Hilfiker
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg
- David Hahn
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Laura Perez-Benito
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jonas Verhoeven
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Gary Tresadern
- Computational Chemistry, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Joerg Kurt Wegner
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Drug Discovery Data Sciences (D3S), Johnson & Johnson Innovative Medicine, 301 Binney Street, MA 02142, Cambridge, USA
- Hugo Ceulemans
- Drug Discovery Data Sciences (D3S), Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg

12
Giese TJ, Zeng J, Lerew L, McCarthy E, Tao Y, Ekesan Ş, York DM. Software Infrastructure for Next-Generation QM/MM-ΔMLP Force Fields. J Phys Chem B 2024; 128:6257-6271. [PMID: 38905451 PMCID: PMC11414325 DOI: 10.1021/acs.jpcb.4c01466] [Indexed: 06/23/2024]
Abstract
We present software infrastructure for the design and testing of new quantum mechanical/molecular mechanical and machine-learning potential (QM/MM-ΔMLP) force fields for a wide range of applications. The software integrates Amber's molecular dynamics simulation capabilities with fast, approximate quantum models in the xtb package and machine-learning potential corrections in DeePMD-kit. The xtb package implements the recently developed density-functional tight-binding QM models with multipolar electrostatics and density-dependent dispersion (GFN2-xTB), and the interface with Amber enables their use in periodic boundary QM/MM simulations with linear-scaling QM/MM particle-mesh Ewald electrostatics. The accuracy of the semiempirical models is enhanced by including machine-learning correction potentials (ΔMLPs) enabled through an interface with the DeePMD-kit software. The goal of this paper is to present and validate the implementation of this software infrastructure in molecular dynamics and free energy simulations. The utility of the new infrastructure is demonstrated in proof-of-concept example applications. The software elements presented here are open source and freely available. Their interface provides a powerful enabling technology for the design of new QM/MM-ΔMLP models for studying a wide range of problems, including biomolecular reactivity and protein-ligand binding.
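The Δ-learning idea behind QM/MM-ΔMLP can be sketched in a few lines: a correction model is trained on the difference between reference energies and a fast baseline (GFN2-xTB in the paper), and its prediction is then added to the baseline energy. In this sketch the one-dimensional descriptor and linear least-squares fit are hypothetical stand-ins for a DeePMD-kit model; no xtb, Amber, or DeePMD-kit API is used or implied.

```python
def delta_targets(e_reference, e_baseline):
    """Training targets for a Delta-MLP: the part the cheap model misses."""
    return [ref - base for ref, base in zip(e_reference, e_baseline)]


def fit_linear_correction(descriptors, deltas):
    """Hypothetical stand-in for a Delta-MLP: a 1-D least-squares line."""
    n = len(descriptors)
    mean_x = sum(descriptors) / n
    mean_y = sum(deltas) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptors, deltas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def corrected_energy(e_baseline, descriptor, delta_model):
    """Corrected energy: E = E_baseline + Delta_E_MLP(descriptor)."""
    return e_baseline + delta_model(descriptor)
```

Learning only the difference is usually easier than learning the full energy, because the baseline already captures most of the physics and the correction tends to be smaller and smoother.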
Affiliation(s)
- Timothy J Giese
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Jinzhe Zeng
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Lauren Lerew
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Erika McCarthy
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Yujun Tao
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Şölen Ekesan
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
- Darrin M York
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, New Jersey 08854, United States
13
Fu W, Mo Y, Xiao Y, Liu C, Zhou F, Wang Y, Zhou J, Zhang YJ. Enhancing Molecular Energy Predictions with Physically Constrained Modifications to the Neural Network Potential. J Chem Theory Comput 2024; 20:4533-4544. [PMID: 38828925 DOI: 10.1021/acs.jctc.3c01181] [Indexed: 06/05/2024]
Abstract
Exclusively prioritizing the precision of energy prediction frequently proves inadequate in satisfying multifaceted requirements. A heightened focus is warranted on assessing the rationality of potential energy curves predicted by machine learning-based force fields (MLFFs), alongside evaluating the pragmatic utility of these MLFFs. This study introduces SWANI, an optimized neural network potential stemming from the ANI framework. Through the incorporation of supplementary physical constraints, SWANI aligns more cohesively with chemical expectations, yielding rational potential energy profiles. It also exhibits superior predictive precision compared with that of the ANI model. Additionally, a comprehensive comparison is conducted between SWANI and a prominent graph neural network-based model. The findings indicate that SWANI outperforms the latter, particularly for molecules exceeding the dimensions of the training set. This outcome underscores SWANI's exceptional capacity for generalization and its proficiency in handling larger molecular systems.
Affiliation(s)
- Weiqiang Fu
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yujie Mo
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yi Xiao
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Chang Liu
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Feng Zhou
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yang Wang
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Jielong Zhou
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
- Yingsheng J Zhang
- Beijing StoneWise Technology Co., Ltd., Haidian Street 15, Haidian District, Beijing 100080, China
14
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. J Chem Theory Comput 2024; 20:4076-4087. [PMID: 38743033 DOI: 10.1021/acs.jctc.4c00253] [Indexed: 05/16/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2× to 10× over previous, nonoptimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
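For readers unfamiliar with what a neighbor search under periodic boundary conditions computes, the following is a minimal brute-force sketch in Python/NumPy. It is not TorchMD-Net's implementation, which the abstract describes as highly optimized; it assumes an orthorhombic box and the minimum-image convention, and the function name `neighbor_pairs_pbc` is hypothetical.

```python
import numpy as np


def neighbor_pairs_pbc(positions, box, cutoff):
    """Brute-force O(N^2) neighbor search in an orthorhombic periodic box.

    positions: (N, 3) coordinates; box: (3,) box edge lengths.
    The minimum-image convention is only valid when the cutoff is smaller
    than half the shortest box edge, so that case is rejected.
    Returns a list of (i, j, distance) tuples with i < j.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    if cutoff >= 0.5 * np.min(box):
        raise ValueError("cutoff must be smaller than half the shortest box edge")
    pairs = []
    n = len(positions)
    for i in range(n - 1):
        # Displacements from atom i to every later atom
        delta = positions[i + 1:] - positions[i]
        # Wrap each displacement to its nearest periodic image
        delta -= box * np.round(delta / box)
        dist = np.linalg.norm(delta, axis=1)
        for k in np.nonzero(dist < cutoff)[0]:
            pairs.append((i, i + 1 + int(k), float(dist[k])))
    return pairs


pos = np.array([[0.5, 0.0, 0.0], [9.5, 0.0, 0.0], [5.0, 5.0, 5.0]])
print(neighbor_pairs_pbc(pos, np.array([10.0, 10.0, 10.0]), cutoff=2.0))
```

This prints `[(0, 1, 1.0)]`: the two atoms near opposite faces of the box are neighbors through the periodic boundary, even though their raw separation is 9.0. The quadratic loop is chosen only for readability; production codes typically use cell lists or similar spatial binning to approach linear scaling.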
Affiliation(s)
- Raul P Pelaez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Antonio Mirarchi
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Stefan Doerr
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
15
Grassano JS, Pickering I, Roitberg AE, González Lebrero MC, Estrin DA, Semelak JA. Assessment of Embedding Schemes in a Hybrid Machine Learning/Classical Potentials (ML/MM) Approach. J Chem Inf Model 2024; 64:4047-4058. [PMID: 38710065 DOI: 10.1021/acs.jcim.4c00478] [Indexed: 05/08/2024]
Abstract
Machine learning (ML) methods have reached high accuracy levels for the prediction of in vacuo molecular properties. However, the simulation of large systems solely through ML methods (such as those based on neural network potentials) is still a challenge. In this context, one of the most promising frameworks for integrating ML schemes in the simulation of complex molecular systems is the so-called ML/MM approach. These multiscale approaches combine ML methods with classical force fields (MM), in the same spirit as the successful hybrid quantum mechanics-molecular mechanics methods (QM/MM). The key issue for such ML/MM methods is an adequate description of the coupling between the region of the system described by ML and the region described at the MM level. In the context of QM/MM schemes, the main ingredient of the interaction is electrostatic, and the state of the art is the so-called electrostatic embedding. In this study, we analyze the quality of simpler mechanical embedding-based approaches, specifically focusing on their application within a ML/MM framework utilizing atomic partial charges derived in vacuo. Taking as reference electrostatic embedding calculations performed at a QM(DFT)/MM level, we explore different atomic charge schemes, as well as a polarization correction computed using atomic polarizabilities. Our benchmark data set comprises a set of about 80k small organic structures from the ANI-1x and ANI-2x databases, solvated in water. The results suggest that the minimal basis iterative stockholder (MBIS) atomic charges yield the best agreement with the reference coupling energy. Remarkable enhancements are achieved by including a simple polarization correction.
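As a rough illustration of what mechanical embedding with fixed atomic charges plus a polarization correction can look like, here is a sketch in atomic units (charges in e, distances in bohr, energies in hartree). Assumptions, not taken from the paper: the electrostatic coupling is a plain Coulomb sum between ML-region charges (for example MBIS charges) and MM point charges, and the correction is the linear-response term -1/2 Σ α_i |E_i|², where E_i is the field of the MM charges at ML atom i. The paper's exact formulation may differ, van der Waals terms are omitted, and both function names are hypothetical.

```python
import numpy as np


def coulomb_coupling(q_ml, r_ml, q_mm, r_mm):
    """Electrostatic ML/MM coupling from fixed point charges (atomic units).

    q_ml: (n_ml,) ML-region charges; r_ml: (n_ml, 3) positions.
    q_mm: (n_mm,) MM charges; r_mm: (n_mm, 3) positions.
    """
    diff = r_ml[:, None, :] - r_mm[None, :, :]   # (n_ml, n_mm, 3)
    dist = np.linalg.norm(diff, axis=-1)          # (n_ml, n_mm)
    return float(np.sum(q_ml[:, None] * q_mm[None, :] / dist))


def polarization_correction(alpha_ml, r_ml, q_mm, r_mm):
    """Assumed linear-response polarization energy: -1/2 * sum_i alpha_i |E_i|^2.

    alpha_ml: (n_ml,) isotropic atomic polarizabilities (bohr^3).
    """
    diff = r_ml[:, None, :] - r_mm[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Field at each ML atom from all MM point charges: sum_j q_j (r_i - r_j) / |r_i - r_j|^3
    field = np.sum(q_mm[None, :, None] * diff / dist[..., None] ** 3, axis=1)  # (n_ml, 3)
    return float(-0.5 * np.sum(alpha_ml * np.sum(field ** 2, axis=-1)))
```

For a single ML atom with α = 2 bohr³ at the origin and a +1 e MM charge 2 bohr away, the field magnitude is 0.25 a.u. and the correction is -0.0625 hartree. The correction is always non-positive, which matches the intuition that letting the ML region respond to its environment can only lower the energy.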
Affiliation(s)
- Juan S Grassano
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Ignacio Pickering
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Adrian E Roitberg
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Mariano C González Lebrero
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Dario A Estrin
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Jonathan A Semelak
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
16
Pelaez RP, Simeon G, Galvelis R, Mirarchi A, Eastman P, Doerr S, Thölke P, Markland TE, De Fabritiis G. TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations. ARXIV 2024:arXiv:2402.17660v3. [PMID: 38463504 PMCID: PMC10925388] [Indexed: 03/12/2024]
Abstract
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2x to 10x over previous, non-optimized, iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Affiliation(s)
- Raul P Pelaez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Guillem Simeon
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Raimondas Galvelis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Antonio Mirarchi
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Peter Eastman
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Stefan Doerr
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Thomas E Markland
- Department of Chemistry, Stanford University, Stanford, CA 94305, USA
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003 Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
17
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Indexed: 04/23/2024]
Abstract
Machine learning interatomic potential (MLIP) overcomes the challenges of high computational costs in density-functional theory and the relatively low accuracy in classical large-scale molecular dynamics, facilitating more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP is discussed, including data generation methods, material structure descriptors, six unique machine learning algorithms, and available software. Furthermore, the applications of MLIP in various fields are investigated, notably in phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, future perspectives, including standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs, are reported.
Affiliation(s)
- Guanjie Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
18
Zhang S, Makoś MZ, Jadrich RB, Kraka E, Barros K, Nebgen BT, Tretiak S, Isayev O, Lubbers N, Messerly RA, Smith JS. Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential. Nat Chem 2024; 16:727-734. [PMID: 38454071 PMCID: PMC11087274 DOI: 10.1038/s41557-023-01427-3] [Received: 04/13/2023] [Accepted: 12/12/2023] [Indexed: 03/09/2024]
Abstract
Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early Earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.
Affiliation(s)
- Shuhao Zhang
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Małgorzata Z Makoś
- Computational and Theoretical Chemistry Group, Department of Chemistry, Southern Methodist University, Dallas, TX, USA
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Ryan B Jadrich
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Elfi Kraka
- Computational and Theoretical Chemistry Group, Department of Chemistry, Southern Methodist University, Dallas, TX, USA
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Benjamin T Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Richard A Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- NVIDIA Corp., Santa Clara, CA, USA
19
Wan K, He J, Shi X. Construction of High Accuracy Machine Learning Interatomic Potential for Surface/Interface of Nanomaterials-A Review. Adv Mater 2024; 36:e2305758. [PMID: 37640376 DOI: 10.1002/adma.202305758] [Received: 06/15/2023] [Revised: 08/24/2023] [Indexed: 08/31/2023]
Abstract
The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces bestow them with various exceptional properties. These properties, however, also introduce difficulties for both experimental and computational studies. The advent of machine learning interatomic potential (MLIP) addresses some of the limitations associated with empirical force fields, presenting a valuable avenue for accurate simulations of these surfaces/interfaces of nanomaterials. Central to this approach is the idea of capturing the relationship between system configuration and potential energy, leveraging the proficiency of machine learning (ML) to precisely approximate high-dimensional functions. This review offers an in-depth examination of MLIP principles and their execution and elaborates on their applications in the realm of nanomaterial surface and interface systems. The prevailing challenges faced by this potent methodology are also discussed.
Affiliation(s)
- Kaiwei Wan
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Jianxin He
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Xinghua Shi
- Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
20
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Xinyu Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
21
Ge F, Wang R, Qu C, Zheng P, Nandi A, Conte R, Houston PL, Bowman JM, Dral PO. Tell Machine Learning Potentials What They Are Needed For: Simulation-Oriented Training Exemplified for Glycine. J Phys Chem Lett 2024; 15:4451-4460. [PMID: 38626460 DOI: 10.1021/acs.jpclett.4c00746] [Indexed: 04/18/2024]
Abstract
Machine learning potentials (MLPs) are widely applied as an efficient alternative way to represent potential energy surfaces (PESs) in many chemical simulations. The MLPs are often evaluated with the root-mean-square errors on the test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the simulation accuracy with MLPs on an example of a full-dimensional, global PES for the glycine amino acid. Our results show that the errors in the test set do not unambiguously reflect the MLP performance in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving the MLP quality in a simulation-oriented manner, yielding the most precise relative conformer energies and barriers. This solution also passed the stringent test by diffusion Monte Carlo simulations.
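A toy illustration of why a test-set RMSE need not track the accuracy of relative conformer energies: a constant energy offset inflates the RMSE but leaves every energy difference exact, while small errors that vary between conformers give a lower RMSE but distort the differences. The energies below are hypothetical and are not taken from the glycine study; the function names are illustrative only.

```python
import numpy as np


def rmse(pred, ref):
    """Root-mean-square error of absolute energies."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def relative_energy_mae(pred, ref):
    """Mean absolute error of energies measured relative to the reference minimum conformer."""
    i0 = int(np.argmin(ref))
    return float(np.mean(np.abs((pred - pred[i0]) - (ref - ref[i0]))))


# Hypothetical reference conformer energies (arbitrary units)
ref = np.array([0.0, 1.0, 2.0, 3.0])
model_a = ref + 0.5                                # constant offset on every conformer
model_b = ref + np.array([0.2, -0.2, 0.2, -0.2])   # small, conformer-dependent errors

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(pred, ref), 3), round(relative_energy_mae(pred, ref), 3))
```

Here model B has the lower RMSE (0.2 versus 0.5) yet the larger error in relative energies (0.2 versus 0.0), so ranking the two models by test RMSE alone would pick the one that is worse for conformer energetics.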
Affiliation(s)
- Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Ran Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Chen Qu
- Independent Researcher, Toronto, Ontario M9B0E3, Canada
- Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
- Apurba Nandi
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Riccardo Conte
- Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy
- Paul L Houston
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States
- Joel M Bowman
- Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
22
Dral PO. AI in computational chemistry through the lens of a decade-long journey. Chem Commun (Camb) 2024; 60:3240-3258. [PMID: 38444290 DOI: 10.1039/d4cc00010b] [Indexed: 03/07/2024]
Abstract
This article gives a perspective on the progress of AI tools in computational chemistry through the lens of the author's decade-long contributions put in the wider context of the trends in this rapidly expanding field. This progress over the last decade is tremendous: while a decade ago we had a glimpse of what was to come through many proof-of-concept studies, now we witness the emergence of many AI-based computational chemistry tools that are mature enough to make faster and more accurate simulations increasingly routine. Such simulations in turn allow us to validate and even revise experimental results, deepen our understanding of the physicochemical processes in nature, and design better materials, devices, and drugs. The rapid introduction of powerful AI tools gives rise to unique challenges and opportunities that are discussed in this article too.
Affiliation(s)
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China.
23
Wang Y, Inizan TJ, Liu C, Piquemal JP, Ren P. Incorporating Neural Networks into the AMOEBA Polarizable Force Field. J Phys Chem B 2024; 128:2381-2388. [PMID: 38445577 PMCID: PMC10985787 DOI: 10.1021/acs.jpcb.3c08166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Neural network potentials (NNPs) offer significant promise to bridge the gap between the accuracy of quantum mechanics and the efficiency of molecular mechanics in molecular simulation. Most NNPs rely on the locality assumption that ensures the model's transferability and scalability and thus lack a treatment of long-range interactions, which are essential for molecular systems in the condensed phase. Here we present an integrated hybrid model, AMOEBA+NN, which combines the AMOEBA potential for the short- and long-range noncovalent atomic interactions with an NNP that captures the remaining local covalent contributions. The AMOEBA+NN model was trained on the conformational energy of the ANI-1x data set and tested on several external data sets ranging from small molecules to tetrapeptides. The hybrid model demonstrated substantial improvements over the baseline models in terms of accuracy as the molecule size increased, suggesting its potential as a next-generation approach for chemically accurate molecular simulations.
Affiliation(s)
- Yanxing Wang
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Théo Jaffrelot Inizan
- Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
- Chengwen Liu
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
- Jean-Philip Piquemal
- Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS, Paris 75005, France
- Pengyu Ren
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, Texas 78712, United States
24
Buterez D, Janet JP, Kiddle SJ, Oglic D, Lió P. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat Commun 2024; 15:1517. [PMID: 38409255 PMCID: PMC11258334 DOI: 10.1038/s41467-024-45566-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 01/25/2024] [Indexed: 02/28/2024]
Abstract
We investigate the potential of graph neural networks for transfer learning and improving molecular property prediction on sparse, expensive-to-acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels to trade off overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected, and each one generates data at a different scale and fidelity. We consider this setup holistically and demonstrate empirically that existing transfer learning techniques for graph neural networks are generally unable to harness the information from multi-fidelity cascades. Here, we propose several effective transfer learning strategies and study them in transductive and inductive settings. Our analysis involves a collection of more than 28 million unique experimental protein-ligand interactions across 37 targets from drug discovery by high-throughput screening and 12 quantum properties from the dataset QMugs. The results indicate that transfer learning can improve the performance on sparse tasks by up to eight times while using an order of magnitude less high-fidelity training data. Moreover, the proposed methods consistently outperform existing transfer learning strategies for graph-structured data on drug discovery and quantum mechanics datasets.
Affiliation(s)
- David Buterez
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.
- Jon Paul Janet
- Molecular AI, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Steven J Kiddle
- Data Science & Advanced Analytics, Data Science & AI, R&D, AstraZeneca, Cambridge, UK
- Dino Oglic
- Centre for AI, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
25
Hedelius BE, Tingey D, Della Corte D. TrIP─Transformer Interatomic Potential Predicts Realistic Energy Surface Using Physical Bias. J Chem Theory Comput 2024; 20:199-211. [PMID: 38150692 DOI: 10.1021/acs.jctc.3c00936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
Accurate interatomic energies and forces enable high-quality molecular dynamics simulations, torsion scans, potential energy surface mappings, and geometry optimizations. Machine learning algorithms have enabled rapid estimates of the energies and forces with high accuracy. Further development of machine learning algorithms holds promise for producing universal potentials that support many different atomic species. We present the Transformer Interatomic Potential (TrIP): a chemically sound potential based on the SE(3)-Transformer. TrIP's species-agnostic architecture, which uses continuous atomic representation and homogeneous graph convolutions, encourages parameter sharing between atomic species for more general representations of chemical environments, maintains a reasonable number of parameters, serves as a form of regularization, and is a step toward accurate universal interatomic potentials. TrIP achieves state-of-the-art accuracy on the COMP6 benchmark, with an energy prediction mean absolute error (MAE) of just 1.02 kcal/mol. We introduce physical bias in the form of Ziegler-Biersack-Littmark-screened nuclear repulsion and constrained atomization energies. An energy scan of a water molecule demonstrates that these changes improve long- and near-range interactions compared to other neural network potentials. TrIP is also stable in molecular dynamics simulations and shows reasonable exploration of Ramachandran space.
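TrIP's physical bias includes Ziegler-Biersack-Littmark (ZBL) screened nuclear repulsion. As a rough illustration, the sketch below evaluates the standard ZBL universal pair repulsion with its commonly tabulated coefficients, in eV and Angstrom. How TrIP switches this term on and combines it with the transformer output is not described in the abstract, so the function names and unit choices here are assumptions, not the authors' code.

```python
import math

# Standard ZBL universal screening function phi(x) = sum_i c_i * exp(-d_i * x).
# The c_i sum to 1, so phi(0) = 1 (bare Coulomb repulsion at very short range).
ZBL_C = (0.18175, 0.50986, 0.28022, 0.02817)
ZBL_D = (3.19980, 0.94229, 0.40290, 0.20162)

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4 * pi * eps0) in eV * Angstrom
BOHR_RADIUS_ANGSTROM = 0.529177  # Bohr radius a0 in Angstrom


def zbl_screening_length(z1: int, z2: int) -> float:
    """Universal screening length a = 0.8854 * a0 / (Z1**0.23 + Z2**0.23), in Angstrom."""
    return 0.8854 * BOHR_RADIUS_ANGSTROM / (z1 ** 0.23 + z2 ** 0.23)


def zbl_phi(x: float) -> float:
    """ZBL universal screening function of the reduced distance x = r / a."""
    return sum(c * math.exp(-d * x) for c, d in zip(ZBL_C, ZBL_D))


def zbl_pair_energy(z1: int, z2: int, r: float) -> float:
    """Screened nuclear repulsion (eV) between nuclear charges z1, z2 separated by r (Angstrom)."""
    a = zbl_screening_length(z1, z2)
    return COULOMB_EV_ANGSTROM * z1 * z2 / r * zbl_phi(r / a)


if __name__ == "__main__":
    # O-H pair (Z = 8 and 1) at a few short separations; the energy falls off steeply with r.
    for r in (0.2, 0.5, 1.0):
        print(f"r = {r:.1f} A  E_ZBL = {zbl_pair_energy(8, 1, r):.3f} eV")
```

Because the screening function decays to zero, this term only matters at close approach, which is why it is a natural correction for the short-range region of an energy scan such as the water example.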
Affiliation(s)
- Bryce E Hedelius
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Damon Tingey
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Dennis Della Corte
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
26
Gelžinytė E, Öeren M, Segall MD, Csányi G. Transferable Machine Learning Interatomic Potential for Bond Dissociation Energy Prediction of Drug-like Molecules. J Chem Theory Comput 2024; 20:164-177. [PMID: 38108269 PMCID: PMC10782450 DOI: 10.1021/acs.jctc.3c00710] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/28/2023] [Revised: 11/30/2023] [Accepted: 11/30/2023] [Indexed: 12/19/2023]
Abstract
We present a transferable MACE interatomic potential that is applicable to open- and closed-shell drug-like molecules containing hydrogen, carbon, and oxygen atoms. Including an accurate description of radical species extends the scope of possible applications to bond dissociation energy (BDE) prediction, for example, in the context of cytochrome P450 (CYP) metabolism. The transferability of the MACE potential was validated on the COMP6 data set, containing only closed-shell molecules, where it reaches better accuracy than the readily available general ANI-2x potential. MACE achieves similar accuracy on two CYP metabolism-specific data sets, which include open- and closed-shell structures. This model enables us to calculate aliphatic C-H BDEs and thus to compare reaction energies of hydrogen abstraction, the rate-limiting step of the aliphatic hydroxylation reaction catalyzed by CYPs. On the "CYP 3A4" data set, MACE achieves a BDE RMSE of 1.37 kcal/mol and better prediction of BDE ranks than the alternatives: the semiempirical AM1 and GFN2-xTB methods and the ALFABET model, which directly predicts bond dissociation enthalpies. Finally, we highlight the smoothness of the MACE potential over paths of sp3 C-H bond elongation and show that a minimal extension is enough for the MACE model to start finding reasonable minimum energy paths of methoxy radical-mediated hydrogen abstraction. Altogether, this work lays the ground for further extensions of scope in terms of chemical elements, (CYP-mediated) reaction classes, and modeling the full reaction paths, not only BDEs.
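The BDE referred to here is a difference of three single-point energies: the radical, the hydrogen atom, and the closed-shell parent. A minimal sketch, assuming all three electronic energies come in Hartree from the same model (the function names are hypothetical and no MACE call is made), together with the RMSE metric used to summarize errors against reference BDEs:

```python
from typing import Sequence

HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 Hartree expressed in kcal/mol


def ch_bond_dissociation_energy(e_parent: float, e_radical: float, e_h_atom: float) -> float:
    """Electronic homolytic BDE for R-H -> R* + H*, in kcal/mol.

    Inputs are total energies in Hartree from one model that handles both
    closed- and open-shell species. No zero-point or thermal corrections are
    applied, so this is an electronic BDE, not a bond dissociation enthalpy.
    """
    return (e_radical + e_h_atom - e_parent) * HARTREE_TO_KCAL_PER_MOL


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return (sum(squared) / len(squared)) ** 0.5


if __name__ == "__main__":
    # Hypothetical energies (Hartree) for a parent molecule, its radical, and an H atom.
    print(ch_bond_dissociation_energy(e_parent=-40.5, e_radical=-39.85, e_h_atom=-0.5))
```

Ranking sites by this quantity within one molecule is what the rank comparison against AM1, GFN2-xTB, and ALFABET relies on; constant offsets in the H-atom energy cancel out of such rankings.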
Affiliation(s)
- Elena Gelžinytė
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.
- Mario Öeren
- Optibrium Limited, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9GL, U.K.
- Matthew D. Segall
- Optibrium Limited, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9GL, U.K.
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K.
27
Swanson HA, Lau KHA, Tuttle T. Minimal Peptoid Dynamics Inform Self-Assembly Propensity. J Phys Chem B 2023; 127:10601-10614. [PMID: 38038956 PMCID: PMC10726364 DOI: 10.1021/acs.jpcb.3c03725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 10/30/2023] [Accepted: 11/09/2023] [Indexed: 12/02/2023]
Abstract
Peptoids are structural isomers of natural peptides, with side chain attachment at the amide nitrogen, conferring this class of compounds with the ability to access both cis and trans ω torsions as well as an increased diversity of ψ/φ states with respect to peptides. Sampling within these dimensions is controlled through side chain selection, and an expansive set of viable peptoid residues exists. It has been shown recently that "minimal" di- and tripeptoids with aromatic side chains can self-assemble into highly ordered structures, with size and morphological definition varying as a function of sequence pattern (e.g., XFF and FXF, where X = a nonaromatic peptoid monomer). Aromatic groups, such as phenylalanine, are regularly used in the design of minimal peptide assemblers. In recognition of this, and to draw parallels between these compound classes, we have developed a series of descriptors for intramolecular dynamics of aromatic side chains to discern whether these dynamics, in a preassembly condition, can be related to experimentally observed nanoscale assemblies. To do this, we have built on the atomistic peptoid force field reported by Weiser and Santiso (CGenFF-WS) through the rigorous fitting of partial charges and the collation of Charmm General Force Field (CGenFF) parameters relevant to these systems. Our study finds that the intramolecular dynamics of side chains, for a given sequence, is dependent on the specific combination of backbone ω torsions and that homogeneity of sampling across these states correlates well with the experimentally observed ability to assemble into nanomorphologies with long-range order. Sequence patterning is also shown to affect sampling, in a manner consistent for both tripeptoids and tripeptides.
Additionally, sampling similarities between the nanofiber-forming tripeptoid Nf-Nke-Nf in the cc state and the nanotube-forming dipeptide FF highlight a structural motif that may be relevant to the emergence of extended linear assemblies. To assess these properties, a variety of computational approaches have been employed.
Affiliation(s)
- Hamish W. A. Swanson
- Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- King Hang Aaron Lau
- Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- Tell Tuttle
- Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, U.K.
28
Fonseca G, Poltavsky I, Tkatchenko A. Force Field Analysis Software and Tools (FFAST): Assessing Machine Learning Force Fields under the Microscope. J Chem Theory Comput 2023; 19:8706-8717. [PMID: 38011895 PMCID: PMC10720330 DOI: 10.1021/acs.jctc.3c00985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023]
Abstract
As the sophistication of machine learning force fields (MLFF) increases to match the complexity of extended molecules and materials, so does the need for tools to properly analyze and assess the practical performance of MLFFs. To go beyond average error metrics toward a complete picture of a model's applicability and limitations, we developed FFAST (force field analysis software and tools): a cross-platform software package designed to gain detailed insights into a model's performance and limitations, complete with an easy-to-use graphical user interface. The software allows the user to gauge the performance of any molecular force field (such as popular state-of-the-art MLFF models) on various popular data set types, providing general prediction error overviews, outlier detection mechanisms, atom-projected errors, and more. It has a 3D visualizer to find and picture problematic configurations, atoms, or clusters in a large data set. In this paper, the MACE and NequIP models are applied to two data sets of interest [stachyose and docosahexaenoic acid (DHA)] to illustrate the use cases of the software. This analysis showed that carbons and oxygens involved in or near glycosidic bonds inside the stachyose molecule present increased prediction errors. In addition, prediction errors on DHA rise as the molecule folds, especially for the carboxylic group at the edge of the molecule. We emphasize the need for a systematic assessment of MLFF models to ensure their successful application to the study of the dynamics of molecules and materials.
Affiliation(s)
- Gregory Fonseca
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Igor Poltavsky
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg
29
Folmsbee D, Koes DR, Hutchison GR. Systematic Comparison of Experimental Crystallographic Geometries and Gas-Phase Computed Conformers for Torsion Preferences. J Chem Inf Model 2023; 63:7401-7411. [PMID: 38000780 PMCID: PMC10716907 DOI: 10.1021/acs.jcim.3c01278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 11/07/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023]
Abstract
We performed exhaustive torsion sampling on more than 3 million compounds using the GFN2-xTB method and compared experimental crystallographic and gas-phase conformers. Many conformer sampling methods derive torsional angle distributions from experimental crystallographic data, limiting the torsion preferences to molecules that must be stable, synthetically accessible, and able to be crystallized. In this work, we evaluate the differences in torsional preferences of experimental crystallographic geometries and gas-phase computed conformers from a broad selection of compounds to determine whether torsional angle distributions obtained from semiempirical methods are suitable priors for conformer sampling. We find that differences in torsion preferences can mostly be attributed to a lack of available experimental crystallographic data, with small deviations arising from gas-phase geometry differences. GFN2 demonstrates the ability to provide accurate and reliable torsional preferences that can provide a basis for new methods free from the limitations of experimental data collection. We provide Gaussian-based fits and sampling distributions suitable for torsion sampling and propose an alternative to the widely used "experimental torsion and knowledge distance geometry" (ETKDG) method using quantum torsion-derived distance geometry (QTDG) methods.
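The abstract mentions Gaussian-based fits used as torsion-sampling distributions. The sketch below shows, under stated assumptions, how such a fit could be sampled: each torsion type is modeled as a weighted mixture of Gaussians in degrees, and every draw is wrapped onto [-180, 180). The helpers `wrap_degrees` and `sample_torsions` are hypothetical, not part of any published QTDG code, and the mixture in the usage note is a placeholder rather than a fitted value from the paper.

```python
import random
from typing import List, Sequence, Tuple

# (weight, mean in degrees, standard deviation in degrees) for one Gaussian component.
Component = Tuple[float, float, float]


def wrap_degrees(angle: float) -> float:
    """Map any angle in degrees onto the interval [-180, 180)."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    # Floating-point modulo can return exactly 360.0 for tiny negative inputs.
    return -180.0 if wrapped >= 180.0 else wrapped


def sample_torsions(mixture: Sequence[Component], n: int, seed: int = 0) -> List[float]:
    """Draw n torsion angles (degrees) from a Gaussian mixture, wrapping each draw."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in mixture]
    samples = []
    for _ in range(n):
        # Pick a component in proportion to its weight, then sample within it.
        _, mean, sigma = rng.choices(mixture, weights=weights, k=1)[0]
        samples.append(wrap_degrees(rng.gauss(mean, sigma)))
    return samples
```

For example, `sample_torsions([(0.5, 180.0, 15.0), (0.25, 60.0, 15.0), (0.25, -60.0, 15.0)], 1000)` would mimic an anti-dominant profile with two gauche minima. Wrapping each draw keeps a component centered at ±180° from producing out-of-range angles.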
Affiliation(s)
- Dakota L. Folmsbee
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Anesthesiology & Perioperative Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- David R. Koes
- Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Geoffrey R. Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical & Petroleum Engineering, University of Pittsburgh, 3700 O’Hara Street, Pittsburgh, Pennsylvania 15261, United States
30
Medrano Sandonas L, Hoja J, Ernst BG, Vázquez-Mayagoitia Á, DiStasio RA, Tkatchenko A. "Freedom of design" in chemical compound space: towards rational in silico design of molecules with targeted quantum-mechanical properties. Chem Sci 2023; 14:10702-10717. [PMID: 37829035 PMCID: PMC10566466 DOI: 10.1039/d3sc03598k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 08/17/2023] [Indexed: 10/14/2023]
Abstract
The rational design of molecules with targeted quantum-mechanical (QM) properties requires an advanced understanding of the structure-property/property-property relationships (SPR/PPR) that exist across chemical compound space (CCS). In this work, we analyze these fundamental relationships in the sector of CCS spanned by small (primarily organic) molecules using the recently developed QM7-X dataset, a systematic, extensive, and tightly converged collection of 42 QM properties corresponding to ≈4.2M equilibrium and non-equilibrium molecular structures containing up to seven heavy/non-hydrogen atoms (including C, N, O, S, and Cl). By characterizing and enumerating progressively more complex manifolds of molecular property space (the corresponding high-dimensional space defined by the properties of each molecule in this sector of CCS), our analysis reveals that one has a substantial degree of flexibility or "freedom of design" when searching for a single molecule with a desired pair of properties or a set of distinct molecules sharing an array of properties. To explore how this intrinsic flexibility manifests in the molecular design process, we used multi-objective optimization to search for molecules with simultaneously large polarizabilities and HOMO-LUMO gaps; analysis of the resulting Pareto fronts identified non-trivial paths through CCS consisting of sequential structural and/or compositional changes that yield molecules with optimal combinations of these properties.
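Analyzing a Pareto front means keeping only the candidates that no other candidate beats on both targets at once. A minimal sketch of that non-dominance filter for two objectives to be maximized (labeled here as polarizability and HOMO-LUMO gap); it only extracts the front from already-evaluated molecules and is not the multi-objective optimizer used in the study. The candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
    """True if p is at least as good as q in both objectives and strictly better in one (maximization)."""
    return p[0] >= q[0] and p[1] >= q[1] and (p[0] > q[0] or p[1] > q[1])


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of non-dominated points, e.g. (polarizability, HOMO-LUMO gap) pairs."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical (polarizability, gap) values in arbitrary units.
    candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0)]
    print(pareto_front(candidates))  # [0, 1, 2]
```

This pairwise check is quadratic in the number of candidates, which is fine for illustration; a dataset the size of QM7-X would call for sorting one objective first and sweeping the other.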
Affiliation(s)
- Leonardo Medrano Sandonas
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
- Johannes Hoja
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
- Institute of Chemistry, University of Graz 8010 Graz Austria
- Brian G Ernst
- Department of Chemistry and Chemical Biology, Cornell University Ithaca NY 14853 USA
- Robert A DiStasio
- Department of Chemistry and Chemical Biology, Cornell University Ithaca NY 14853 USA
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg L-1511 Luxembourg City Luxembourg
31
Hu F, He F, Yaron DJ. Treating Semiempirical Hamiltonians as Flexible Machine Learning Models Yields Accurate and Interpretable Results. J Chem Theory Comput 2023; 19:6185-6196. [PMID: 37705220 PMCID: PMC10536991 DOI: 10.1021/acs.jctc.3c00491] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/11/2023] [Indexed: 09/15/2023]
Abstract
Quantum chemistry provides chemists with invaluable information, but the high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components such as the artificial neural networks of deep learning that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB) with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained to ab initio data in a manner that is analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve a low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of ab initio data needed to train the model compared to that required for deep learning models.
Affiliation(s)
- Frank Hu
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Francis He
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- David J. Yaron
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
32
Nakata M, Maeda T. PubChemQC B3LYP/6-31G*//PM6 Data Set: The Electronic Structures of 86 Million Molecules Using B3LYP/6-31G* Calculations. J Chem Inf Model 2023; 63:5734-5754. [PMID: 37677147 DOI: 10.1021/acs.jcim.3c00899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
The presented "PubChemQC B3LYP/6-31G*//PM6" data set is composed of the electronic properties of 85,938,443 molecules, encompassing a broad spectrum of molecules from essential compounds to biomolecules with a molecular weight up to 1000. These molecules account for 94.0% of the original PubChem Compound catalog as of August 29, 2016. The electronic properties, including orbitals, orbital energies, total energies, dipole moments, and other pertinent properties, were computed using the B3LYP/6-31G* and PM6 methods. The data set, available in three formats, namely, GAMESS quantum chemistry program files, selected JSON output files, and a PostgreSQL database, provides researchers with the ability to query molecular properties. It is further subdivided into five subdata sets for more specific data. The first two subsets encompass molecules with carbon, hydrogen, oxygen, and nitrogen with molecular weights under 300 and 500, respectively. The third and fourth subsets incorporate molecules with carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, fluorine, and chlorine, with molecular weights under 300 and 500, respectively. The fifth subset comprises molecules with carbon, hydrogen, nitrogen, oxygen, phosphorus, sulfur, fluorine, chlorine, sodium, potassium, magnesium, and calcium, with molecular weights under 500. The coefficients of determination for the highest occupied molecular orbital-lowest unoccupied molecular orbital energy gap range from 0.892 (for CHON500) to 0.803 (for the whole data set). These comprehensive results pave the way for applications in drug discovery and materials science, among others. The data sets can be accessed under the Creative Commons Attribution 4.0 International license at the following web address: https://nakatamaho.riken.jp/pubchemqc.riken.jp/b3lyp_pm6_datasets.html.
Affiliation(s)
- Maho Nakata
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Toshiyuki Maeda
- Software Technology and Artificial Intelligence Research Laboratory, Chiba Institute of Technology, 2-17-1 Tsudanuma, Narashino, Chiba 275-0016, Japan
33
Fedik N, Nebgen B, Lubbers N, Barros K, Kulichenko M, Li YW, Zubatyuk R, Messerly R, Isayev O, Tretiak S. Synergy of semiempirical models and machine learning in computational chemistry. J Chem Phys 2023; 159:110901. [PMID: 37712780 DOI: 10.1063/5.0151833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 07/11/2023] [Indexed: 09/16/2023]
Abstract
Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types or new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, which cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage domain knowledge and employ machine learning only as a corrective tool. In the context of materials science, we focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.
Affiliation(s)
- Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Roman Zubatyuk
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Richard Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Integrated Nanotechnologies Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
34
Tkachenko NV, Tkachenko AA, Nebgen B, Tretiak S, Boldyrev AI. Neural network atomistic potentials for global energy minima search in carbon clusters. Phys Chem Chem Phys 2023; 25:21173-21182. [PMID: 37490276 DOI: 10.1039/d3cp02317f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2023]
Abstract
The global energy optimization problem is an acute and important problem in chemistry. It is crucial to know the geometry of the lowest-energy isomer (global minimum, GM) of a given compound to evaluate its chemical and physical properties. This problem is especially relevant for atomic clusters. Because the number of local-minimum geometries grows exponentially with the number of atoms in the cluster, it is important to find a computationally efficient and reliable method to navigate the energy landscape and locate the true global minimum structure. Newly developed neural network (NN) atomistic potentials offer a numerically efficient and relatively accurate approach to molecular structure optimization. An important question that needs to be answered is "Can NN potentials, trained on a given set, represent the potential energy surface (PES) of a neighboring domain?". In this work, we tested the applicability of the ANI-1ccx and ANI-nr NN atomistic potentials for the global minimum optimization of carbon clusters Cn (n = 3-10). We showed that with the introduction of a cluster connectivity restriction and subsequent DFT or ab initio calculations, ANI-1ccx and ANI-nr can be considered robust PES pre-samplers that can capture the GM structure even for large clusters such as C20.
Affiliation(s)
- Nikolay V Tkachenko
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322-0300, USA.
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322-0300, USA.
35
Wang Y, Xu C, Li Z, Barati Farimani A. Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials. J Chem Theory Comput 2023; 19:5077-5087. [PMID: 37390120 PMCID: PMC10413865 DOI: 10.1021/acs.jctc.3c00289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Indexed: 07/02/2023]
Abstract
Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models to expensive ab initio quantum mechanics (QM) approaches for molecular potential predictions. However, building accurate and transferable potential models using GNNs remains challenging, as the data are greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noise, and GNNs are pretrained to denoise the perturbed molecular conformations, recovering the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential for leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.
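The denoising objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: isotropic Gaussian noise with a fixed scale `sigma`, and a model whose target is the added noise (one common way to parameterize "recovering the original coordinates"). No GNN is implemented, and the paper's exact noise scale and loss are not reproduced; the zero-output `baseline` is only a placeholder for a model prediction.

```python
import numpy as np

def make_denoising_pair(coords, sigma, rng):
    """Perturb coordinates with isotropic Gaussian noise of scale `sigma`.

    Returns (noisy_coords, noise): the noisy coordinates are the model input
    and the added noise is the regression target, so noisy - noise recovers
    the original conformation.
    """
    noise = rng.normal(scale=sigma, size=coords.shape)
    return coords + noise, noise

def denoising_loss(predicted_noise, true_noise):
    """Mean squared error over all atoms and Cartesian components."""
    return float(np.mean((predicted_noise - true_noise) ** 2))

rng = np.random.default_rng(0)
coords = np.array([[0.000, 0.000, 0.000],   # toy 3-atom conformation (angstrom)
                   [0.960, 0.000, 0.000],
                   [-0.240, 0.930, 0.000]])
noisy, target = make_denoising_pair(coords, sigma=0.05, rng=rng)

# Baseline "model" that predicts zero displacement; a trained GNN would
# take `noisy` (plus atom types) and output one vector per atom.
baseline = np.zeros_like(noisy)
print("baseline loss:", denoising_loss(baseline, target))
print("perfect loss:", denoising_loss(target, target))
```

After pretraining on this kind of objective, the network weights would be fine-tuned on energies and forces.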
Affiliation(s)
- Yuyang Wang
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Changwen Xu
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Zijie Li
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
36
Kovács DP, Batatia I, Arany ES, Csányi G. Evaluation of the MACE force field architecture: From medicinal chemistry to materials science. J Chem Phys 2023; 159:044118. [PMID: 37522405 DOI: 10.1063/5.0155322] [Received: 04/19/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023]
Abstract
The MACE architecture represents the state of the art in the field of machine learning force fields for a variety of in-domain, extrapolation, and low-data regime tasks. In this paper, we further evaluate MACE by fitting models for published benchmark datasets. We show that MACE generally outperforms alternatives for a wide range of systems, from amorphous carbon, universal materials modeling, and general small molecule organic chemistry to large molecules and liquid water. We demonstrate the capabilities of the model on tasks ranging from constrained geometry optimization to molecular dynamics simulations and find excellent performance across all tested domains. We show that MACE is very data efficient and can reproduce experimental molecular vibrational spectra when trained on as few as 50 randomly selected reference configurations. We further demonstrate that the strictly local atom-centered model is sufficient for such tasks even in the case of large molecules and weakly interacting molecular assemblies.
Affiliation(s)
- Dávid Péter Kovács
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Ilyes Batatia
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- ENS Paris-Saclay, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Eszter Sára Arany
- School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, United Kingdom
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
37
Knattrup Y, Kubečka J, Ayoubi D, Elm J. Clusterome: A Comprehensive Data Set of Atmospheric Molecular Clusters for Machine Learning Applications. ACS Omega 2023; 8:25155-25164. [PMID: 37483242 PMCID: PMC10357536 DOI: 10.1021/acsomega.3c02203] [Received: 04/02/2023] [Accepted: 06/16/2023] [Indexed: 07/25/2023]
Abstract
Formation and growth of atmospheric molecular clusters into aerosol particles impact the global climate and contribute to the high uncertainty in modern climate models. Cluster formation is usually studied using quantum chemical methods, which quickly become computationally expensive as system sizes grow. In this work, we present a large database of ∼250k atmospherically relevant cluster structures, which can be applied to developing machine learning (ML) models. The database is used to train a kernel ridge regression (KRR) model with the FCHL19 representation. We test the ability of the model to extrapolate from smaller clusters to larger clusters, between different molecules, and between equilibrium and out-of-equilibrium structures, as well as its transferability to systems with new interactions. We show that KRR models can extrapolate to larger sizes and transfer acid and base interactions with mean absolute errors below 1 kcal/mol. We suggest introducing an iterative ML step in configurational sampling processes, which can reduce the computational expense. Such an approach would allow us to study significantly more cluster systems at higher accuracy than previously possible and thereby cover a much larger part of relevant atmospheric compounds.
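Kernel ridge regression itself is compact enough to sketch. The example below assumes generic fixed-length descriptor vectors and a Gaussian kernel; the FCHL19 representation used in the paper is not implemented, the descriptors and "energies" are synthetic placeholders, and `sigma` and `lam` are arbitrary values chosen for illustration rather than tuned hyperparameters.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """A prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))       # placeholder descriptors
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]          # placeholder target values
alpha = krr_fit(X[:150], y[:150], sigma=0.5, lam=1e-6)
pred = krr_predict(X[:150], alpha, X[150:], sigma=0.5)
mae = float(np.mean(np.abs(pred - y[150:])))
print(f"held-out MAE: {mae:.4f}")
```

Training cost is dominated by the dense linear solve, which scales cubically with the number of training structures; this is one reason why iterating ML and configurational sampling, as suggested above, requires care in choosing training sets.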
Affiliation(s)
- Yosef Knattrup
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Jakub Kubečka
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Daniel Ayoubi
- Department of Chemistry, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
- Jonas Elm
- Department of Chemistry, iClimate, Aarhus University, Langelandsgade 140, 8000 Aarhus C, Denmark
38
Ruth M, Gerbig D, Schreiner PR. Machine Learning for Bridging the Gap between Density Functional Theory and Coupled Cluster Energies. J Chem Theory Comput 2023. [PMID: 37418619 DOI: 10.1021/acs.jctc.3c00274] [Indexed: 07/09/2023]
Abstract
Accurate electronic energies and properties are crucial for successful reaction design and mechanistic investigations. Computing energies and properties of molecular structures has proven extremely useful, and, with increasing computational power, the limits of high-level approaches (such as coupled cluster theory) are expanding to ever larger systems. However, because their scaling is highly unfavorable, these methods are still not universally applicable to larger systems. To address the need for fast and accurate electronic energies of larger systems, we created a database of around 8000 small organic monomers (2000 dimers) optimized at the B3LYP-D3(BJ)/cc-pVTZ level of theory. This database also includes single-point energies computed at various levels of theory, including PBE1PBE, ωB97X, M06-2X, revTPSS, B3LYP, and BP86 for density functional theory, as well as DLPNO-CCSD(T) and CCSD(T) for coupled cluster theory, all in conjunction with a cc-pVTZ basis. We used this database to train machine learning models based on graph neural networks using two different graph representations. Our models are able to make energy predictions from B3LYP-D3(BJ)/cc-pVTZ inputs to CCSD(T)/cc-pVTZ outputs with a mean absolute error of 0.78, and to DLPNO-CCSD(T)/cc-pVTZ with a mean absolute error of 0.50 and 0.18 kcal mol⁻¹ for monomers and dimers, respectively. The model for dimers was further validated on the S22 database, and the monomer model was tested on challenging systems, including highly conjugated or functionally complex molecules.
Affiliation(s)
- Marcel Ruth
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Dennis Gerbig
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
- Peter R Schreiner
- Institute of Organic Chemistry, Justus Liebig University, Heinrich-Buff-Ring 17, 35392 Giessen, Germany
39
Kubečka J, Knattrup Y, Engsvang M, Jensen AB, Ayoubi D, Wu H, Christiansen O, Elm J. Current and future machine learning approaches for modeling atmospheric cluster formation. Nat Comput Sci 2023; 3:495-503. [PMID: 38177415 DOI: 10.1038/s43588-023-00435-0] [Received: 12/21/2022] [Accepted: 03/16/2023] [Indexed: 01/06/2024]
Abstract
The formation of strongly bound atmospheric molecular clusters is the first step towards forming new aerosol particles. Recent advances in the application of machine learning models open an enormous opportunity for complementing expensive quantum chemical calculations with efficient machine learning predictions. In this Perspective, we present how data-driven approaches can be applied to accelerate cluster configurational sampling, thereby greatly increasing the number of chemically relevant systems that can be covered.
Affiliation(s)
- Jakub Kubečka
- Department of Chemistry, Aarhus University, Aarhus, Denmark
- Yosef Knattrup
- Department of Chemistry, Aarhus University, Aarhus, Denmark
- Daniel Ayoubi
- Department of Chemistry, Aarhus University, Aarhus, Denmark
- Haide Wu
- Department of Chemistry, Aarhus University, Aarhus, Denmark
- Jonas Elm
- Department of Chemistry, Aarhus University, Aarhus, Denmark
- iCLIMATE Aarhus University Interdisciplinary Centre for Climate Change, Aarhus, Denmark
40
Jaffrelot Inizan T, Plé T, Adjoua O, Ren P, Gökcan H, Isayev O, Lagardère L, Piquemal JP. Scalable hybrid deep neural networks/polarizable potentials biomolecular simulations including long-range effects. Chem Sci 2023; 14:5438-5452. [PMID: 37234902 PMCID: PMC10208042 DOI: 10.1039/d2sc04815a] [Received: 08/30/2022] [Accepted: 04/03/2023] [Indexed: 07/28/2023]
Abstract
Deep-HP is a scalable extension of the Tinker-HP multi-GPU molecular dynamics (MD) package enabling the use of PyTorch/TensorFlow deep neural network (DNN) models. Deep-HP increases DNNs' MD capabilities by orders of magnitude, offering access to ns simulations for 100k-atom biosystems and the possibility of coupling DNNs to any classical (FFs) and many-body polarizable (PFFs) force fields. It therefore allows the introduction of the ANI-2X/AMOEBA hybrid polarizable potential designed for ligand binding studies, where solvent-solvent and solvent-solute interactions are computed with the AMOEBA PFF while solute-solute ones are computed by the ANI-2X DNN. ANI-2X/AMOEBA explicitly includes AMOEBA's physical long-range interactions via an efficient Particle Mesh Ewald implementation while preserving ANI-2X's solute short-range quantum mechanical accuracy. The DNN/PFF partition can be user-defined, allowing hybrid simulations to include key ingredients of biosimulation such as polarizable solvents, polarizable counter ions, etc. ANI-2X/AMOEBA is accelerated using a multiple-timestep strategy focusing on the model's contributions to low-frequency modes of nuclear forces. It primarily evaluates AMOEBA forces while including ANI-2X ones only via correction steps, resulting in an order of magnitude acceleration over standard Velocity Verlet integration. Simulating more than 10 μs, we compute charged/uncharged ligand solvation free energies in 4 solvents, and absolute binding free energies of host-guest complexes from SAMPL challenges. ANI-2X/AMOEBA average errors are discussed in terms of statistical uncertainty and appear in the range of chemical accuracy compared to experiment. The availability of the Deep-HP computational platform opens the path towards large-scale hybrid DNN simulations, at force-field cost, in biophysics and drug discovery.
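The multiple-timestep idea (evaluate one force contribution often and apply a correction force only on outer steps) can be illustrated with a generic reversible RESPA integrator. This is a toy sketch, not Deep-HP's actual splitting: `fast_force` and `slow_force` are hypothetical harmonic stand-ins for the frequently evaluated term and the occasionally applied correction, and the system is a single particle in 1D.

```python
def respa_step(x, v, m, dt_outer, n_inner, fast_force, slow_force):
    """One reversible RESPA step for a single 1D particle.

    The slow force is applied as half-kicks of dt_outer / 2 at the start and
    end of the outer step; the fast force is integrated with velocity Verlet
    using n_inner substeps of size dt_outer / n_inner in between.
    """
    dt_inner = dt_outer / n_inner
    v += 0.5 * dt_outer * slow_force(x) / m          # outer half-kick
    for _ in range(n_inner):                          # inner velocity Verlet
        v += 0.5 * dt_inner * fast_force(x) / m
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / m
    v += 0.5 * dt_outer * slow_force(x) / m          # outer half-kick
    return x, v

K_FAST, K_SLOW, MASS = 100.0, 1.0, 1.0

def fast_force(x):
    """Stiff harmonic term, evaluated on every inner step."""
    return -K_FAST * x

def slow_force(x):
    """Weak harmonic term, a stand-in for an expensive correction force."""
    return -K_SLOW * x

def total_energy(x, v):
    return 0.5 * MASS * v**2 + 0.5 * (K_FAST + K_SLOW) * x**2

x, v = 1.0, 0.0
e0 = total_energy(x, v)
for _ in range(1000):
    x, v = respa_step(x, v, MASS, 0.05, 10, fast_force, slow_force)
rel_error = abs(total_energy(x, v) - e0) / e0
print(f"relative energy error after 1000 outer steps: {rel_error:.2e}")
```

The slow force at the end of one outer step and at the start of the next is evaluated at the same position, so a production code would cache it and pay for one expensive evaluation per outer step. Outer steps that approach half the period of the fast motion cause resonance instabilities, which limits how far the outer step can be stretched.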
Affiliation(s)
- Théo Jaffrelot Inizan
- Sorbonne Université, Laboratoire de Chimie Théorique UMR 7616 CNRS, Paris 75005, France
- Thomas Plé
- Sorbonne Université, Laboratoire de Chimie Théorique UMR 7616 CNRS, Paris 75005, France
- Olivier Adjoua
- Sorbonne Université, Laboratoire de Chimie Théorique UMR 7616 CNRS, Paris 75005, France
- Pengyu Ren
- Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas, USA
- Hatice Gökcan
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Louis Lagardère
- Sorbonne Université, Laboratoire de Chimie Théorique UMR 7616 CNRS, Paris 75005, France
- Sorbonne Université, Institut Parisien de Chimie Physique et Théorique FR 2622 CNRS, Paris, France
- Jean-Philip Piquemal
- Sorbonne Université, Laboratoire de Chimie Théorique UMR 7616 CNRS, Paris 75005, France
- Department of Biomedical Engineering, University of Texas at Austin, Austin, Texas, USA
41
Chigaev M, Smith JS, Anaya S, Nebgen B, Bettencourt M, Barros K, Lubbers N. Lightweight and effective tensor sensitivity for atomistic neural networks. J Chem Phys 2023; 158:2889493. [PMID: 37158328 DOI: 10.1063/5.0142127] [Received: 01/11/2023] [Accepted: 04/20/2023] [Indexed: 05/10/2023]
Abstract
Atomistic machine learning focuses on the creation of models that obey fundamental symmetries of atomistic configurations, such as permutation, translation, and rotation invariances. In many of these schemes, translation and rotation invariance are achieved by building on scalar invariants, e.g., distances between atom pairs. There is growing interest in molecular representations that work internally with higher-rank rotational tensors, e.g., vector displacements between atoms, and tensor products thereof. Here, we present a framework for extending the Hierarchically Interacting Particle Neural Network (HIP-NN) with Tensor Sensitivity information (HIP-NN-TS) from each local atomic environment. Crucially, the method employs a weight-tying strategy that allows direct incorporation of many-body information while adding very few model parameters. We show that HIP-NN-TS is more accurate than HIP-NN, with negligible increase in parameter count, for several datasets and network sizes. As the dataset becomes more complex, tensor sensitivities provide greater improvements to model accuracy. In particular, HIP-NN-TS achieves a record mean absolute error of 0.927 kcal/mol for conformational energy variation on the challenging COMP6 benchmark, which includes a broad set of organic molecules. We also compare the computational performance of HIP-NN-TS to HIP-NN and other models in the literature.
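The distinction drawn above between scalar invariants and higher-rank tensors can be checked numerically. The NumPy sketch below (all function names are hypothetical) shows that pair distances are unchanged by a rigid motion, that displacement vectors rotate with the molecule, and that a rank-2 tensor built from those vectors can be contracted back to a rotation-invariant scalar. It illustrates only the symmetry argument and is not the HIP-NN-TS tensor-sensitivity construction.

```python
import numpy as np

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs
    if np.linalg.det(q) < 0:         # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def pair_distances(coords):
    """Rank-0 (scalar) features: invariant under rotation and translation."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def rank2_invariant(coords, center=0):
    """Build T = sum_j r_ij r_ij^T around one atom (a rank-2 tensor) and
    contract it to the scalar trace(T @ T), which is rotation invariant."""
    disp = np.delete(coords - coords[center], center, axis=0)
    T = disp.T @ disp                # equals the sum of outer products
    return float(np.trace(T @ T))

rng = np.random.default_rng(42)
coords = rng.normal(size=(5, 3))
R = random_rotation(rng)
moved = coords @ R.T + np.array([1.0, -2.0, 0.5])   # rotate, then translate

disp = coords[1:] - coords[0]        # rank-1 features before the motion
disp_moved = moved[1:] - moved[0]    # ... and after it
print("distances invariant:", np.allclose(pair_distances(coords), pair_distances(moved)))
print("vectors rotate with R:", np.allclose(disp_moved, disp @ R.T))
print("contracted tensor invariant:", np.isclose(rank2_invariant(coords), rank2_invariant(moved)))
```

Working with the vectors and tensors directly keeps angular information that pair distances alone discard, while contractions such as the trace restore invariance where a scalar energy is needed.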
Affiliation(s)
- Michael Chigaev
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- NVIDIA, 2788 San Tomas Expy, Santa Clara, California 95051, USA
- Steven Anaya
- High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
42
Buterez D, Janet JP, Kiddle SJ, Liò P. MF-PCBA: Multifidelity High-Throughput Screening Benchmarks for Drug Discovery and Machine Learning. J Chem Inf Model 2023; 63:2667-2678. [PMID: 37058588 PMCID: PMC10170507 DOI: 10.1021/acs.jcim.2c01569] [Received: 12/13/2022] [Indexed: 04/16/2023]
Abstract
High-throughput screening (HTS), as one of the key techniques in drug discovery, is frequently used to identify promising drug candidates in a largely automated and cost-effective way. One of the necessary conditions for successful HTS campaigns is a large and diverse compound library, enabling hundreds of thousands of activity measurements per project. Such collections of data hold great promise for computational and experimental drug discovery efforts, especially when leveraged in combination with modern deep learning techniques, and can potentially lead to improved drug activity predictions and cheaper and more effective experimental design. However, existing collections of machine-learning-ready public datasets do not exploit the multiple data modalities present in real-world HTS projects. Thus, the largest fraction of experimental measurements, corresponding to hundreds of thousands of "noisy" activity values from primary screening, are effectively ignored in the majority of machine learning models of HTS data. To address these limitations, we introduce Multifidelity PubChem BioAssay (MF-PCBA), a curated collection of 60 datasets that includes two data modalities for each dataset, corresponding to primary and confirmatory screening, an aspect that we call multifidelity. Multifidelity data accurately reflect real-world HTS conventions and present a new, challenging task for machine learning: the integration of low- and high-fidelity measurements through molecular representation learning, taking into account the orders-of-magnitude difference in size between the primary and confirmatory screens. Here we detail the steps taken to assemble MF-PCBA in terms of data acquisition from PubChem and the filtering steps required to curate the raw data. 
We also provide an evaluation of a recent deep-learning-based method for multifidelity integration across the introduced datasets, demonstrating the benefit of leveraging all HTS modalities, and a discussion in terms of the roughness of the molecular activity landscape. In total, MF-PCBA contains over 16.6 million unique molecule-protein interactions. The datasets can be easily assembled by using the source code available at https://github.com/davidbuterez/mf-pcba.
Affiliation(s)
- David Buterez
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
- Jon Paul Janet
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 50 Gothenburg, Sweden
- Steven J. Kiddle
- Data Science & Advanced Analytics, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge CB2 8PA, U.K.
- Pietro Liò
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
43
Morado J, Mortenson PN, Nissink JWM, Essex JW, Skylaris CK. Does a Machine-Learned Potential Perform Better Than an Optimally Tuned Traditional Force Field? A Case Study on Fluorohydrins. J Chem Inf Model 2023; 63:2810-2827. [PMID: 37071825 PMCID: PMC10170518 DOI: 10.1021/acs.jcim.2c01510] [Indexed: 04/20/2023]
Abstract
We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in the modeling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis both in the gas phase and chloroform solution. We also assessed the performance of the aforementioned molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results and discussion presented in this study demonstrate that ANI-2x tends to predict stronger-than-expected hydrogen bonding and overstabilize global minima and shows problems related to inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for modeling in the gas phase, conventional force fields still play an important role, especially for condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.
Affiliation(s)
- João Morado
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Paul N Mortenson
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
- J Willem M Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
- Jonathan W Essex
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Chris-Kriton Skylaris
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
44
Goldman N, Fried LE, Lindsey RK, Pham CH, Dettori R. Enhancing the accuracy of density functional tight binding models through ChIMES many-body interaction potentials. J Chem Phys 2023; 158:144112. [PMID: 37061479 DOI: 10.1063/5.0141616] [Indexed: 04/17/2023]
Abstract
Semi-empirical quantum models such as Density Functional Tight Binding (DFTB) are attractive methods for obtaining quantum simulation data at longer time and length scales than possible with standard approaches. However, application of these models can require lengthy effort due to the lack of a systematic approach for their development. In this work, we discuss the use of the Chebyshev Interaction Model for Efficient Simulation (ChIMES) to create rapidly parameterized DFTB models, which exhibit strong transferability due to the inclusion of many-body interactions that might otherwise be inaccurate. We apply our modeling approach to silicon polymorphs and review previous work on titanium hydride. We also review the creation of a general purpose DFTB/ChIMES model for organic molecules and compounds that approaches hybrid functional and coupled cluster accuracy with two orders of magnitude fewer parameters than similar neural network approaches. In all cases, DFTB/ChIMES yields similar accuracy to the underlying quantum method with orders of magnitude improvement in computational cost. Our developments provide a way to create computationally efficient and highly accurate simulations over varying extreme thermodynamic conditions, where physical and chemical properties can be difficult to interrogate directly, and there is historically a significant reliance on theoretical approaches for interpretation and validation of experimental results.
Affiliation(s)
- Nir Goldman
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Laurence E Fried
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Rebecca K Lindsey
- Department of Chemical Engineering, University of Michigan, Ann Arbor, Michigan 48109, USA
- C Huy Pham
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- R Dettori
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
45
Chang CF, Rangarajan S. Machine Learning and Informatics Based Elucidation of Reaction Pathways for Upcycling Model Polyolefin to Aromatics. J Phys Chem A 2023; 127:2958-2966. [PMID: 36975726 PMCID: PMC10249406 DOI: 10.1021/acs.jpca.3c01444] [Received: 03/01/2023] [Revised: 03/13/2023] [Indexed: 03/29/2023]
Abstract
Catalytic upcycling of plastics results in a complex network of potentially thousands of reactions and intermediates. Manual analysis of such a network using ab initio methods to identify plausible reaction pathways and rate-controlling steps is intractable. Here, we combine informatics-based reaction network generation and machine learning based thermochemistry calculation to identify plausible (nonelementary step) pathways involved in dehydroaromatization of a model polyolefin, n-decane, to form aromatic products. All 78 aromatic molecules found involve a sequence comprising dehydrogenation, β-scission, and cyclization steps (in slightly different order). The plausible flux-carrying pathway depends on the family of reactions that is rate-controlling while the thermodynamic bottleneck is the first dehydrogenation step of n-decane. The adopted workflow is system agnostic and can be applied to understand the overall thermochemistry of other upcycling systems.
Affiliation(s)
- Chin-Fei Chang
- Department of Chemical & Biomolecular Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Srinivas Rangarajan
- Department of Chemical & Biomolecular Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
46
Anstine D, Isayev O. Machine Learning Interatomic Potentials and Long-Range Physics. J Phys Chem A 2023; 127:2417-2431. [PMID: 36802360 PMCID: PMC10041642 DOI: 10.1021/acs.jpca.2c06778] [Received: 09/28/2022] [Revised: 02/03/2023] [Indexed: 02/23/2023]
Abstract
Advances in machine-learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy and orders of magnitude reduced computational cost. For many-atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can become reliant on the description of short- and long-range physical interactions. The latter terms can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models with considerations for nonlocal electrostatic and dispersion interactions, leading to a large range of applications that can be addressed using MLIPs. In light of this, we present a Perspective focused on key methodologies and models being used where the presence of nonlocal physics and chemistry is crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message passing iterations to propagate nonlocal system information, and charges obtained via equilibration schemes. We aim to provide a pointed discussion to support the development of machine learning-based interatomic potentials for systems where contributions from only nearsighted terms are deficient.
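Of the strategies listed, charge equilibration is the easiest to show in a few lines. The sketch below solves a classic QEq-style problem: minimize E(q) = Σ χ_i q_i + ½ Σ_ij J_ij q_i q_j subject to Σ q_i = Q_total, using a Lagrange multiplier. The electronegativities `chi`, the hardness values, and the geometry are made-up placeholders; in an MLIP the per-atom parameters would typically be predicted from atomic environments, and the bare 1/r off-diagonal terms here stand in for the shielded or Gaussian-charge interactions real schemes use.

```python
import numpy as np

def qeq_charges(chi, J, total_charge=0.0):
    """Minimize E(q) = chi.q + 0.5 q^T J q subject to sum(q) = total_charge.

    Setting the gradient of the Lagrangian to zero gives J q + lam = -chi;
    together with the constraint row this is one (n+1) x (n+1) linear system.
    """
    n = len(chi)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = J
    A[:n, n] = 1.0       # Lagrange-multiplier column
    A[n, :n] = 1.0       # charge-conservation row
    b = np.concatenate([-np.asarray(chi, dtype=float), [total_charge]])
    solution = np.linalg.solve(A, b)
    return solution[:n]  # solution[n] is the multiplier

# Toy three-site system in arbitrary units: the diagonal holds an atomic
# hardness, off-diagonal elements are a bare 1/r Coulomb interaction.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.2, 0.0]])
chi = np.array([5.0, 2.0, 2.5])        # electronegativities (placeholders)
hardness = np.array([10.0, 8.0, 8.0])  # self-interaction terms (placeholders)

r = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
with np.errstate(divide="ignore"):
    J = 1.0 / r                        # diagonal is inf at this point
np.fill_diagonal(J, hardness)          # replace the diagonal by the hardness

q = qeq_charges(chi, J, total_charge=0.0)
print("charges:", np.round(q, 3), "sum:", float(q.sum()))
```

Because the charges come from a global linear solve, every charge depends on every atom's parameters; this is how equilibration schemes let an otherwise local model respond to nonlocal changes such as a different total charge.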
Affiliation(s)
- Dylan M. Anstine
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
47
Pan X, Zhao F, Zhang Y, Wang X, Xiao X, Zhang JZH, Ji C. MolTaut: A Tool for the Rapid Generation of Favorable Tautomer in Aqueous Solution. J Chem Inf Model 2023; 63:1833-1840. [PMID: 36939644 DOI: 10.1021/acs.jcim.2c01393] [Indexed: 03/21/2023]
Abstract
Fast and proper treatment of the tautomeric states of drug-like molecules is critical in computer-aided drug discovery, since the major tautomer of a molecule determines its pharmacophore features and physical properties. We present MolTaut, a tool for the rapid generation of favorable states of drug-like molecules in water. MolTaut works by enumerating possible tautomeric states with tautomeric transformation rules, ranking tautomers by their relative internal energies and solvation energies calculated by AI-based models, and generating preferred ionization states according to predicted microscopic pKa values. Our test shows that the ranking ability of the AI-based tautomer scoring approach is comparable to that of the DFT method (ωB97X/6-31G*//M06-2X/6-31G*/SMD) from which the AI models try to learn. We find that the substitution effect on tautomeric equilibrium is well predicted by MolTaut, which is helpful in computer-aided ligand design. The source code of MolTaut is freely available to researchers at https://github.com/xundrug/moltaut. To facilitate the use of MolTaut by medicinal chemists, we made a free web server, available at http://moltaut.xundrug.cn. MolTaut is a handy tool for investigating tautomerization issues in drug discovery.
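The ranking step described above (relative internal energy plus solvation energy) reduces to summing two terms per tautomer and, if populations are wanted, Boltzmann-weighting the totals. The sketch below assumes both energies are already available in kcal/mol from some model; `rank_tautomers`, the tautomer labels, and the numbers are hypothetical and do not come from MolTaut, and the pKa-based ionization step is not covered.

```python
import math

R_KCAL = 0.0019872036  # gas constant in kcal/(mol*K)

def rank_tautomers(energies, temperature=298.15):
    """Rank tautomers by E_internal + dG_solv and return Boltzmann populations.

    `energies` maps a tautomer label to (internal_energy, solvation_energy),
    both in kcal/mol. Only differences between tautomers matter.
    Returns a list of (label, relative_energy, population), lowest first.
    """
    totals = {name: e_int + e_solv for name, (e_int, e_solv) in energies.items()}
    e_min = min(totals.values())
    weights = {name: math.exp(-(e - e_min) / (R_KCAL * temperature))
               for name, e in totals.items()}
    z = sum(weights.values())
    ranked = sorted(totals, key=totals.get)
    return [(name, totals[name] - e_min, weights[name] / z) for name in ranked]

# Hypothetical pair: B is less stable internally but much better solvated.
example = {
    "tautomer_A": (0.0, -8.0),
    "tautomer_B": (1.5, -11.0),
}
for name, rel, pop in rank_tautomers(example):
    print(f"{name}: relative = {rel:.2f} kcal/mol, population = {pop:.3f}")
```

The example shows why solvation must enter the score: ranking by internal energy alone would put tautomer_A first, while the combined score favors tautomer_B in water.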
Affiliation(s)
- Xiaolin Pan
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Fanyu Zhao
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York 10003, United States
- Yueqing Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Xingyu Wang
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Xudong Xiao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
- Changge Ji
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
48
Kulichenko M, Barros K, Lubbers N, Li YW, Messerly R, Tretiak S, Smith JS, Nebgen B. Uncertainty-driven dynamics for active learning of interatomic potentials. Nat Comput Sci 2023; 3:230-239. [PMID: 38177878 PMCID: PMC10766548 DOI: 10.1038/s43588-023-00406-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/27/2022] [Accepted: 01/24/2023] [Indexed: 01/06/2024]
Abstract
Machine learning (ML) models trained on data sets of high-fidelity quantum simulations produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool for iteratively generating diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration; if the uncertainty estimate exceeds a chosen threshold, the configuration is added to the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space where the model uncertainty is large. The performance of UDD-AL is demonstrated on two AL tasks: sampling the conformational space of glycine and sampling configurations that promote proton transfer in acetylacetone. The method is shown to efficiently explore chemically relevant regions of configuration space that may be inaccessible to regular dynamical sampling at the target temperature.
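A toy sketch of the idea, under stated assumptions: uncertainty is taken as the spread (population standard deviation) of an ensemble's energy predictions, and the biased energy adds a bounded term A(exp(-σ²/B²) - 1) that lowers the energy where the ensemble disagrees. This functional form and the parameters `A` and `B` are illustrative assumptions and may differ from the bias actually used in UDD-AL; the one-dimensional "models" are hypothetical harmonic functions standing in for a trained ensemble. The same uncertainty estimate drives the threshold-based selection step described in the abstract.

```python
import math
from statistics import fmean, pstdev


def ensemble_stats(models, x):
    """Mean and population standard deviation of the ensemble's energy predictions at x."""
    preds = [m(x) for m in models]
    return fmean(preds), pstdev(preds)


def biased_energy(models, x, A=1.0, B=0.5):
    """Ensemble-mean energy plus an assumed bounded uncertainty bias.

    bias = A * (exp(-sigma^2 / B^2) - 1): zero where the models agree,
    approaching -A where they disagree strongly.
    """
    mean, sigma = ensemble_stats(models, x)
    return mean + A * (math.exp(-((sigma / B) ** 2)) - 1.0)


def select_for_labeling(models, configs, threshold):
    """AL step: keep configurations whose ensemble uncertainty exceeds the threshold."""
    return [x for x in configs if ensemble_stats(models, x)[1] > threshold]


# Hypothetical ensemble: harmonic wells that agree near x = 0 and diverge away from it.
models = [lambda x, k=k: 0.5 * k * x * x for k in (0.9, 1.0, 1.1)]
print(biased_energy(models, 2.0), ensemble_stats(models, 2.0)[0])
print(select_for_labeling(models, [0.0, 0.5, 2.0, 3.0], threshold=0.1))
```

In an actual MD run the forces would come from the gradient of the biased energy. Because this bias never lowers the energy by more than `A`, high-energy regions of the underlying surface remain penalized.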
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Richard Messerly
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Nvidia Corporation, Santa Clara, CA, USA
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
49
Zeng J, Tao Y, Giese TJ, York DM. QDπ: A Quantum Deep Potential Interaction Model for Drug Discovery. J Chem Theory Comput 2023; 19:1261-1275. [PMID: 36696673 PMCID: PMC9992268 DOI: 10.1021/acs.jctc.2c01172] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/27/2023]
Abstract
We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model takes the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP): a fast third-order self-consistent density-functional tight-binding model (DFTB3/3OB) is corrected to high quantitative accuracy by a deep-learning potential (DeepPot-SE). The model can properly treat electrostatic interactions and handle changes in charge and protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared with several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). QDπ is shown to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and performs exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes shows that QDπ has average errors below 0.5 kcal/mol, whereas the other models compared have errors more than an order of magnitude larger. Taken together, these results make QDπ highly attractive as a potential force field model for drug discovery.
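The QM/Δ-MLP structure amounts to predicting E = E_base + ΔE, where the correction ΔE is fitted to the difference between reference and baseline energies. The sketch below shows only that structure and is not the QDπ implementation: `base_energy` and `ref_energy` are hypothetical one-dimensional functions standing in for DFTB3/3OB and ωB97X/6-31G*, and an ordinary least-squares line stands in for the DeepPot-SE network.

```python
from statistics import fmean


def fit_linear_delta(xs, targets):
    """Ordinary least squares for target ~ intercept + slope * x.

    Stand-in for the deep-learning correction; a real Δ-MLP uses a neural network.
    """
    mx, mt = fmean(xs), fmean(targets)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, targets))
    slope = sxt / sxx
    intercept = mt - slope * mx
    return lambda x: intercept + slope * x


def delta_learning(base_energy, ref_energy, train_xs):
    """Fit the correction to (reference - baseline) and return baseline + correction."""
    deltas = [ref_energy(x) - base_energy(x) for x in train_xs]
    correction = fit_linear_delta(train_xs, deltas)
    return lambda x: base_energy(x) + correction(x)


def base_energy(x):  # hypothetical cheap baseline (plays the role of DFTB3)
    return x * x


def ref_energy(x):  # hypothetical high-level reference (plays the role of DFT)
    return x * x + 0.3 * x + 0.1


model = delta_learning(base_energy, ref_energy, [0.0, 0.5, 1.0, 1.5, 2.0])
print(model(3.0), ref_energy(3.0))
```

Because the baseline already captures most of the energy, the correction only has to learn a smaller residual, which is the usual motivation for Δ-learning.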
Affiliation(s)
- Jinzhe Zeng
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Yujun Tao
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Timothy J Giese
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
- Darrin M York
- Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
50
Pinheiro M, Zhang S, Dral PO, Barbatti M. WS22 database, Wigner Sampling and geometry interpolation for configurationally diverse molecular datasets. Sci Data 2023; 10:95. [PMID: 36792601 PMCID: PMC9931705 DOI: 10.1038/s41597-023-01998-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/31/2022] [Accepted: 02/01/2023] [Indexed: 02/17/2023]
Abstract
Multidimensional surfaces of quantum chemical properties, such as potential energies and dipole moments, are common targets for machine learning and require robust, diverse databases that extensively explore molecular configurational space. Here we compose the WS22 database, covering several quantum mechanical (QM) properties (including potential energies, forces, dipole moments, polarizabilities, and HOMO and LUMO energies) for ten flexible organic molecules of increasing complexity with up to 22 atoms. The database consists of 1.18 million equilibrium and non-equilibrium geometries carefully sampled from Wigner distributions centered at different equilibrium conformations (in either the ground or excited electronic states) and further augmented with interpolated structures. We demonstrate the diversity of the datasets by visualizing the distribution of geometries with dimensionality reduction and by comparing statistical features of the QM properties with those of existing datasets. Our sampling targets a broader quantum mechanical distribution of configurational space than the commonly used sampling through classical molecular dynamics, making the data more challenging for machine learning models.
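The two sampling ingredients named in this abstract can be illustrated as follows, assuming harmonic normal modes in mass-weighted coordinates with ħ = 1. For the vibrational ground state, the Wigner distribution of a mode with angular frequency ω is a product of Gaussians with var(Q) = ħ/(2ω) and var(P) = ħω/2. Linear interpolation between two Cartesian geometries is an assumed, simplest-possible scheme and may not match the one used to build WS22; converting normal-mode coordinates back to Cartesian displacements is omitted. Function names are hypothetical.

```python
import math
import random


def wigner_sample_modes(frequencies, rng, hbar=1.0):
    """Draw (Q, P) for each harmonic mode from its ground-state Wigner distribution.

    Mass-weighted coordinates: var(Q) = hbar / (2 w), var(P) = hbar * w / 2.
    """
    samples = []
    for w in frequencies:
        q = rng.gauss(0.0, math.sqrt(hbar / (2.0 * w)))
        p = rng.gauss(0.0, math.sqrt(hbar * w / 2.0))
        samples.append((q, p))
    return samples


def interpolate_geometries(geom_a, geom_b, n_points):
    """Linearly interpolate two Cartesian geometries (lists of (x, y, z)), endpoints included."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2 to include both endpoints")
    frames = []
    for i in range(n_points):
        t = i / (n_points - 1)
        frames.append([tuple((1 - t) * a + t * b for a, b in zip(atom_a, atom_b))
                       for atom_a, atom_b in zip(geom_a, geom_b)])
    return frames


rng = random.Random(0)
print(wigner_sample_modes([0.01, 0.02], rng))
print(interpolate_geometries([(0.0, 0.0, 0.0)], [(1.0, 2.0, 3.0)], 3))
```

Sampling from the Wigner distribution rather than a classical thermal distribution includes zero-point spread along every mode, which is one way to reach the broader configurational coverage the abstract describes.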
Affiliation(s)
- Max Pinheiro
- Aix Marseille University, CNRS, ICR, Marseille, France
- Shuang Zhang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
- Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
- Mario Barbatti
- Aix Marseille University, CNRS, ICR, Marseille, France
- Institut Universitaire de France, 75231 Paris, France